DOI: 10.1145/3643991.3644914

A Mutation-Guided Assessment of Acceleration Approaches for Continuous Integration: An Empirical Study of YourBase

Published: 02 July 2024

Abstract

Continuous Integration (CI) is a popular software development practice that quickly verifies updates to codebases. To cope with the ever-increasing demand for faster software releases, CI acceleration approaches have been proposed; however, adoption of CI acceleration is not without risks. For example, CI acceleration products may mislabel change sets (e.g., a build labeled as failing that passes in an unaccelerated setting or vice versa) or produce results that are inconsistent with an unaccelerated build (e.g., the underlying reasons for failure differ between (un)accelerated builds). These inconsistencies threaten the trustworthiness of CI acceleration products.
In this paper, we propose an approach inspired by mutation testing to systematically evaluate the trustworthiness of CI acceleration. We apply our approach to YourBase, a program analysis-based CI acceleration product, and uncover issues that hinder its trustworthiness. First, we study how often the same build in accelerated and unaccelerated CI settings produces different mutation testing outcomes. We call mutants with different outcomes in the two settings "gap mutants". Next, we study the code locations where gap mutants appear. Finally, we inspect gap mutants to understand why acceleration causes them to survive. Our analysis of ten open-source projects uncovers 2,237 gap mutants. We find that: (1) gap mutants account for 0.11%--23.50% of the studied mutants; (2) 88.95% of gap mutants can be mapped to specific source code functions and classes using the dependency representation of the studied CI acceleration product; and (3) 69% of gap mutants survive CI acceleration for deterministic reasons that can be classified into six fault patterns. Our results show that even deterministic CI acceleration solutions suffer from trustworthiness limitations, and highlight ways in which trustworthiness could be pragmatically improved.
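The core comparison the abstract describes can be sketched in a few lines. The following is a hypothetical illustration (not the authors' tooling, and the function and outcome labels are assumptions): each mutant is run once under an unaccelerated build and once under the accelerated build, and any mutant whose kill/survive outcome differs between the two settings is flagged as a "gap mutant".

```python
# Hypothetical sketch of gap-mutant detection, not the paper's implementation.
# Each mapping associates a mutant ID with its outcome ("killed" or "survived")
# in one CI setting; mutants whose outcomes disagree are "gap mutants".

def find_gap_mutants(unaccelerated, accelerated):
    """Return the IDs of mutants whose outcome differs between settings.

    Only mutants executed in both settings are compared.
    """
    shared = unaccelerated.keys() & accelerated.keys()
    return sorted(m for m in shared if unaccelerated[m] != accelerated[m])

# Example: mutant m2 is killed by the full build but survives under
# acceleration, so it is a gap mutant.
full = {"m1": "killed", "m2": "killed", "m3": "survived"}
fast = {"m1": "killed", "m2": "survived", "m3": "survived"}
print(find_gap_mutants(full, fast))  # -> ['m2']
```

The gap-mutant rate reported in the paper (0.11%--23.50% across projects) would then be the size of this set divided by the number of mutants studied for a project.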



Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

MSR '24

