DOI: 10.1145/3510003.3510194
research-article
Open access

FlakiMe: laboratory-controlled test flakiness impact assessment

Published: 05 July 2022

Abstract

Much research on software testing implicitly assumes that test failures are deterministic, such that they always witness the presence of the same defects. However, this assumption does not always hold, because some test failures are due to so-called flaky tests, i.e., tests with non-deterministic outcomes. To help testing researchers better investigate flakiness, we introduce a test flakiness assessment and experimentation platform called FlakiMe. FlakiMe supports the seeding of a controllable degree of flakiness into the behaviour of a given test suite. It thereby equips researchers with ways to investigate the impact of test flakiness on their techniques under laboratory-controlled conditions. To demonstrate the application of FlakiMe, we use it to assess the impact of flakiness on mutation testing and program repair (the PRAPR and ARJA methods). The results indicate that 10% flakiness is sufficient to affect the mutation score, but the effect size is modest (2% to 5%), whereas it reduces the number of patches produced for repair by 20% up to 100% of repair problems, a devastating impact on this application of testing. Our experiments with FlakiMe demonstrate that flakiness affects different testing applications in very different ways, thereby motivating the need for a laboratory-controllable flakiness impact assessment platform and approach such as FlakiMe.
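To make the idea of seeding a controllable degree of flakiness concrete, the following is a minimal sketch and not FlakiMe's actual implementation. It assumes a simple hypothetical model in which every passing test independently turns into a failure with probability `flake_rate`, and it shows how such spurious failures could change a mutation score. The function names, the `kill_matrix` layout, and the toy data are all invented for illustration.

```python
import random


def run_with_seeded_flakiness(outcomes, flake_rate, rng):
    """Return observed outcomes after seeding flakiness (hypothetical model).

    A passing test becomes a failure with probability `flake_rate`;
    a test that already fails deterministically stays failing.
    """
    return {
        test: passed and rng.random() >= flake_rate
        for test, passed in outcomes.items()
    }


def mutation_score(kill_matrix, flake_rate=0.0, seed=0):
    """Fraction of mutants for which at least one test is observed to fail.

    `kill_matrix` maps a mutant id to {test name: passed?} as recorded
    in a deterministic (non-flaky) run.
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    killed = 0
    for outcomes in kill_matrix.values():
        observed = run_with_seeded_flakiness(outcomes, flake_rate, rng)
        if not all(observed.values()):
            killed += 1
    return killed / len(kill_matrix)


# Toy data (assumed): four mutants, two tests; m1 and m3 are really killed.
kill_matrix = {
    "m1": {"t1": False, "t2": True},
    "m2": {"t1": True, "t2": True},
    "m3": {"t1": True, "t2": False},
    "m4": {"t1": True, "t2": True},
}

baseline = mutation_score(kill_matrix, flake_rate=0.0)
flaky = mutation_score(kill_matrix, flake_rate=0.1, seed=42)
print(f"baseline score: {baseline:.2f}, score with 10% flakiness: {flaky:.2f}")
```

In this toy model spurious failures can only make surviving mutants look killed, so the score never drops below the baseline. The effect measured in the paper may arise through other mechanisms that this sketch does not capture.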

Published In

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003
This work is licensed under a Creative Commons Attribution International 4.0 License.

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Cited By

  • (2024) Automated Unit Test Improvement using Large Language Models at Meta. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 185-196. DOI: 10.1145/3663529.3663839. Online publication date: 10-Jul-2024
  • (2024) Automatically Reproducing Timing-Dependent Flaky-Test Failures. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), 269-280. DOI: 10.1109/ICST60714.2024.00032. Online publication date: 27-May-2024
  • (2024) Test Code Flakiness in Mobile Apps. Information and Software Technology 168:C. DOI: 10.1016/j.infsof.2023.107394. Online publication date: 17-Apr-2024
  • (2024) Flakiness goes live. Information and Software Technology 167:C. DOI: 10.1016/j.infsof.2023.107373. Online publication date: 12-Apr-2024
  • (2023) Transforming Test Suites into Croissants. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1080-1092. DOI: 10.1145/3597926.3598119. Online publication date: 12-Jul-2023
  • (2023) Software Testing Research Challenges: An Industrial Perspective. 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), 1-10. DOI: 10.1109/ICST57152.2023.00008. Online publication date: Apr-2023
  • (2023) Debugging Flaky Tests using Spectrum-based Fault Localization. 2023 IEEE/ACM International Conference on Automation of Software Test (AST), 128-139. DOI: 10.1109/AST58925.2023.00017. Online publication date: May-2023
  • (2023) Test flakiness’ causes, detection, impact and responses. Journal of Systems and Software 206:C. DOI: 10.1016/j.jss.2023.111837. Online publication date: 1-Dec-2023
  • (2022) Static test flakiness prediction: How Far Can We Go? Empirical Software Engineering 27:7. DOI: 10.1007/s10664-022-10227-1. Online publication date: 1-Dec-2022
