DOI: 10.1145/3213846.3213875
Research article · Public Access

Evaluating test-suite reduction in real software evolution

Published: 12 July 2018

Abstract

Test-suite reduction (TSR) speeds up regression testing by removing redundant tests from the test suite, so that fewer tests run in future builds. To decide whether to use TSR, a developer needs some way to predict how well the reduced test suite will detect real faults in the future compared to the original test suite. Prior research evaluated the cost of TSR using only program versions with seeded faults, but such evaluations do not explicitly predict the effectiveness of the reduced test suite in future builds.
We perform the first extensive study of TSR using real test failures from (failed) builds that occurred for real code changes. We analyze 1478 failed builds from 32 GitHub projects that run their tests on Travis CI. Each failed build can have multiple faults, so we propose a family of mappings from test failures to faults. We use these mappings to compute Failed-Build Detection Loss (FBDL), the percentage of failed builds where the reduced test suite fails to detect all the faults detected by the original test suite. We find that FBDL can be up to 52.2%, which is higher than suggested by traditional TSR metrics. Moreover, traditional TSR metrics are not good predictors of FBDL, making it difficult for developers to decide whether to use reduced test suites.
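The FBDL metric described above can be illustrated with a small sketch. The build records and the per-build sets of detected faults below are illustrative assumptions, not the paper's dataset or its actual failure-to-fault mappings; the point is only the shape of the computation: over the failed builds, count those where the reduced suite misses at least one fault the original suite detected.

```python
def fbdl(builds):
    """Failed-Build Detection Loss: percentage of failed builds in which the
    reduced test suite misses at least one fault that the original suite
    detected.

    `builds` is a list of dicts with two sets of fault ids (hypothetical
    representation for illustration):
      - 'orig':    faults detected by the original suite in this build
      - 'reduced': faults detected by the reduced suite in the same build
    """
    # A build counts as "failed" here if the original suite detected any fault.
    failed = [b for b in builds if b["orig"]]
    if not failed:
        return 0.0
    # The reduced suite "loses" the build unless it detects every fault
    # that the original suite detected (set-subset check).
    missed = sum(1 for b in failed if not b["orig"] <= b["reduced"])
    return 100.0 * missed / len(failed)

# Example: three failed builds; the reduced suite misses fault f3 in one.
builds = [
    {"orig": {"f1"}, "reduced": {"f1"}},
    {"orig": {"f2", "f3"}, "reduced": {"f2"}},  # f3 missed
    {"orig": {"f4"}, "reduced": {"f4"}},
]
print(round(fbdl(builds), 1))  # -> 33.3
```

Note that FBDL is deliberately stricter than per-fault loss rates: a build with several faults is counted as lost even if the reduced suite catches all but one of them.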



Published In

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2018
379 pages
ISBN:9781450356992
DOI:10.1145/3213846
  • General Chair: Frank Tip
  • Program Chair: Eric Bodden

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Test-suite reduction
  2. continuous integration
  3. regression testing


Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%


Cited By

  • (2024) Optimization of Automated and Manual Software Tests in Industrial Practice: A Survey and Historical Analysis. IEEE Transactions on Software Engineering 50(8), 2005–2020. DOI: 10.1109/TSE.2024.3418191
  • (2023) State of Practical Applicability of Regression Testing Research: A Live Systematic Literature Review. ACM Computing Surveys 55(13s), 1–36. DOI: 10.1145/3579851
  • (2023) Optimizing Continuous Integration by Dynamic Test Proportion Selection. SANER 2023, 438–449. DOI: 10.1109/SANER56733.2023.00048
  • (2023) Semantic-aware two-phase test case prioritization for continuous integration. Software Testing, Verification and Reliability 34(1). DOI: 10.1002/stvr.1864
  • (2022) Towards developer support for merging forked test cases. SPLC 2022 (Volume A), 131–141. DOI: 10.1145/3546932.3547002
  • (2022) On the use of mutation analysis for evaluating student test suite quality. ISSTA 2022, 263–275. DOI: 10.1145/3533767.3534217
  • (2022) ReCover. MSR 2022, 196–200. DOI: 10.1145/3524842.3528490
  • (2022) Prioritization and parallel execution of test cases for certification testing of embedded systems. Software Quality Journal 31(2), 471–496. DOI: 10.1007/s11219-022-09594-1
  • (2021) SIGSOFT Outstanding Doctoral Dissertation Award. ACM SIGSOFT Software Engineering Notes 46(3), 17–18. DOI: 10.1145/3468744.3468749
  • (2021) Empirically evaluating readily available information for regression test optimization in continuous integration. ISSTA 2021, 491–504. DOI: 10.1145/3460319.3464834
