DOI: 10.1145/3213846.3213875
Research article · Public Access

Evaluating test-suite reduction in real software evolution

Published: 12 July 2018

Abstract

Test-suite reduction (TSR) speeds up regression testing by removing redundant tests from the test suite, so that fewer tests run in future builds. To decide whether to use TSR, a developer needs some way to predict how well the reduced test suite will detect real faults in the future compared to the original test suite. Prior research evaluated the cost of TSR using only program versions with seeded faults, but such evaluations do not explicitly predict the effectiveness of the reduced test suite in future builds.
We perform the first extensive study of TSR using real test failures from (failed) builds that occurred for real code changes. We analyze 1478 failed builds from 32 GitHub projects that run their tests on Travis CI. Each failed build can have multiple faults, so we propose a family of mappings from test failures to faults. We use these mappings to compute Failed-Build Detection Loss (FBDL), the percentage of failed builds where the reduced test suite fails to detect all the faults detected by the original test suite. We find that FBDL can be up to 52.2%, which is higher than suggested by traditional TSR metrics. Moreover, traditional TSR metrics are not good predictors of FBDL, making it difficult for developers to decide whether to use reduced test suites.
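The FBDL metric described above can be illustrated with a small sketch. The build records and the per-build sets of detected faults below are illustrative assumptions, not the paper's dataset or its actual failure-to-fault mappings; the point is only the shape of the computation: over the failed builds, count those where the reduced suite misses at least one fault the original suite detected.

```python
def fbdl(builds):
    """Failed-Build Detection Loss: percentage of failed builds in which the
    reduced test suite misses at least one fault that the original suite
    detected.

    `builds` is a list of dicts with two sets of fault ids (hypothetical
    representation for illustration):
      - 'orig':    faults detected by the original suite in this build
      - 'reduced': faults detected by the reduced suite in the same build
    """
    # A build counts as "failed" here if the original suite detected any fault.
    failed = [b for b in builds if b["orig"]]
    if not failed:
        return 0.0
    # The reduced suite "loses" the build unless it detects every fault
    # that the original suite detected (set-subset check).
    missed = sum(1 for b in failed if not b["orig"] <= b["reduced"])
    return 100.0 * missed / len(failed)

# Example: three failed builds; the reduced suite misses fault f3 in one.
builds = [
    {"orig": {"f1"}, "reduced": {"f1"}},
    {"orig": {"f2", "f3"}, "reduced": {"f2"}},  # f3 missed
    {"orig": {"f4"}, "reduced": {"f4"}},
]
print(round(fbdl(builds), 1))  # -> 33.3
```

Note that FBDL is deliberately stricter than per-fault loss rates: a build with several faults is counted as lost even if the reduced suite catches all but one of them.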



Published In

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2018
379 pages
ISBN:9781450356992
DOI:10.1145/3213846
  • General Chair: Frank Tip
  • Program Chair: Eric Bodden

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Test-suite reduction
  2. continuous integration
  3. regression testing


Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%


Cited By

  • (2024) Optimization of Automated and Manual Software Tests in Industrial Practice: A Survey and Historical Analysis. IEEE Transactions on Software Engineering 50(8), 2005–2020. DOI: 10.1109/TSE.2024.3418191
  • (2023) State of Practical Applicability of Regression Testing Research: A Live Systematic Literature Review. ACM Computing Surveys 55(13s), 1–36. DOI: 10.1145/3579851
  • (2023) Optimizing Continuous Integration by Dynamic Test Proportion Selection. SANER 2023, 438–449. DOI: 10.1109/SANER56733.2023.00048
  • (2023) Semantic-aware two-phase test case prioritization for continuous integration. Software Testing, Verification and Reliability 34(1). DOI: 10.1002/stvr.1864
  • (2022) Towards developer support for merging forked test cases. SPLC 2022 (Volume A), 131–141. DOI: 10.1145/3546932.3547002
  • (2022) On the use of mutation analysis for evaluating student test suite quality. ISSTA 2022, 263–275. DOI: 10.1145/3533767.3534217
  • (2022) ReCover. MSR 2022, 196–200. DOI: 10.1145/3524842.3528490
  • (2022) Prioritization and parallel execution of test cases for certification testing of embedded systems. Software Quality Journal 31(2), 471–496. DOI: 10.1007/s11219-022-09594-1
  • (2021) SIGSOFT Outstanding Doctoral Dissertation Award. ACM SIGSOFT Software Engineering Notes 46(3), 17–18. DOI: 10.1145/3468744.3468749
  • (2021) Empirically evaluating readily available information for regression test optimization in continuous integration. ISSTA 2021, 491–504. DOI: 10.1145/3460319.3464834
