
Revisiting Test Impact Analysis in Continuous Testing From the Perspective of Code Dependencies

Published: 01 June 2022

Abstract

In continuous testing, developers execute automated test cases once or even several times per day to ensure the quality of the integrated code. Although continuous testing helps ensure the quality of the code and reduces maintenance effort, it also significantly increases test execution overhead. In this paper, we empirically evaluate the effectiveness of test impact analysis from the perspective of code dependencies in the continuous testing setting. We first applied test impact analysis to one year of software development history in 11 large-scale open-source systems. We found that even though the number of changed files in daily commits is small (the per-system median ranges from 3 to 28 files), around 50 percent or more of the test cases are still impacted and need to be executed. Motivated by this finding, we further studied the code dependencies between source code files and test cases, and among test cases. We found that 1) test cases often focus on testing the integrated behaviour of the systems, and 15 percent of the test cases have dependencies with more than 20 source code files; 2) 18 percent of the test cases have dependencies with other test cases, and test case inheritance is the most common cause of test case dependencies; and 3) our manual study uncovered four dependency-related test smells, which we document. Our study is a first step towards understanding the effectiveness of test impact analysis in the continuous testing setting and provides insights for improving test design and execution.
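To make the selection step concrete, below is a minimal sketch of dependency-based test impact analysis, assuming that the dependencies between test classes and source files have already been extracted (for example statically, with a parser such as JavaParser). The class, file, and method names are hypothetical; this illustrates the general technique, not the authors' implementation.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Illustrative dependency-based test impact analysis (hypothetical names):
    // a test class is selected for execution when any file it depends on
    // appears in the commit's change set.
    public class TestImpactAnalysisSketch {

        // Maps each test class to the source files it depends on.
        private final Map<String, Set<String>> testDependencies = new HashMap<>();

        public void addDependency(String testClass, String sourceFile) {
            testDependencies.computeIfAbsent(testClass, k -> new HashSet<>()).add(sourceFile);
        }

        // Returns the test classes impacted by a set of changed files.
        public Set<String> impactedTests(Set<String> changedFiles) {
            return testDependencies.entrySet().stream()
                    .filter(e -> e.getValue().stream().anyMatch(changedFiles::contains))
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toSet());
        }

        public static void main(String[] args) {
            TestImpactAnalysisSketch tia = new TestImpactAnalysisSketch();

            // Dependency edges as they might be extracted by static analysis of
            // imports and references in the test code.
            tia.addDependency("OrderServiceTest", "OrderService.java");
            tia.addDependency("OrderServiceTest", "Order.java");
            tia.addDependency("PaymentIntegrationTest", "PaymentGateway.java");
            tia.addDependency("PaymentIntegrationTest", "Order.java");

            // A commit that changes a single, widely used source file still
            // impacts both test classes.
            Set<String> changedFiles = Set.of("Order.java");
            System.out.println(tia.impactedTests(changedFiles));
            // Prints both test classes (order may vary).
        }
    }

The sketch also shows why small commits can still select many tests: a single widely depended-on source file (here, the hypothetical Order.java) pulls in every test class whose dependency set contains it.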

Cited By

  • How disabled tests manifest in test maintainability challenges? In Proc. 29th ACM Joint Meeting Eur. Softw. Eng. Conf. Symp. Foundations Softw. Eng., 2021, pp. 1045–1055. DOI: 10.1145/3468264.3468609. Online publication date: 20 Aug. 2021.


Published In

IEEE Transactions on Software Engineering, Volume 48, Issue 6
June 2022
355 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
