DOI: 10.1109/ICSE.2007.66

Predicting Faults from Cached History

Published: 24 May 2007

Abstract

We analyze the version history of seven software systems to predict the most fault-prone entities and files. The basic assumption is that faults do not occur in isolation, but rather in bursts of several related faults. Therefore, we cache locations that are likely to have faults: starting from the location of a known (fixed) fault, we cache the location itself, any locations changed together with the fault, recently added locations, and recently changed locations. By consulting the cache at the moment a fault is fixed, a developer can detect likely fault-prone locations. This is useful for prioritizing verification and validation resources on the most fault-prone files or entities. In our evaluation of seven open source projects with more than 200,000 revisions, the cache selects 10% of the source code files; these files account for 73%-95% of faults, a significant advance beyond the state of the art.
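The caching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify a cache size or a replacement policy, so the fixed capacity and least-recently-used eviction here are assumptions, as are all names (`Change`, `FaultCache`, `observe`). "Changed together with the fault" is approximated as "changed in the same revision as the fix".

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Change:
    """One revision of the history. All field names are hypothetical."""
    changed: list                                  # locations modified in this revision
    added: list = field(default_factory=list)      # locations newly created
    fixed: list = field(default_factory=list)      # locations where a fault was fixed


class FaultCache:
    """Fixed-size set of locations believed to be fault-prone.

    Assumption: least-recently-used eviction. The abstract does not
    name a replacement policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first

    def __contains__(self, location):
        return location in self._entries

    def _load(self, location):
        # Insert or refresh the location, then evict the oldest entries.
        self._entries[location] = True
        self._entries.move_to_end(location)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def observe(self, change):
        """Process one revision; return the fixed locations that were already cached."""
        hits = [loc for loc in change.fixed if loc in self._entries]
        for loc in change.added:        # recently added locations
            self._load(loc)
        for loc in change.changed:      # recently changed locations
            self._load(loc)
        for loc in change.fixed:        # the fault location itself
            self._load(loc)
            for other in change.changed:  # locations changed together with it
                if other != loc:
                    self._load(other)
        return hits
```

Replaying a history through `observe` and counting how many fixed faults fall on already-cached locations gives a hit rate comparable in spirit to the 73%-95% figure, although the actual evaluation setup is described only in the full paper.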


Published In

ICSE '07: Proceedings of the 29th international conference on Software Engineering
May 2007
784 pages
ISBN: 0769528287

Publisher

IEEE Computer Society

United States

Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions (15%)
