A systematic and comprehensive investigation of methods to build and evaluate fault prediction models

Published: 01 January 2010

Abstract

This paper describes a study, performed in an industrial setting, that builds predictive models to identify the parts of a Java system with a high fault probability. The system under consideration is constantly evolving, as several releases a year are shipped to customers. Developers usually have limited resources for testing and would like to devote extra resources to the fault-prone parts of the system. The main research focus of this paper is to systematically assess three aspects of how to build and evaluate fault-proneness models in the context of this large Java legacy-system development project: (1) compare a range of data mining and machine learning techniques for building fault-proneness models, (2) assess the impact of using different metric sets, such as source code structural measures and change/fault history (process measures), and (3) compare several alternative ways of assessing model performance, in terms of (i) confusion-matrix criteria such as accuracy and precision/recall, (ii) ranking ability, using the area under the receiver operating characteristic curve (ROC area), and (iii) our proposed cost-effectiveness measure (CE). The results of the study indicate that the choice of fault-proneness modeling technique has limited impact on the resulting classification accuracy or cost-effectiveness. There are, however, large differences between the individual metric sets in terms of cost-effectiveness, and although the process measures are among the most expensive to collect, including them as candidate measures significantly improves the prediction models compared with models that include only structural measures and/or their deltas between releases, both in terms of ROC area and in terms of CE. Further, we observe that which model is considered best depends strongly on the criteria used to evaluate and compare the models. The regular confusion-matrix criteria, although popular, are not clearly related to the problem at hand, namely the cost-effectiveness of using fault-proneness prediction models to focus verification efforts and deliver software with fewer faults at lower cost.
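To make the three families of evaluation criteria concrete, the following is a minimal, self-contained Python sketch that computes each of them for a toy set of classes. The confusion-matrix metrics and the rank-based ROC area follow their standard definitions; the cost_effectiveness function is only an illustrative surrogate, assuming CE can be read as the area between the cumulative percentage-of-faults versus percentage-of-LOC curve (inspecting classes in decreasing order of predicted fault probability) and the random-selection diagonal. The paper's exact CE definition may differ, and all function names here are hypothetical.

    def confusion_matrix_metrics(y_true, y_pred):
        """Accuracy, precision and recall from binary labels (1 = faulty)."""
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        accuracy = (tp + tn) / len(y_true)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return accuracy, precision, recall

    def roc_area(y_true, scores):
        """Rank-based ROC area: the probability that a randomly chosen
        faulty class gets a higher predicted probability than a randomly
        chosen fault-free class (ties count as half)."""
        pos = [s for t, s in zip(y_true, scores) if t == 1]
        neg = [s for t, s in zip(y_true, scores) if t == 0]
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    def cost_effectiveness(loc, faults, scores):
        """Illustrative CE surrogate (assumption, not the paper's exact
        formula): trapezoidal area between the cumulative fault-percentage
        curve and the random baseline when classes are inspected in
        decreasing order of predicted fault probability. Positive values
        mean the ranking beats random selection."""
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        total_loc, total_faults = sum(loc), sum(faults)
        cum_loc = cum_faults = prev_x = prev_y = area = 0.0
        for i in order:
            cum_loc += loc[i]
            cum_faults += faults[i]
            x, y = cum_loc / total_loc, cum_faults / total_faults
            area += (x - prev_x) * (y + prev_y) / 2.0  # trapezoid rule
            prev_x, prev_y = x, y
        return area - 0.5  # 0.5 is the area under the random baseline

    # Toy example: four classes with sizes, fault counts and model scores.
    y_true = [1, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.4]
    y_pred = [int(s >= 0.5) for s in scores]
    print(confusion_matrix_metrics(y_true, y_pred))  # (1.0, 1.0, 1.0)
    print(roc_area(y_true, scores))                  # 1.0
    print(cost_effectiveness([100, 300, 200, 400], [3, 0, 2, 0], scores))  # ~0.39

Note how the criteria can diverge: on the toy data a 0.5 cutoff classifies every class correctly, yet only the CE value (about 0.39 here) reflects how much inspection effort the ranking saves over random selection, which is the point the abstract makes about confusion-matrix criteria being poorly aligned with cost-effectiveness.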



Publisher

Elsevier Science Inc., United States


Author Tags

1. Cost-effectiveness
2. Fault prediction models
3. Verification
