research-article

Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs

Authors:

Mohammad M. Rahman,

Marco Castelluccio

Empirical Software Engineering, Volume 27, Issue 5

https://doi.org/10.1007/s10664-022-10153-2

Published: 01 September 2022

Abstract

Software developers attempt to reproduce software bugs to understand their erroneous behaviours and to fix them. Unfortunately, they often fail to reproduce (or fix) them, which leads to faulty, unreliable software systems. However, to date, little research has been done to understand what makes software bugs non-reproducible. In this article, we conduct a multimodal study to better understand the non-reproducibility of software bugs. First, we perform an empirical study using 576 non-reproducible bug reports from two popular software systems (Firefox, Eclipse) and identify 11 key factors that might lead a reported bug to non-reproducibility. Second, we conduct a user study involving 13 professional developers, in which we investigate how developers cope with non-reproducible bugs. We found that they either close these bugs or solicit further information, which involves long deliberations and counter-productive manual searches. Third, by combining our analyses from multiple studies (e.g., empirical study, developer study), we offer several actionable insights on how to avoid non-reproducibility (e.g., a false-positive bug report detector) and improve the reproducibility of reported bugs (e.g., a sandbox for bug reproduction). Fourth, we explain the differences between reproducible and non-reproducible bug reports by systematically interpreting multiple machine learning models that classify these reports with high accuracy. We found that links to existing bug reports might help improve the reproducibility of a reported bug. Finally, we automatically detect the bug reports connected to a non-reproducible bug and further demonstrate how 93 bugs connected to 71 non-reproducible bugs from our dataset can offer complementary information (e.g., attachments, screenshots, program flows).
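To make the notion of connected bug reports concrete, the following is a minimal sketch, not the detection technique used in the article. It assumes that a connection shows up as a textual mention such as "bug 205" inside a report, and it links reports in both directions. The pattern `BUG_REF`, the helper names, and the toy reports are all hypothetical and invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches mentions such as "bug 12345" or "Bug #12345".
BUG_REF = re.compile(r"\bbug\s*#?\s*(\d+)\b", re.IGNORECASE)


def referenced_bug_ids(text):
    """Return the set of bug IDs mentioned in free-form report text."""
    return {int(m) for m in BUG_REF.findall(text)}


def connected_reports(reports):
    """Map each report ID to the IDs it is linked with, in either direction.

    `reports` maps a bug ID to its text (description plus comments).
    Self-references and IDs that are missing from `reports` are ignored.
    """
    links = defaultdict(set)
    for bug_id, text in reports.items():
        for other in referenced_bug_ids(text):
            if other != bug_id and other in reports:
                links[bug_id].add(other)
                links[other].add(bug_id)
    return {bug_id: sorted(ids) for bug_id, ids in links.items()}


# Toy data, invented for illustration only.
reports = {
    101: "Crash on startup. Cannot reproduce, works for me. See bug 205.",
    205: "Startup crash with attached stack trace and screenshot.",
    309: "Layout glitch, possibly related to Bug #101 and bug 999.",
}
links = connected_reports(reports)
print(links)
```

In this toy example, the non-reproducible report 101 is connected to report 205, which carries a stack trace and a screenshot. That is the kind of complementary information a connected report might offer under the assumption above.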



Published In


Empirical Software Engineering Volume 27, Issue 5

Sep 2022

845 pages

ISSN:1382-3256


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 September 2022

Accepted: 22 March 2022
