skip to main content
research-article

Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs

Published: 01 September 2022 Publication History

Abstract

Software developers attempt to reproduce software bugs to understand their erroneous behaviours and to fix them. Unfortunately, they often fail to reproduce (or fix) them, which leads to faulty, unreliable software systems. However, to date, only a little research has been done to better understand what makes the software bugs non-reproducible. In this article, we conduct a multimodal study to better understand the non-reproducibility of software bugs. First, we perform an empirical study using 576 non-reproducible bug reports from two popular software systems (Firefox, Eclipse) and identify 11 key factors that might lead a reported bug to non-reproducibility. Second, we conduct a user study involving 13 professional developers where we investigate how the developers cope with non-reproducible bugs. We found that they either close these bugs or solicit for further information, which involves long deliberations and counter-productive manual searches. Third, we offer several actionable insights on how to avoid non-reproducibility (e.g., false-positive bug report detector) and improve reproducibility of the reported bugs (e.g., sandbox for bug reproduction) by combining our analyses from multiple studies (e.g., empirical study, developer study). Fourth, we explain the differences between reproducible and non-reproducible bug reports by systematically interpreting multiple machine learning models that classify these reports with high accuracy. We found that links to existing bug reports might help improve the reproducibility of a reported bug. Finally, we detect the connected bug reports to a non-reproducible bug automatically and further demonstrate how 93 bugs connected to 71 non-reproducible bugs from our dataset can offer complementary information (e.g., attachments, screenshots, program flows).

References

[1]
Amoui M, Kaushik N, Al-Dabbagh A, Tahvildari L, Li S, Liu W (2013) Search-based duplicate defect detection: An industrial experience. In: Proc. MSR, pp 173–182
[2]
An L, Castelluccio M, and Khomh F An empirical study of dll injection bugs in the firefox ecosystem EMSE 2019 24 1799-1822
[3]
Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc Y (2008) Is it a bug or an enhancement? a text-based approach to classify change requests. In: Proc. CASCON, p 15
[5]
Aranda J, Venolia G (2009) The secret life of bugs: Going past the errors and omissions in software repositories. In: Proc. ICSE, pp 298–308
[6]
Bettenburg N, Just S, Schröter A, Weiss C, Premraj R, Zimmermann T (2008) What makes a good bug report?. In: Proc. FSE, pp 308–318
[7]
Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword information. arXiv:1607.04606
[8]
Breiman L Random forests Mach. Learn. 2001 45 1 5-32
[9]
Cessie SL and Houwelingen JCV Ridge estimators in logistic regression JSTOR 1992 41 1 191-201
[10]
Chaparro O, Bernal-Cárdenas C, Lu J, Moran K, Marcus A, Di Penta M, Poshyvanyk D, Ng V (2019) Assessing the quality of the steps to reproduce in bug reports. In: Proc.ESEC/FSE, pp 86–96
[11]
Chaparro O, Florez J M, Marcus A (2017) Using observed behavior to reformulate queries during text retrieval-based bug localization. In: Proc. ICSME, p to appear
[12]
Chaparro O, Florez J M, Singh U, Marcus A (2019) Reformulating queries for duplicate bug report detection. In: Proc. SANER, pp 218–229
[13]
Chaparro O, Lu J, Zampetti F, Moreno L, Di Penta M, Marcus A, Bavota G, Ng V (2017) Detecting missing information in bug descriptions. In: Proc. ESEC/FSE, pp 396–407
[14]
Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proc. SIGKDD, pp 785–794
[15]
Dam H K, Tran T, Ghose A (2018) Explainable software analytics. In: Proc. ICSE-C, pp 53–56
[17]
[18]
Fagerland M W (2012) t-tests, non-parametric tests, and large studies–a paradox of statistical practice?. BMC Med Res Methodol, 12(78)
[19]
Fan Y, Xia X, D.Lo, Hassan A E (2018) Chaff from the wheat: Characterizing and determining valid bug reports. TSE
[20]
Furnas GW, Landauer TK, Gomez LM, and Dumais ST The Vocabulary Problem in Human-system Communication Commun. ACM 1987 30 11 964-971
[21]
Glaser BG and Strauss AL The discovery of grounded theory : strategies for qualitative research 1967 Chicago Aldine Publishing
[22]
Goyal A and Sardana N Nrfixer: Sentiment based model for predicting the fixability of non-reproducible bugs e-Informatica 2017 11 1 103-116
[23]
Guo P J, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed: An empirical study of microsoft windows. In: Proc. ICSE, pp 495–504
[24]
Guo P J, Zimmermann T, Nagappan N, Murphy B (2011) “not my bug!” and other reasons for software bug report reassignments. In: Proc. CSCW, pp 395–404
[25]
Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: How misclassification impacts bug prediction. In: Proc. ICSE, pp 392–401
[26]
Hindle A and Onuczko C Preventing duplicate bug reports by continuously querying bug reports Empirical Softw. Engg. 2019 24 2 902-936
[28]
Jiarpakdee J, Tantithamthavorn C, Dam H K, Grundy J (2020) An empirical study of model-agnostic techniques for defect prediction models. TSE
[29]
Jiarpakdee J, Tantithamthavorn C, Grundy J (2021) Practitioners’ perceptions of the goals and visual explanations of defect prediction models. In: Proc. MSR, pp 432–443
[30]
John G H, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proc. UAI, pp 338–345
[31]
Joorabchi M E, Mirzaaghaei M, Mesbah A (2014) Works for me! characterizing non-reproducible bug reports. In: Proc. MSR, pp 62–71
[32]
Lin B, Zampetti F, Bavota G, Di Penta M, Lanza M, Oliveto R (2018) Sentiment analysis for software engineering: How far can we go?. In: Proc. ICSE, pp 94–104
[33]
Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, and Lee S From local explanations to global understanding with explainable ai for trees Nature machine intelligence 2020 2 1 56-67
[34]
Maalej W, Nabil H (2015) Bug report, feature request, or simply praise? on automatically classifying app reviews. In: Proc. RE, pp 116–125
[35]
Nayrolles M, Hamou-Lhadj A (2018) Towards a classification of bugs to facilitate software maintainability tasks. In: Proc. SQUADE, pp 25–32
[36]
O’Callahan R, Jones C, Froyd N, Huey K, Noll A, Partush N (2017) Engineering record and replay for deployability. In: Proc. USENIX, pp 377–389
[37]
Parnin C, Orso A (2011) Are Automated Debugging Techniques Actually Helping Programmers?. In: Proc. ISSTA, pp 199–209
[39]
Ponzanelli L, Mocci A, Bacchelli A, Lanza M, Fullerton D (2014) Improving Low Quality Stack Overflow Post Detection. In: Proc. ICSME, pp 541–544
[40]
Quinlan J R (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc.
[41]
Rahman M M, Khomh F, Castelluccio M (2020) Why are some bugs non-reproducible? an empirical investigation using data fusion. In: Proc. ICSME, p 12
[42]
Rahman M M, Roy C K, Collins J (2016) CORRECT: Code Reviewer Recommendation Based on Cross-Project and Technology Experience. In: Proc. ICSE, p to appear
[43]
Rahman MM, Roy CK, and Lo D Automatic query reformulation for code search using crowdsourced knowledge EMSE 2019 24 1869-1924
[44]
Researcher posts facebook bug report to mark Zuckerberg’s wall (2013) https://cnet.co/2PvIH9O
[45]
Report: (2019) Software failure caused $1.7 trillion in financial losses in 2017. https://tek.io/2FBNl2i
[46]
Ribeiro M T, Singh S, Guestrin C (2016) ”why should i trust you?”: Explaining the predictions of any classifier. In: Proc. KDD, pp 1135–1144
[47]
Royston JP An extension of shapiro and wilk’s w test for normality to large samples J R Stat Soc 1982 31 2 115-124
[48]
Sarkar A, Rigby P C, Bartalos B (2019) Improving bug triaging with high confidence predictions at ericsson. In: Proc. ICSME, pp 81–91
[51]
Shafiq H A, Arshad Z (2014) Automated debugging and bug fixing solutions : A systematic literature review and classification
[52]
Shi Z, Keung J, Song Q (2014) An Empirical Study of BM25 and BM25F Based Feature Location Techniques. In: Proc. InnoSWDev, pp 106–114
[53]
Socher R, Perelygin A, Wu J, Chuang J, Manning C D, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proc. EMNLP, pp 1631–1642
[54]
Tan L, Liu C, Li Z, Wang X, Zhou Y, and Zhai C Bug characteristics in open source software EMSE 2014 19 6 1665-1705
[55]
Thelwall M, Buckley K, Paltoglou G, Cai D, and Kappas A Sentiment strength detection in short informal text JASIST 2010 61 12 2544-2558
[56]
Thongtanunam P, Kula R G, Yoshida N, Iida H, Matsumoto K (2015) Who Should Review my Code?. In: Proc. SANER, pp 141–150
[57]
Tian Y, Sun C, Lo D (2012) Improved duplicate bug report identification. In: Proc. CSMR, pp 385–390
[58]
Vyas D, Fritz T, Shepherd D (2014) Bug reproduction: A collaborative practice within software maintenance activities. In: COOP, pp 189–207
[59]
Wang S, Lo D (2014) Version history, similar report, and structure: Putting them together for improved bug localization. In: Proc. ICPC, pp 53–63
[60]
Wang S and Lo D Amalgam+: Composing rich information sources for accurate bug localization JSEP 2016 28 10 921-942
[61]
Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. Proc. ICSE, pp 461–470
[62]
Wattanakriengkrai S, Thongtanunam P, Tantithamthavorn C, Hata H, Matsumoto K (2020) Predicting defective lines using a model-agnostic technique. TSE
[64]
[65]
Wong C P, Xiong Y, Zhang H, Hao D, Zhang L, Mei H (2014) Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proc. ICSME, pp 181–190
[66]
Xia X, Lo D, Shihab E, and Wang X Automated bug report field reassignment and refinement prediction TSR 2016 65 3 1094-1113
[67]
Yang X, Lo D, Xia X, Bao L, Sun J (2016) Combining word embedding with information retrieval to recommend similar bug reports. In: Proc. ISSRE, pp 127–137
[68]
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016) From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering. In: Proc. ICSE, pp 404–415
[69]
Yuan T, Lo D, Lawall J (2014) Automated Construction of a Software-specific Word Similarity Database. In: Proc. CSMR-WCRE, pp 44–53
[70]
Zhao Y, Yu T, Su T, Liu Y, Zheng W, Zhang J, Halfond WGJ (2019) Recdroid: Automatically reproducing android application crashes from bug reports. In: Proc. ICSE, pp 128–139
[71]
Zhou J, Zhang H, Lo D (2012) Where Should the Bugs Be Fixed? - More Accurate Information Retrieval-based Bug Localization Based on Bug Reports. In: Proc. ICSE
[72]
Zimmermann T, Nagappan N, Guo P J, Murphy B (2012) Characterizing and predicting which bugs get reopened. In: Proc. ICSE, pp 1074–1083

Cited By

View all
  • (2023)Similar Bug Reports Recommendation System using BERTProceedings of the XXXVII Brazilian Symposium on Software Engineering10.1145/3613372.3613396(378-387)Online publication date: 25-Sep-2023

Index Terms

  1. Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Empirical Software Engineering
    Empirical Software Engineering  Volume 27, Issue 5
    Sep 2022
    845 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 September 2022
    Accepted: 22 March 2022

    Author Tags

    1. Bug reproduction
    2. Non-reproducibility
    3. Empirical study
    4. Grounded theory
    5. Developer feedback
    6. Reproducibility challenges
    7. Key factors
    8. Bug report classification
    9. Model interpretation

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 01 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Similar Bug Reports Recommendation System using BERTProceedings of the XXXVII Brazilian Symposium on Software Engineering10.1145/3613372.3613396(378-387)Online publication date: 25-Sep-2023

    View Options

    View options

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media