research-article

Improving Automated Bug Triaging with Specialized Topic Model

Authors:

Jafar M. Al-Kofahi,

Tien N. Nguyen,

Xinyu Wang

IEEE Transactions on Software Engineering, Volume 43, Issue 3

Pages 272 - 297

https://doi.org/10.1109/TSE.2016.2576454

Published: 01 March 2017

Abstract

Bug triaging refers to the process of assigning a bug to the most appropriate developer to fix. It becomes more and more difficult and complicated as the size of software and the number of developers increase. In this paper, we propose a new framework for bug triaging, which maps the words in the bug reports (i.e., the term space) to their corresponding topics (i.e., the topic space). We propose a specialized topic modeling algorithm named multi-feature topic model (MTM) which extends Latent Dirichlet Allocation (LDA) for bug triaging. MTM considers product and component information of bug reports to map the term space to the topic space. Finally, we propose an incremental learning method named TopicMiner which considers the topic distribution of a new bug report to assign an appropriate fixer based on the affinity of the fixer to the topics. We pair TopicMiner with MTM (TopicMiner$^{MTM}$). We have evaluated our solution on 5 large bug report datasets including GCC, OpenOffice, Mozilla, Netbeans, and Eclipse containing a total of 227,278 bug reports. We show that TopicMiner$^{MTM}$ can achieve top-1 and top-5 prediction accuracies of 0.4831-0.6868 and 0.7686-0.9084, respectively. We also compare TopicMiner$^{MTM}$ with Bugzie, LDA-KL, SVM-LDA, LDA-Activity, and Yang et al.'s approach. The results show that TopicMiner$^{MTM}$ on average improves top-1 and top-5 prediction accuracies of Bugzie by 128.48 and 53.22 percent, LDA-KL by 262.91 and 105.97 percent, SVM-LDA by 205.89 and 110.48 percent, LDA-Activity by 377.60 and 176.32 percent, and Yang et al.'s approach by 59.88 and 13.70 percent, respectively.
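As a rough illustration of the last step described in the abstract (recommending a fixer from the topic distribution of a new bug report), the sketch below ranks developers against a report's topic distribution and checks a top-k hit. The scoring rule (a plain dot product), the names `rank_fixers` and `top_k_hit`, and the toy numbers are all assumptions made for illustration; they are not the affinity formula or data from the paper, and the topic distribution is taken as given rather than inferred with MTM.

```python
from typing import Dict, List


def rank_fixers(report_topics: List[float],
                fixer_affinity: Dict[str, List[float]]) -> List[str]:
    """Rank developers by a hypothetical affinity score.

    Assumption: the score is the dot product of the report's topic
    distribution and the developer's per-topic affinity. This is an
    illustrative stand-in, not the formula used by TopicMiner.
    """
    scores = {
        dev: sum(p * a for p, a in zip(report_topics, affinity))
        for dev, affinity in fixer_affinity.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda dev: (-scores[dev], dev))


def top_k_hit(ranking: List[str], actual_fixer: str, k: int) -> bool:
    """True if the actual fixer appears among the first k recommendations."""
    return actual_fixer in ranking[:k]


# Toy data (hypothetical): three topics, three developers.
report_topics = [0.7, 0.2, 0.1]
fixer_affinity = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.8, 0.3],
    "carol": [0.3, 0.3, 0.9],
}

ranking = rank_fixers(report_topics, fixer_affinity)
```

Under the usual definition, a top-k accuracy figure such as those quoted in the abstract would be the fraction of test reports for which `top_k_hit` returns true.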

References

[1]

(2016). [Online]. Available: https://bugs.eclipse.org/bugs/

[2]

(2016). [Online]. Available: http://gcc.gnu.org/bugzilla/

[3]

(2016). [Online]. Available: https://bugzilla.mozilla.org/

[4]

(2016). [Online]. Available: http://netbeans.org/bugzilla/

[5]

(2016). [Online]. Available: https://issues.apache.org/ooo/

[6]

H. Abdi, “Bonferroni and Šidák corrections for multiple comparisons,” in Encyclopedia of Measurement and Statistics, N. J. Salkind, Ed. Thousand Oaks, CA, USA: Sage, 2007.

[7]

J. Anvik, L. Hiew, and G. Murphy, “Who should fix this bug?” in Proc. 28th Int. Conf. Softw. Eng., 2006, pp. 361–370.

[8]

J. Anvik and G. Murphy, “Determining implementation expertise from bug reports,” presented at the Proc. 4th Int. Workshop Min. Softw. Repositories, Washington, DC, USA, 2007.

[9]

D. Bertram, A. Voida, S. Greenberg, and R. Walker, “Communication, collaboration, and bugs: The social nature of issue tracking in small, collocated teams,” in Proc. 2010 ACM Conf. Comput. Support. Coop. Work, 2010, pp. 291–300.

[10]

P. Bhattacharya and I. Neamtiu, “Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging,” in Proc. 2010 IEEE Int. Conf. Softw. Maint., 2010, pp. 1–10.

[11]

D. Binkley, D. Heinz, D. Lawrie, and J. Overfelt, “Understanding LDA in source code analysis,” in Proc. 22nd Int. Conf. Program Comprehension, 2014, pp. 26–36.

[12]

D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.

[13]

G. Bortis and A. van der Hoek, “Porchlight: A tag-based approach to bug triaging,” in Proc. 35th Int. Conf. Softw. Eng., 2013, pp. 342–351.

[14]

N. Cliff, Ordinal Methods for Behavioral Data Analysis. Hove, United Kingdom: Psychology Press, 2014.

[15]

D. Čubranić, “Automatic bug triage using text categorization,” in Proc. 16th Int. Conf. Softw. Eng. Knowl. Eng., 2004, pp. 92–97.

[16]

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” J. Am. Soc. Inf. Sci., vol. 41, no. 6, pp. 391–407, 1990.

[17]

G. Heinrich, Parameter estimation for text analysis. [Online]. Available: http://www.arbylon.net/publications/text-est.pdf, 2005.

[18]

T. Hofmann, “Probabilistic latent semantic analysis,” in Proc. 15th Conf. Uncertain. Artif. Intell., 1999, pp. 289–296.

[19]

W. M. Ibrahim, N. Bettenburg, B. Adams, and A. E. Hassan, “On the relationship between comment update practices and software bugs,” J. Syst. Softw., vol. 85, no. 10, pp. 2293–2304, 2012.

[20]

G. Jeong, S. Kim, and T. Zimmermann, “Improving bug triage with bug tossing graphs,” in Proc. Joint Meeting Eur. Softw. Eng. Conf. ACM SIGSOFT Int. Symp. Found. Softw. Eng., 2009, pp. 111–120.

[21]

H. Kagdi, M. Gethers, D. Poshyvanyk, and M. Hammad, “Assigning change requests to software developers,” J. Softw.: Evol. Process, vol. 24, no. 1, pp. 3–33, 2012.

[22]

H. H. Kagdi, M. Gethers, D. Poshyvanyk, and M. Hammad, “Assigning change requests to software developers,” J. Softw. Maint., vol. 24, no. 1, pp. 3–33, 2012.

[23]

C. Kolassa, D. Riehle, and M. A. Salim, “A model of the commit size distribution of open source,” in SOFSEM 2013: Theory and Practice of Computer Science. Berlin, Germany: Springer, 2013.

[24]

M. Linares-Vásquez, K. Hossen, H. Dang, H. Kagdi, M. Gethers, and D. Poshyvanyk, “Triaging incoming change requests: Bug or commit history, or code authorship?” in Proc. 28th IEEE Int. Conf. Softw. Maint., 2012, pp. 451–460.

[25]

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, vol. 1. Cambridge, U.K.: Cambridge Univ. Press, 2008.

[26]

D. Matter, A. Kuhn, and O. Nierstrasz, “Assigning bug reports using a vocabulary-based expertise model of developers,” in Proc. 6th IEEE Int. Work. Conf. Min. Softw. Repositories, 2009, pp. 131–140.

[27]

H. Naguib, N. Narayan, B. Brugge, and D. Helal, “Bug report assignee recommendation using activity profiles,” in Proc. 10th IEEE Work. Conf. Min. Softw. Repositories, 2013, pp. 22–30.

[28]

A. Nguyen, T. Nguyen, J. Al-Kofahi, H. Nguyen, and T. Nguyen, “A topic-based approach for narrowing the search space of buggy files from a bug report,” in Proc. 26th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2011, pp. 263–272.

[29]

A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, D. Lo, and C. Sun, “Duplicate bug report detection with a combination of information retrieval and topic modeling,” in Proc. 27th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2012, pp. 70–79.

[30]

A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia, “How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms,” in Proc. Int. Conf. Softw. Eng., 2013, pp. 522–531.

[31]

M. F. Porter, “An algorithm for suffix stripping,” Program, vol. 14, no. 3, pp. 130–137, 1980.

[32]

R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry, “Improving bug localization using structured information retrieval,” in Proc. IEEE/ACM 28th Int. Conf. Autom. Softw. Eng., 2013, pp. 345–355.

[33]

R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani, “Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation,” in Proc. 10th Work. Conf. Min. Softw. Repositories, 2013, pp. 2–11.

[34]

R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani, “A time-based approach to automatic bug report assignment,” J. Syst. Softw., vol. 102, pp. 109–122, 2015.

[35]

K. Somasundaram and G. C. Murphy, “Automatic categorization of bug reports using latent Dirichlet allocation,” in Proc. 5th India Softw. Eng. Conf., 2012, pp. 125–130.

[36]

M. Steyvers and T. Griffiths, “Probabilistic topic models,” in Latent Semantic Analysis: A Road to Meaning, T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Eds. Hove, United Kingdom: Lawrence Erlbaum, 2007.

[37]

C. Sun, D. Lo, S.-C. Khoo, and J. Jiang, “Towards more accurate retrieval of duplicate bug reports,” in Proc. 26th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2011, pp. 253–262.

[38]

A. Tamrawi, T. Nguyen, J. Al-Kofahi, and T. Nguyen, “Fuzzy set and cache-based approach for bug triaging,” in Proc. 19th ACM SIGSOFT Symp. 13th Eur. Conf. Found. Softw. Eng., 2011, pp. 365–375.

[39]

Y. Tian, D. Lo, X. Xia, and C. Sun, “Automated prediction of bug report priority using multi-factor analysis,” Empir. Softw. Eng., pp. 1–30, 2014.

[40]

F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bull., vol. 1, no. 6, pp. 80–83, 1945.

[41]

W. Wu, W. Zhang, Y. Yang, and Q. Wang, “Drex: Developer recommendation with k-nearest-neighbor search and expertise ranking,” in Proc. 18th Asia Pac. Softw. Eng. Conf., 2011, pp. 389–396.

[42]

X. Xia, Y. Ding, D. Lo, J. Al-Kofahi, T. Nguyen, and X. Wang, “Toward more accurate bug triaging with topic modeling,” Tech. Report. [Online]. Available: http://pan.baidu.com/s/1eRkv4Dc, 2015.

[43]

X. Xia, D. Lo, X. Wang, and B. Zhou, “Accurate developer recommendation for bug resolution,” in Proc. 20th Work. Conf. Reverse Eng., 2013, pp. 72–81.

[44]

X. Xia, D. Lo, M. Wen, E. Shihab, and B. Zhou, “An empirical study of bug report field reassignment,” in Proc. Softw. Evol. Week-IEEE Conf. Softw. Maint. Reengineering Reverse Eng., 2014, pp. 174–183.

[45]

X. Xie, W. Zhang, Y. Yang, and Q. Wang, “Dretom: Developer recommendation based on topic models for bug resolution,” in Proc. 8th Int. Conf. Predictive Models Softw. Eng., 2012, pp. 19–28.

[46]

G. Yang, T. Zhang, and B. Lee, “Towards semi-automatic bug triage and severity prediction based on topic model and multi-feature of bug reports,” in Proc. IEEE 38th Annu. Comput. Softw. Appl. Conf., 2014, pp. 97–106.

Index Terms

Improving Automated Bug Triaging with Specialized Topic Model
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

IEEE Transactions on Software Engineering, Volume 43, Issue 3

March 2017

93 pages

ISSN: 0098-5589

Copyright © 2017.

Publisher

IEEE Press

Cited By

Tufano R, Mastropaolo A, Pepe F, Dabic O, Di Penta M, Bavota G, Spinellis D, Constantinou E, Bacchelli A (2024). Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories, pp. 571-583. Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644918
Sanei A, Cheng J (2024). Characterizing Usability Issue Discussions in Open Source Software Projects. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1, pp. 1-26. Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3637307
Wang R, Ji X, Xu S, Tian Y, Jiang S, Huang R (2024). An empirical assessment of different word embedding and deep learning models for bug assignment. Journal of Systems and Software 210:C. Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1016/j.jss.2024.111961
Wang W, Wu C, He J (2024). CLeBPI. Information and Software Technology 164:C. Online publication date: 10-Jan-2024.
https://dl.acm.org/doi/10.1016/j.infsof.2023.107302
Tamanna S, Uddin G, Xia L, Zhang L, Onut I, Shirani P, Onut I, Onut I, Branco P (2023). Characterizing Issue Management in Runtime Systems. Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, pp. 54-63. Online publication date: 11-Sep-2023.
https://dl.acm.org/doi/10.5555/3615924.3615930
Assi M, Hassan S, Georgiou S, Zou Y (2023). Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems. ACM Transactions on Software Engineering and Methodology 32:6, pp. 1-34. Online publication date: 30-Sep-2023.
https://dl.acm.org/doi/10.1145/3593802
Jahanshahi H, Cevik M, Mousavi K, Başar A (2023). ADPTriage: Approximate Dynamic Programming for Bug Triage. IEEE Transactions on Software Engineering 49:10, pp. 4594-4609. Online publication date: 1-Oct-2023.
https://dl.acm.org/doi/10.1109/TSE.2023.3307243
Ciniselli M, Pascarella L, Aghajani E, Scalabrino S, Oliveto R, Bavota G, Grundy J, Pollock L, Penta M (2023). Source Code Recommender Systems: The Practitioners' Perspective. Proceedings of the 45th International Conference on Software Engineering, pp. 2161-2172. Online publication date: 14-May-2023.
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00182
Mastropaolo A, Pascarella L, Guglielmi E, Ciniselli M, Scalabrino S, Oliveto R, Bavota G, Grundy J, Pollock L, Penta M (2023). On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot. Proceedings of the 45th International Conference on Software Engineering, pp. 2149-2160. Online publication date: 14-May-2023.
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00181
Diamantopoulos T, Saoulidis N, Symeonidis A (2023). Automated issue assignment using topic modelling on Jira issue tracking data. IET Software 17:3, pp. 333-344. Online publication date: 30-May-2023.
https://dl.acm.org/doi/10.1049/sfw2.12129
