research-article

Improving Automated Bug Triaging with Specialized Topic Model

Published: 01 March 2017
    Abstract

    Bug triaging refers to the process of assigning a bug to the most appropriate developer to fix. The task becomes increasingly difficult as the size of the software and the number of developers grow. In this paper, we propose a new framework for bug triaging that maps the words in bug reports (i.e., the term space) to their corresponding topics (i.e., the topic space). We propose a specialized topic modeling algorithm named multi-feature topic model (MTM), which extends Latent Dirichlet Allocation (LDA) for bug triaging. MTM considers the product and component information of bug reports to map the term space to the topic space. Finally, we propose an incremental learning method named TopicMiner, which considers the topic distribution of a new bug report to assign an appropriate fixer based on the fixer's affinity to the topics. We pair TopicMiner with MTM and denote the combination TopicMiner$^{MTM}$. We have evaluated our solution on five large bug report datasets (GCC, OpenOffice, Mozilla, NetBeans, and Eclipse) containing a total of 227,278 bug reports. We show that TopicMiner$^{MTM}$ achieves top-1 prediction accuracies ranging from 0.4831 to 0.6868 and top-5 prediction accuracies ranging from 0.7686 to 0.9084. We also compare TopicMiner$^{MTM}$ with Bugzie, LDA-KL, SVM-LDA, LDA-Activity, and Yang et al.'s approach. The results show that, on average, TopicMiner$^{MTM}$ improves the top-1 and top-5 prediction accuracies of Bugzie by 128.48 and 53.22 percent, LDA-KL by 262.91 and 105.97 percent, SVM-LDA by 205.89 and 110.48 percent, LDA-Activity by 377.60 and 176.32 percent, and Yang et al.'s approach by 59.88 and 13.70 percent, respectively.
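    The abstract describes TopicMiner only at a high level: a fixer is chosen by how well the developer's topic affinity matches the topic distribution of a new report, and the model learns incrementally. The following is a minimal sketch of that idea under stated assumptions, not the paper's TopicMiner or MTM. It assumes each report already has a topic distribution (for example, from plain LDA), defines a developer's affinity as the normalized sum of the topic distributions of the reports they fixed, scores a developer by the dot product of the new report's distribution with that affinity, and measures top-k accuracy as the fraction of reports whose actual fixer appears among the k highest-scoring developers. The names `TopicAffinityTriager` and `top_k_accuracy`, the scoring rule, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class TopicAffinityTriager:
    """Hypothetical topic-affinity ranker; an illustration, not the paper's TopicMiner."""

    def __init__(self, num_topics: int) -> None:
        self.num_topics = num_topics
        # Developer -> summed topic distributions of the bug reports they fixed.
        self.topic_mass: Dict[str, List[float]] = defaultdict(lambda: [0.0] * num_topics)

    def update(self, developer: str, theta: Sequence[float]) -> None:
        """Incrementally fold one fixed report's topic distribution into a profile."""
        if len(theta) != self.num_topics:
            raise ValueError("theta must have one probability per topic")
        mass = self.topic_mass[developer]
        for z, p in enumerate(theta):
            mass[z] += p

    def affinity(self, developer: str) -> List[float]:
        """Normalized topic profile; all zeros for a developer with no history."""
        mass = self.topic_mass.get(developer)  # .get does not create a new entry
        if not mass or sum(mass) == 0:
            return [0.0] * self.num_topics
        total = sum(mass)
        return [m / total for m in mass]

    def score(self, developer: str, theta: Sequence[float]) -> float:
        """Assumed scoring rule: dot product of report topics and developer affinity."""
        return sum(p * a for p, a in zip(theta, self.affinity(developer)))

    def rank(self, theta: Sequence[float], k: int) -> List[str]:
        """Top-k developers by score; ties broken alphabetically for determinism."""
        scores = {dev: self.score(dev, theta) for dev in self.topic_mass}
        return sorted(scores, key=lambda dev: (-scores[dev], dev))[:k]


def top_k_accuracy(
    triager: TopicAffinityTriager,
    test_bugs: Sequence[Tuple[Sequence[float], str]],
    k: int,
) -> float:
    """Fraction of (theta, actual_fixer) pairs whose fixer is in the top-k list."""
    if not test_bugs:
        return 0.0
    hits = sum(1 for theta, fixer in test_bugs if fixer in triager.rank(theta, k))
    return hits / len(test_bugs)


if __name__ == "__main__":
    triager = TopicAffinityTriager(num_topics=3)
    # Made-up history: (topic distribution of a fixed report, its fixer).
    history = [
        ([0.8, 0.1, 0.1], "alice"),
        ([0.1, 0.8, 0.1], "bob"),
        ([0.1, 0.1, 0.8], "carol"),
    ]
    for theta, fixer in history:
        triager.update(fixer, theta)
    print(triager.rank([0.7, 0.2, 0.1], k=2))  # ['alice', 'bob']
```

    Calling `update` each time a report is resolved keeps the profiles current without retraining, which is the sense of "incremental" assumed in this sketch. It does not model MTM's use of the product and component fields.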

    References

    [1]
    (2016). [Online]. Available: https://bugs.eclipse.org/bugs/
    [2]
    (2016). [Online]. Available: http://gcc.gnu.org/bugzilla/
    [3]
    (2016). [Online]. Available: https://bugzilla.mozilla.org/
    [4]
    (2016). [Online]. Available: http://netbeans.org/bugzilla/
    [5]
    (2016). [Online]. Available: https://issues.apache.org/ooo/
    [6]
    H. Abdi, “Bonferroni and Šidák corrections for multiple comparisons,” in Encyclopedia of Measurement and Statistics, N. J. Salkind, Ed. Thousand Oaks, CA, USA: Sage, 2007.
    [7]
    J. Anvik, L. Hiew, and G. Murphy, “Who should fix this bug?” in Proc. 28th Int. Conf. Softw. Eng., 2006, pp. 361–370.
    [8]
    J. Anvik and G. Murphy, “Determining implementation expertise from bug reports,” in Proc. 4th Int. Workshop Min. Softw. Repositories, Washington, DC, USA, 2007.
    [9]
    D. Bertram, A. Voida, S. Greenberg, and R. Walker, “Communication, collaboration, and bugs: The social nature of issue tracking in small, collocated teams,” in Proc. 2010 ACM Conf. Comput. Support. Coop. Work, 2010, pp. 291–300.
    [10]
    P. Bhattacharya and I. Neamtiu, “Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging,” in Proc. 2010 IEEE Int. Conf. Softw. Maint., 2010, pp. 1–10.
    [11]
    D. Binkley, D. Heinz, D. Lawrie, and J. Overfelt, “Understanding LDA in source code analysis,” in Proc. 22nd Int. Conf. Program Comprehension, 2014, pp. 26–36.
    [12]
    D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
    [13]
    G. Bortis and A. van der Hoek, “Porchlight: A tag-based approach to bug triaging,” in Proc. 35th Int. Conf. Softw. Eng., 2013, pp. 342–351.
    [14]
    N. Cliff, Ordinal Methods for Behavioral Data Analysis. Hove, United Kingdom: Psychology Press, 2014.
    [15]
    D. Čubranić, “Automatic bug triage using text categorization,” in Proc. 16th Int. Conf. Softw. Eng. Knowl. Eng., 2004, pp. 92–97.
    [16]
    S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” J. Am. Soc. Inf. Sci., vol. 41, no. 6, pp. 391–407, 1990.
    [17]
    G. Heinrich, Parameter estimation for text analysis. [Online]. Available: http://www.arbylon.net/publications/text-est.pdf, 2005.
    [18]
    T. Hofmann, “Probabilistic latent semantic analysis,” in Proc. 15th Conf. Uncertain. Artif. Intell., 1999, pp. 289–296.
    [19]
    W. M. Ibrahim, N. Bettenburg, B. Adams, and A. E. Hassan, “On the relationship between comment update practices and software bugs,” J. Syst. Softw., vol. 85, no. 10, pp. 2293–2304, 2012.
    [20]
    G. Jeong, S. Kim, and T. Zimmermann, “Improving bug triage with bug tossing graphs,” in Proc. Joint Meeting Eur. Softw. Eng. Conf. ACM SIGSOFT Int. Symp. Found. Softw. Eng., 2009, pp. 111–120.
    [21]
    H. Kagdi, M. Gethers, D. Poshyvanyk, and M. Hammad, “Assigning change requests to software developers,” J. Softw.: Evol. Process, vol. 24, no. 1, pp. 3–33, 2012.
    [22]
    H. H. Kagdi, M. Gethers, D. Poshyvanyk, and M. Hammad, “Assigning change requests to software developers,” J. Softw. Maint., vol. 24, no. 1, pp. 3–33, 2012.
    [23]
    C. Kolassa, D. Riehle, and M. A. Salim, “A model of the commit size distribution of open source,” in SOFSEM 2013: Theory and Practice of Computer Science. Berlin, Germany: Springer, 2013.
    [24]
    M. Linares-Vásquez, K. Hossen, H. Dang, H. Kagdi, M. Gethers, and D. Poshyvanyk, “Triaging incoming change requests: Bug or commit history, or code authorship?” in Proc. 28th IEEE Int. Conf. Softw. Maint., 2012, pp. 451–460.
    [25]
    C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, vol. 1. Cambridge, U.K.: Cambridge Univ. Press, 2008.
    [26]
    D. Matter, A. Kuhn, and O. Nierstrasz, “Assigning bug reports using a vocabulary-based expertise model of developers,” in Proc. 6th IEEE Int. Work. Conf. Min. Softw. Repositories, 2009, pp. 131–140.
    [27]
    H. Naguib, N. Narayan, B. Brugge, and D. Helal, “Bug report assignee recommendation using activity profiles,” in Proc. 10th IEEE Work. Conf. Min. Softw. Repositories, 2013, pp. 22–30.
    [28]
    A. Nguyen, T. Nguyen, J. Al-Kofahi, H. Nguyen, and T. Nguyen, “A topic-based approach for narrowing the search space of buggy files from a bug report,” in Proc. 26th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2011, pp. 263–272.
    [29]
    A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, D. Lo, and C. Sun, “Duplicate bug report detection with a combination of information retrieval and topic modeling,” in Proc. 27th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2012, pp. 70–79.
    [30]
    A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia, “How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms,” in Proc. Int. Conf. Softw. Eng., 2013, pp. 522–531.
    [31]
    M. F. Porter, “An algorithm for suffix stripping,” Program, vol. 14, no. 3, pp. 130–137, 1980.
    [32]
    R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry, “Improving bug localization using structured information retrieval,” in Proc. IEEE/ACM 28th Int. Conf. Autom. Softw. Eng., 2013, pp. 345–355.
    [33]
    R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani, “Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation,” in Proc. 10th Work. Conf. Min. Softw. Repositories, 2013, pp. 2–11.
    [34]
    R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani, “A time-based approach to automatic bug report assignment,” J. Syst. Softw., vol. 102, pp. 109–122, 2015.
    [35]
    K. Somasundaram and G. C. Murphy, “Automatic categorization of bug reports using latent Dirichlet allocation,” in Proc. 5th India Softw. Eng. Conf., 2012, pp. 125–130.
    [36]
    M. Steyvers and T. Griffiths, “Probabilistic topic models,” in Latent Semantic Analysis: A Road to Meaning, T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Eds. Hove, United Kingdom: Lawrence Erlbaum, 2007.
    [37]
    C. Sun, D. Lo, S.-C. Khoo, and J. Jiang, “Towards more accurate retrieval of duplicate bug reports,” in Proc. 26th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2011, pp. 253–262.
    [38]
    A. Tamrawi, T. Nguyen, J. Al-Kofahi, and T. Nguyen, “Fuzzy set and cache-based approach for bug triaging,” in Proc. 19th ACM SIGSOFT Symp. 13th Eur. Conf. Found. Softw. Eng., 2011, pp. 365–375.
    [39]
    Y. Tian, D. Lo, X. Xia, and C. Sun, “Automated prediction of bug report priority using multi-factor analysis,” Empir. Softw. Eng., pp. 1–30, 2014.
    [40]
    F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bull., vol. 1, no. 6, pp. 80–83, 1945.
    [41]
    W. Wu, W. Zhang, Y. Yang, and Q. Wang, “Drex: Developer recommendation with k-nearest-neighbor search and expertise ranking,” in Proc. 18th Asia Pac. Softw. Eng. Conf., 2011, pp. 389–396.
    [42]
    X. Xia, Y. Ding, D. Lo, J. Al-Kofahi, T. Nguyen, and X. Wang, “Toward more accurate bug triaging with topic modeling,” Tech. Report. [Online]. Available: http://pan.baidu.com/s/1eRkv4Dc, 2015.
    [43]
    X. Xia, D. Lo, X. Wang, and B. Zhou, “Accurate developer recommendation for bug resolution,” in Proc. 20th Work. Conf. Reverse Eng., 2013, pp. 72–81.
    [44]
    X. Xia, D. Lo, M. Wen, E. Shihab, and B. Zhou, “An empirical study of bug report field reassignment,” in Proc. Softw. Evol. Week-IEEE Conf. Softw. Maint. Reengineering Reverse Eng., 2014, pp. 174–183.
    [45]
    X. Xie, W. Zhang, Y. Yang, and Q. Wang, “Dretom: Developer recommendation based on topic models for bug resolution,” in Proc. 8th Int. Conf. Predictive Models Softw. Eng., 2012, pp. 19–28.
    [46]
    G. Yang, T. Zhang, and B. Lee, “Towards semi-automatic bug triage and severity prediction based on topic model and multi-feature of bug reports,” in Proc. IEEE 38th Annu. Comput. Softw. Appl. Conf., 2014, pp. 97–106.


      Published In

      IEEE Transactions on Software Engineering, Volume 43, Issue 3
      March 2017
      93 pages

      Publisher

      IEEE Press


      Cited By

      • (2024) Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories, pp. 571-583. DOI: 10.1145/3643991.3644918. Online publication date: 15-Apr-2024.
      • (2024) Characterizing Usability Issue Discussions in Open Source Software Projects. Proceedings of the ACM on Human-Computer Interaction, 8:CSCW1, pp. 1-26. DOI: 10.1145/3637307. Online publication date: 26-Apr-2024.
      • (2024) An empirical assessment of different word embedding and deep learning models for bug assignment. Journal of Systems and Software, 210:C. DOI: 10.1016/j.jss.2024.111961. Online publication date: 1-Apr-2024.
      • (2024) CLeBPI. Information and Software Technology, 164:C. DOI: 10.1016/j.infsof.2023.107302. Online publication date: 10-Jan-2024.
      • (2023) Characterizing Issue Management in Runtime Systems. Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, pp. 54-63. DOI: 10.5555/3615924.3615930. Online publication date: 11-Sep-2023.
      • (2023) Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems. ACM Transactions on Software Engineering and Methodology, 32:6, pp. 1-34. DOI: 10.1145/3593802. Online publication date: 30-Sep-2023.
      • (2023) ADPTriage: Approximate Dynamic Programming for Bug Triage. IEEE Transactions on Software Engineering, 49:10, pp. 4594-4609. DOI: 10.1109/TSE.2023.3307243. Online publication date: 1-Oct-2023.
      • (2023) Source Code Recommender Systems: The Practitioners' Perspective. Proceedings of the 45th International Conference on Software Engineering, pp. 2161-2172. DOI: 10.1109/ICSE48619.2023.00182. Online publication date: 14-May-2023.
      • (2023) On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot. Proceedings of the 45th International Conference on Software Engineering, pp. 2149-2160. DOI: 10.1109/ICSE48619.2023.00181. Online publication date: 14-May-2023.
      • (2023) Automated issue assignment using topic modelling on Jira issue tracking data. IET Software, 17:3, pp. 333-344. DOI: 10.1049/sfw2.12129. Online publication date: 30-May-2023.
