DOI: 10.1109/ICSE48619.2023.00060
Research Article

RepresentThemAll: A Universal Learning Representation of Bug Reports

Published: 26 July 2023

Abstract

Deep learning techniques have shown promising performance on automated software maintenance tasks that involve bug reports. However, existing studies learn a customized representation of bug reports for each specific downstream task. Despite early successes, training a separate model for each downstream task raises three issues: complexity, cost, and compatibility, owing to the customization, disparity, and uniqueness of these automated approaches. To address these challenges, we propose RepresentThemAll, a pre-trained approach that learns a universal representation of bug reports and can serve multiple downstream tasks. Specifically, RepresentThemAll is a universal bug report framework pre-trained with two carefully designed learning objectives: a dynamic masked language model and a contrastive learning objective, "find yourself". We evaluate RepresentThemAll on four downstream tasks: duplicate bug report detection, bug report summarization, bug priority prediction, and bug severity prediction. Our experimental results show that, after well-designed fine-tuning, RepresentThemAll outperforms all baseline approaches on all considered downstream tasks.
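
The abstract gives only a high-level description of the two pre-training objectives, so the sketch below illustrates how they could be combined in practice. It is a hypothetical PyTorch/Hugging Face illustration, not the authors' implementation: it assumes a RoBERTa-style encoder, uses the library's standard dynamic-masking collator for the masked language model objective, and stands in for the "find yourself" objective with a SimCSE-style in-batch contrastive loss, in which each bug report must identify its own dropout-perturbed second encoding among all encodings in the batch. The actual masking schedule, positive-pair construction, and loss weighting in RepresentThemAll may differ.

```python
# Hypothetical sketch (not the authors' released code): one pre-training step
# combining (1) a dynamic masked language model objective and (2) a
# SimCSE-style contrastive objective standing in for "find yourself".
import torch
import torch.nn.functional as F
from transformers import (DataCollatorForLanguageModeling, RobertaForMaskedLM,
                          RobertaTokenizerFast)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.train()  # dropout must be active so two encodings of the same report differ

# "Dynamic" masking: a fresh random mask pattern is sampled on every call,
# rather than being fixed once during preprocessing.
mlm_collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)


def pretraining_step(bug_reports, temperature=0.05, mlm_weight=1.0, cl_weight=1.0):
    """Combined pre-training loss over a batch of raw bug-report strings."""
    enc = tokenizer(bug_reports, padding=True, truncation=True,
                    max_length=256, return_tensors="pt")

    # Objective 1: dynamic masked language modeling.
    features = [{"input_ids": ids, "attention_mask": am}
                for ids, am in zip(enc["input_ids"].tolist(),
                                   enc["attention_mask"].tolist())]
    mlm_batch = mlm_collator(features)
    mlm_loss = model(input_ids=mlm_batch["input_ids"],
                     attention_mask=mlm_batch["attention_mask"],
                     labels=mlm_batch["labels"]).loss

    # Objective 2: contrastive "find yourself" (assumed SimCSE-style here).
    # Each report is encoded twice; dropout makes the two views differ, and the
    # report must "find" its own second view among all views in the batch.
    def embed(inputs):
        hidden = model.roberta(**inputs).last_hidden_state
        return hidden[:, 0]  # representation at the first (<s>) position

    z1, z2 = embed(enc), embed(enc)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    cl_loss = F.cross_entropy(sim, torch.arange(sim.size(0)))

    return mlm_weight * mlm_loss + cl_weight * cl_loss


# Example: one optimization step on a toy batch of bug reports.
loss = pretraining_step(["NullPointerException when opening the settings dialog",
                         "UI freezes after clicking the export button twice"])
loss.backward()
```

For the four downstream tasks, the pre-trained encoder would then be fine-tuned with a task-specific head (e.g., a classifier for priority or severity prediction, or a pairwise scorer for duplicate detection); those heads are task-dependent and omitted here.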



Published In

ICSE '23: Proceedings of the 45th International Conference on Software Engineering
May 2023, 2713 pages
ISBN: 9781665457019
  • General Chair: John Grundy
  • Program Co-chairs: Lori Pollock, Massimiliano Di Penta

In-Cooperation

  • IEEE CS

Publisher

IEEE Press



Conference

ICSE '23: 45th International Conference on Software Engineering
May 14-20, 2023
Melbourne, Victoria, Australia

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
