Automating code review activities by large-scale pre-training

Research article · Published: 09 November 2022 · DOI: 10.1145/3540250.3549081

Abstract

    Code review is an essential part of the software development lifecycle, as it aims to guarantee the quality of code. Modern code review requires developers to view, understand, and even run programs to assess their logic, functionality, latency, style, and other factors. As a result, developers spend a great deal of time reviewing their peers' code, and there is significant demand for automating the code review process. In this work, we focus on applying pre-training techniques to tasks in the code review scenario. We collect a large-scale dataset of real-world code changes and code reviews from open-source projects in nine of the most popular programming languages. To better understand code diffs and reviews, we propose CodeReviewer, a pre-trained model that employs four pre-training tasks tailored to the code review scenario. To evaluate the model, we focus on three key tasks in code review: code change quality estimation, review comment generation, and code refinement. We further construct a high-quality benchmark for these three tasks from our collected data and conduct comprehensive experiments on it. The experimental results demonstrate that our model outperforms previous state-of-the-art pre-trained models on all three tasks. Further analysis shows that our proposed pre-training tasks and the multilingual pre-training dataset benefit the model's understanding of code changes and reviews.
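
    Because CodeReviewer is an encoder-decoder model that consumes code diffs, the downstream tasks reduce to classification or conditional generation over a diff. The snippet below is a minimal sketch of review comment generation; it assumes the Hugging Face checkpoint id "microsoft/codereviewer", a T5-style interface, and a plain unified-diff input (the paper's exact diff encoding, e.g. special line tags, may differ).

        # Minimal sketch: review comment generation with a CodeReviewer checkpoint.
        # Assumptions: the Hugging Face model id "microsoft/codereviewer" and a
        # plain unified-diff input; the paper's exact diff encoding may differ.
        from transformers import AutoTokenizer, T5ForConditionalGeneration

        tokenizer = AutoTokenizer.from_pretrained("microsoft/codereviewer")
        model = T5ForConditionalGeneration.from_pretrained("microsoft/codereviewer")

        # A small diff hunk: "-" marks the removed line, "+" the added one.
        diff_hunk = (
            "@@ -1,2 +1,2 @@\n"
            " def div(a, b):\n"
            "-    return a / b\n"
            "+    return a / b if b else None\n"
        )

        inputs = tokenizer(diff_hunk, return_tensors="pt",
                           truncation=True, max_length=512)
        review_ids = model.generate(**inputs, max_length=64, num_beams=4)
        print(tokenizer.decode(review_ids[0], skip_special_tokens=True))

    The same encoder-decoder setup would serve code refinement by generating revised code instead of a comment, and quality estimation by attaching a classification head to the encoder; treat these as illustrative framings rather than the paper's exact fine-tuning recipes.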



    Published In

    ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    November 2022, 1822 pages
    ISBN: 9781450394130
    DOI: 10.1145/3540250
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Code review
    2. datasets
    3. deep learning
    4. pre-training


    Conference

    ESEC/FSE '22

    Acceptance Rates

    Overall acceptance rate: 112 of 543 submissions (21%)
