DOI: 10.1145/3510003.3510621
Research article

Using pre-trained models to boost code review automation

Published: 05 July 2022

  • Abstract

    Code review is a practice widely adopted in both open source and industrial projects. Given the non-negligible cost of this process, researchers have started investigating the possibility of automating specific code review tasks. We recently proposed Deep Learning (DL) models targeting the automation of two such tasks: the first model takes as input code submitted for review and implements in it the changes a reviewer is likely to recommend; the second takes as input the submitted code together with a reviewer comment posted in natural language and automatically implements the change the reviewer requires. While the preliminary results we achieved were encouraging, both models were tested in rather simple code review scenarios that substantially simplified the targeted problem, partly as a consequence of the choices we made when designing both the technique and the experiments. In this paper, we build on that work by demonstrating that a pre-trained Text-To-Text Transfer Transformer (T5) model can outperform previous DL models in automating code review tasks. We also conducted our experiments on a larger, more realistic (and more challenging) dataset of code review activities.
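    To make the text-to-text framing of the two tasks concrete, the sketch below shows one way the model inputs could be serialized: the code-to-code task feeds the model only the submitted code, while the comment-to-code task concatenates the code with the reviewer's natural-language comment. The task prefixes and tag strings here are hypothetical illustrations, not the actual encoding used in the paper or its replication package.

    ```python
    # Hypothetical input serialization for the two code-review tasks,
    # framed as text-to-text problems (T5-style). Tag names are illustrative.

    def serialize_code_to_code(submitted_code: str) -> str:
        """Task 1: the model sees only the code submitted for review and
        must produce the revised code a reviewer would likely ask for."""
        return f"code2code: {submitted_code.strip()}"

    def serialize_comment_to_code(submitted_code: str, reviewer_comment: str) -> str:
        """Task 2: the model sees the submitted code plus a natural-language
        reviewer comment and must implement the requested change."""
        return (f"comment2code: <code> {submitted_code.strip()} </code> "
                f"<comment> {reviewer_comment.strip()} </comment>")

    if __name__ == "__main__":
        code = "public int sum(int a, int b) { return a + a; }"
        comment = "Bug: the method adds `a` twice instead of `a + b`."
        print(serialize_code_to_code(code))
        print(serialize_comment_to_code(code, comment))
    ```

    A fine-tuned encoder-decoder model would then be asked to generate the revised code from either serialized string; at training time, the target sequence is the code after the reviewer-requested change.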



      Published In

      ICSE '22: Proceedings of the 44th International Conference on Software Engineering
      May 2022
      2508 pages
      ISBN:9781450392211
      DOI:10.1145/3510003
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      In-Cooperation

      • IEEE CS

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. code review
      2. empirical study
      3. machine learning on code

      Qualifiers

      • Research-article

      Conference

      ICSE '22
      Acceptance Rates

      Overall Acceptance Rate 276 of 1,856 submissions, 15%

      Article Metrics

      • Downloads (Last 12 months)462
      • Downloads (Last 6 weeks)40
      Reflects downloads up to 14 Aug 2024

      Cited By

      • (2024) Automatically Recommend Code Updates: Are We There Yet? ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3678167. Online publication date: 16-Jul-2024.
      • (2024) Automated Commit Intelligence by Pre-training. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3674731. Online publication date: 1-Jul-2024.
      • (2024) AI-Assisted Assessment of Coding Practices in Modern Code Review. Proceedings of the 1st ACM International Conference on AI-Powered Software, 85-93. DOI: 10.1145/3664646.3665664. Online publication date: 10-Jul-2024.
      • (2024) LLM-Based Chatbots for Mining Software Repositories: Challenges and Opportunities. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 201-210. DOI: 10.1145/3661167.3661218. Online publication date: 18-Jun-2024.
      • (2024) Automated categorization of pre-trained models in software engineering: A case study with a Hugging Face dataset. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 351-356. DOI: 10.1145/3661167.3661215. Online publication date: 18-Jun-2024.
      • (2024) On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT? Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 375-380. DOI: 10.1145/3661167.3661183. Online publication date: 18-Jun-2024.
      • (2024) Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories, 571-583. DOI: 10.1145/3643991.3644918. Online publication date: 15-Apr-2024.
      • (2024) Improving Automated Code Reviews: Learning From Experience. Proceedings of the 21st International Conference on Mining Software Repositories, 278-283. DOI: 10.1145/3643991.3644910. Online publication date: 15-Apr-2024.
      • (2024) On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 99-111. DOI: 10.1145/3643916.3644411. Online publication date: 15-Apr-2024.
      • (2024) Towards Summarizing Code Snippets Using Pre-Trained Transformers. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 1-12. DOI: 10.1145/3643916.3644400. Online publication date: 15-Apr-2024.
