DOI: 10.1145/3510003.3510621
Research article

Using pre-trained models to boost code review automation

Published: 05 July 2022

  • Abstract

    Code review is a practice widely adopted in both open source and industrial projects. Given the non-negligible cost of this process, researchers have started investigating the possibility of automating specific code review tasks. We recently proposed Deep Learning (DL) models targeting the automation of two such tasks: the first model takes as input code submitted for review and implements in it the changes a reviewer is likely to recommend; the second takes as input the submitted code together with a reviewer comment posted in natural language and automatically implements the change the reviewer requires. While the preliminary results we achieved were encouraging, both models were tested in rather simple code review scenarios that substantially simplified the targeted problem, partly as a consequence of the choices we made when designing both the technique and the experiments. In this paper, we build on that work by demonstrating that a pre-trained Text-To-Text Transfer Transformer (T5) model can outperform previous DL models in automating code review tasks. We also conducted our experiments on a larger, more realistic (and more challenging) dataset of code review activities.
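    To make the text-to-text framing of the two tasks concrete, the sketch below shows one way the model inputs could be serialized: the code-to-code task feeds the model only the submitted code, while the comment-to-code task concatenates the code with the reviewer's natural-language comment. The task prefixes and tag strings here are hypothetical illustrations, not the actual encoding used in the paper or its replication package.

    ```python
    # Hypothetical input serialization for the two code-review tasks,
    # framed as text-to-text problems (T5-style). Tag names are illustrative.

    def serialize_code_to_code(submitted_code: str) -> str:
        """Task 1: the model sees only the code submitted for review and
        must produce the revised code a reviewer would likely ask for."""
        return f"code2code: {submitted_code.strip()}"

    def serialize_comment_to_code(submitted_code: str, reviewer_comment: str) -> str:
        """Task 2: the model sees the submitted code plus a natural-language
        reviewer comment and must implement the requested change."""
        return (f"comment2code: <code> {submitted_code.strip()} </code> "
                f"<comment> {reviewer_comment.strip()} </comment>")

    if __name__ == "__main__":
        code = "public int sum(int a, int b) { return a + a; }"
        comment = "Bug: the method adds `a` twice instead of `a + b`."
        print(serialize_code_to_code(code))
        print(serialize_comment_to_code(code, comment))
    ```

    A fine-tuned encoder-decoder model would then be asked to generate the revised code from either serialized string; at training time, the target sequence is the code after the reviewer-requested change.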



      Published In

      ICSE '22: Proceedings of the 44th International Conference on Software Engineering
      May 2022
      2508 pages
      ISBN:9781450392211
      DOI:10.1145/3510003
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      In-Cooperation

      • IEEE CS

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. code review
      2. empirical study
      3. machine learning on code

      Qualifiers

      • Research-article

      Conference

      ICSE '22
      Acceptance Rates

      Overall Acceptance Rate 276 of 1,856 submissions, 15%

      Article Metrics

      • Downloads (Last 12 months)462
      • Downloads (Last 6 weeks)40
      Reflects downloads up to 14 Aug 2024

      Cited By

      • (2024) Automatically Recommend Code Updates: Are We There Yet? ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3678167. Online publication date: 16-Jul-2024.
      • (2024) Automated Commit Intelligence by Pre-training. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3674731. Online publication date: 1-Jul-2024.
      • (2024) AI-Assisted Assessment of Coding Practices in Modern Code Review. Proceedings of the 1st ACM International Conference on AI-Powered Software, 85-93. DOI: 10.1145/3664646.3665664. Online publication date: 10-Jul-2024.
      • (2024) LLM-Based Chatbots for Mining Software Repositories: Challenges and Opportunities. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 201-210. DOI: 10.1145/3661167.3661218. Online publication date: 18-Jun-2024.
      • (2024) Automated categorization of pre-trained models in software engineering: A case study with a Hugging Face dataset. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 351-356. DOI: 10.1145/3661167.3661215. Online publication date: 18-Jun-2024.
      • (2024) On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT? Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 375-380. DOI: 10.1145/3661167.3661183. Online publication date: 18-Jun-2024.
      • (2024) Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories, 571-583. DOI: 10.1145/3643991.3644918. Online publication date: 15-Apr-2024.
      • (2024) Improving Automated Code Reviews: Learning From Experience. Proceedings of the 21st International Conference on Mining Software Repositories, 278-283. DOI: 10.1145/3643991.3644910. Online publication date: 15-Apr-2024.
      • (2024) On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 99-111. DOI: 10.1145/3643916.3644411. Online publication date: 15-Apr-2024.
      • (2024) Towards Summarizing Code Snippets Using Pre-Trained Transformers. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 1-12. DOI: 10.1145/3643916.3644400. Online publication date: 15-Apr-2024.
