DOI: 10.1145/3510003.3510067
research-article

AutoTransform: automated code transformation to support modern code review process

Published: 05 July 2022

    Abstract

    Code review is effective but human-intensive (e.g., developers must manually modify source code until it is approved). Prior work proposed a Neural Machine Translation (NMT) approach that automatically transforms source code into the version that is reviewed and approved (i.e., the after version). Yet its performance remains suboptimal when the after version contains new identifiers or literals (e.g., renamed variables) or many code tokens. To address these limitations, we propose AutoTransform, which leverages a Byte-Pair Encoding (BPE) approach to handle new tokens and a Transformer-based NMT architecture to handle long sequences. We evaluate our approach on 14,750 changed methods, with and without new tokens, of both small and medium sizes. The results show that when generating one candidate for the after version (i.e., beam width = 1), AutoTransform correctly transforms 1,413 changed methods, which is 567% higher than the prior work, highlighting the substantial improvement of our approach for code transformation in the context of code review. This work contributes towards automated code transformation for code reviews, which could help developers reduce the effort of modifying source code during the code review process.
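
To illustrate why BPE helps with identifiers that never appeared in training, the following is a minimal sketch of the general BPE idea: learn merge rules from frequent adjacent symbol pairs, then segment any token, seen or unseen, into known subwords. It is not the authors' implementation; the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-token marker, and the toy token list are assumptions made for illustration only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge rules from a list of code tokens (hypothetical helper)."""
    # Each token starts as a sequence of characters plus an end-of-token marker.
    vocab = Counter(tuple(tok) + ("</w>",) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by token frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every token is already a single symbol
        # Pick the most frequent pair; ties are broken lexicographically for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(token, merges):
    """Split a (possibly unseen) token into subwords by replaying the learned merges in order."""
    symbols = tuple(token) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["getName", "getName", "setName"], num_merges=50)
    print(segment("getName", merges))  # a token seen during learning
    print(segment("getAge", merges))   # an unseen identifier, split into known pieces
```

Because segmentation only replays merges over characters, an unseen identifier is never mapped to a single out-of-vocabulary placeholder; it is decomposed into smaller units the model has already seen, and concatenating those units recovers the original token.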


    Published In

    ICSE '22: Proceedings of the 44th International Conference on Software Engineering
    May 2022
    2508 pages
    ISBN:9781450392211
    DOI:10.1145/3510003
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ICSE '22

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Article Metrics

    • Downloads (Last 12 months): 224
    • Downloads (Last 6 weeks): 14
    Reflects downloads up to 14 Aug 2024.

    Cited By

    • (2024) AI-Assisted Programming Tasks Using Code Embeddings and Transformers. Electronics 13:4 (767). DOI: 10.3390/electronics13040767. Online publication date: 15-Feb-2024
    • (2024) Automatically Recommend Code Updates: Are We There Yet? ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3678167. Online publication date: 16-Jul-2024
    • (2024) ERD-CQC: Enhanced Rule and Dependency Code Quality Check for Java. Proceedings of the 15th Asia-Pacific Symposium on Internetware (377-386). DOI: 10.1145/3671016.3674820. Online publication date: 24-Jul-2024
    • (2024) AI-Assisted Assessment of Coding Practices in Modern Code Review. Proceedings of the 1st ACM International Conference on AI-Powered Software (85-93). DOI: 10.1145/3664646.3665664. Online publication date: 10-Jul-2024
    • (2024) Deep Domain Adaptation With Max-Margin Principle for Cross-Project Imbalanced Software Vulnerability Detection. ACM Transactions on Software Engineering and Methodology 33:6 (1-34). DOI: 10.1145/3664602. Online publication date: 27-Jun-2024
    • (2024) On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT? Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (375-380). DOI: 10.1145/3661167.3661183. Online publication date: 18-Jun-2024
    • (2024) Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories (571-583). DOI: 10.1145/3643991.3644918. Online publication date: 15-Apr-2024
    • (2024) Improving Automated Code Reviews: Learning From Experience. Proceedings of the 21st International Conference on Mining Software Repositories (278-283). DOI: 10.1145/3643991.3644910. Online publication date: 15-Apr-2024
    • (2024) On the Reliability and Explainability of Language Models for Program Generation. ACM Transactions on Software Engineering and Methodology 33:5 (1-26). DOI: 10.1145/3641540. Online publication date: 3-Jun-2024
    • (2024) Vision Transformer Inspired Automated Vulnerability Repair. ACM Transactions on Software Engineering and Methodology 33:3 (1-29). DOI: 10.1145/3632746. Online publication date: 15-Mar-2024
