AutoTransform: automated code transformation to support modern code review process

Authors:

Patanamon Thongtanunam,

Chanathip Pornprasit,

Chakkrit Tantithamthavorn

ICSE '22: Proceedings of the 44th International Conference on Software Engineering

Pages 237 - 248

https://doi.org/10.1145/3510003.3510067

Published: 05 July 2022

Abstract

Code review is effective but human-intensive (e.g., developers need to manually modify source code until it is approved). Prior work recently proposed a Neural Machine Translation (NMT) approach to automatically transform source code to the version that is reviewed and approved (i.e., the after version). Yet, its performance is still suboptimal when the after version has new identifiers or literals (e.g., renamed variables) or has many code tokens. To address these limitations, we propose AutoTransform, which leverages a Byte-Pair Encoding (BPE) approach to handle new tokens and a Transformer-based NMT architecture to handle long sequences. We evaluate our approach on 14,750 changed methods with and without new tokens, for both small and medium sizes. The results show that when generating one candidate for the after version (i.e., beam width = 1), AutoTransform can correctly transform 1,413 changed methods, which is 567% higher than the prior work, highlighting the substantial improvement of our approach for code transformation in the context of code review. This work contributes towards automated code transformation for code reviews, which could help developers reduce their effort in modifying source code during the code review process.
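
To illustrate why BPE can help with new identifiers, the following is a minimal Python sketch of the general BPE idea: frequent adjacent symbol pairs are merged into subword units, so a token never seen during training can still be split into known pieces. It is not the authors' implementation; the function names (`learn_bpe`, `merge_pair`, `apply_bpe`), the `</w>` end-of-token marker, and the toy training tokens are assumptions made for illustration only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge operations from a list of code tokens."""
    # Each token starts as a sequence of characters plus an end-of-token marker.
    vocab = Counter(tuple(tok) + ("</w>",) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by token frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(token, merges):
    """Split a (possibly unseen) token into subwords by replaying the learned merges."""
    symbols = tuple(token) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical training tokens; a real setup would use tokens from changed methods.
    training_tokens = ["getName", "getValue", "setName", "setValue", "getName"]
    merges = learn_bpe(training_tokens, num_merges=10)
    # "getNameValue" never appears in training, yet it is still representable.
    print(apply_bpe("getNameValue", merges))
```

Because every output symbol is either a single character or the result of a learned merge, the subword vocabulary stays closed even when the after version introduces a renamed variable or a new literal.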

Published In

ICSE '22: Proceedings of the 44th International Conference on Software Engineering

May 2022

2508 pages

ISBN: 9781450392211

DOI: 10.1145/3510003

General Chair: Matthew B Dwyer (University of Virginia)

Program Chairs: Daniela Damian (University of Victoria, Canada), Andreas Zeller (CISPA, Germany)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Australian Research Council

Conference

ICSE '22

Sponsor:

SIGSOFT

ICSE '22: 44th International Conference on Software Engineering

May 21 - 29, 2022

Pittsburgh, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
