research-article

Open access

DeepDelta: learning to repair compilation errors

Authors:

Emily Johnston,

Edward Aftandilian

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 925 - 936

https://doi.org/10.1145/3338906.3340455

Published: 12 August 2019

Abstract

Programmers spend a substantial amount of time manually repairing code that does not compile. We observe that the repairs for any particular error class typically follow a pattern and are highly mechanical. We propose a novel approach that automatically learns these patterns with a deep neural network and suggests program repairs for the most costly classes of build-time compilation failures. We describe how we collect all build errors and the human-authored, in-progress code changes that cause those failing builds to transition to successful builds at Google. We generate an AST diff from the textual code changes and transform it into a domain-specific language called Delta that encodes the change that must be made to make the code compile. We then feed the compiler diagnostic information (as source) and the Delta changes that resolved the diagnostic (as target) into a Neural Machine Translation network for training. For the two most prevalent and costly classes of Java compilation errors, namely missing symbols and mismatched method signatures, our system, called DeepDelta, generates the correct repair changes for 19,314 out of 38,788 (50%) of unseen compilation errors. The correct changes are in the top three suggested fixes 86% of the time on average.
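The source/target framing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the actual Delta language: the `Diagnostic` and `AstEdit` classes, their fields, the token layout, and the helper names are all hypothetical, chosen only to show how a compiler diagnostic might become the source sequence and the resolving change the target sequence of one training pair.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Diagnostic:
    """A compiler diagnostic. The fields are illustrative assumptions."""
    kind: str     # diagnostic key, e.g. "compiler.err.cant.resolve"
    message: str  # human-readable diagnostic text


@dataclass
class AstEdit:
    """One AST-level change: a hypothetical stand-in for a Delta entry."""
    action: str     # e.g. "INSERT"
    node_type: str  # e.g. "IMPORT"
    label: str      # e.g. "java.util.List"


def to_source_tokens(diag: Diagnostic) -> List[str]:
    # Source side: the diagnostic kind followed by the message words.
    return [diag.kind] + diag.message.split()


def to_target_tokens(edits: List[AstEdit]) -> List[str]:
    # Target side: each edit flattened into (action, node type, label).
    tokens: List[str] = []
    for edit in edits:
        tokens += [edit.action, edit.node_type, edit.label]
    return tokens


def make_training_pair(
    diag: Diagnostic, edits: List[AstEdit]
) -> Tuple[List[str], List[str]]:
    # One (source, target) example for a sequence-to-sequence model.
    return to_source_tokens(diag), to_target_tokens(edits)


def repair_rate(correct: int, total: int) -> float:
    # Fraction of errors for which the suggested repair was correct.
    return correct / total


if __name__ == "__main__":
    pair = make_training_pair(
        Diagnostic("compiler.err.cant.resolve", "cannot find symbol List"),
        [AstEdit("INSERT", "IMPORT", "java.util.List")],
    )
    print(pair)
    print(round(repair_rate(19314, 38788), 3))
```

Applied to the figures reported in the abstract, `repair_rate(19314, 38788)` is about 0.498, which the abstract rounds to 50%.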



Index Terms

DeepDelta: learning to repair compilation errors
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation


Information & Contributors

Information

Published In


ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 2019

1264 pages

ISBN:9781450355728

DOI:10.1145/3338906

General Chairs: Marlon Dumas (University of Tartu, Estonia), Dietmar Pfahl (University of Tartu, Estonia)

Program Chairs: Sven Apel (Saarland University, Germany), Alessandra Russo (Imperial College, UK)

Copyright © 2019 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ESEC/FSE '19

Sponsor:

SIGSOFT

ESEC/FSE '19: 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 26 - 30, 2019

Tallinn, Estonia

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

Total citations: 70
Total downloads: 1,961
Downloads (last 12 months): 395
Downloads (last 6 weeks): 53

Reflects downloads up to 14 Aug 2024

Cited By

Liu P, Lin B, Qin Y, Weng C, Chen L. (2024). T-RAP: A Template-guided Retrieval-Augmented Vulnerability Patch Generation Approach. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 105-114. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3672506
Wan Y, Bi Z, He Y, Zhang J, Zhang H, Sui Y, Xu G, Jin H, Yu P. (2024). Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit. ACM Computing Surveys. Online publication date: 18-May-2024.
https://doi.org/10.1145/3664597
Ramos D, Lynce I, Manquinho V, Martins R, Le Goues C. (2024). BatFix: Repairing language model-based transpilation. ACM Transactions on Software Engineering and Methodology 33:6, 1-29. Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3658668
Yang W, Song L, Xue Y, Roychoudhury A, Paiva A, Abreu R, Storey M. (2024). Rust-lancet: Automated Ownership-Rule-Violation Fixing with Behavior Preservation. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639103
Zhong H, Meng N, Roychoudhury A, Paiva A, Abreu R, Storey M. (2024). Compiler-directed Migrating API Callsite of Client Code. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-12. Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639084
Wu Z, Yang D, Lei Y, Xie H, Tang M, Li M. (2024). Labelrepair: Sequence Labelling for Compilation Errors Repair. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 860-871. Online publication date: 12-Mar-2024.
https://doi.org/10.1109/SANER60148.2024.00094
Sharma T, Kechagia M, Georgiou S, Tiwari R, Vats I, Moazen H, Sarro F. (2024). A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. Online publication date: 14-Mar-2024.
https://dl.acm.org/doi/10.1016/j.jss.2023.111934
Mak C, Cheung S. (2024). Automatic build repair for test cases using incompatible java versions. Information and Software Technology, 107473. Online publication date: Apr-2024.
https://doi.org/10.1016/j.infsof.2024.107473
Song Y, Xie X, Xu B. (2024). When debugging encounters artificial intelligence: state of the art and open challenges. Science China Information Sciences 67:4. Online publication date: 21-Feb-2024.
https://doi.org/10.1007/s11432-022-3803-9
Shi R, Hu J, Lin B. (2023). Mining on Students’ Execution Logs and Repairing Compilation Errors Based on Deep Learning. Applied Sciences 13:17, 9933. Online publication date: 2-Sep-2023.
https://doi.org/10.3390/app13179933
