GPT-3-Powered Type Error Debugging: Investigating the Use of Large Language Models for Code Repair

Authors:

Francisco Ribeiro,

José Nuno Castro de Macedo,

Kanae Tsushima,

João Saraiva

SLE 2023: Proceedings of the 16th ACM SIGPLAN International Conference on Software Language Engineering

Pages 111 - 124

https://doi.org/10.1145/3623476.3623522

Published: 23 October 2023

Abstract

Type systems are responsible for assigning types to terms in programs. In doing so, they restrict the operations that can be performed and can consequently detect type errors during compilation. However, while they can flag the existence of an error, they often fail to pinpoint its cause or provide a helpful error message. Without adequate support, debugging this kind of error can therefore take considerable effort. Recently, neural network models have been developed that are able to understand programming languages and perform several downstream tasks. We argue that type error debugging can be enhanced by taking advantage of this deeper understanding of the language’s structure. In this paper, we present a technique that leverages GPT-3’s capabilities to automatically fix type errors in OCaml programs. We perform multiple source code analysis tasks to produce useful prompts, which are then provided to GPT-3 to generate potential patches. Our publicly available tool, Mentat, supports multiple modes and was validated on an existing public dataset with thousands of OCaml programs. We automatically validate successful repairs by using QuickCheck to verify which generated patches produce the same output as the user-intended fixed version, achieving a 39% repair rate. In a comparative study, Mentat outperformed two other techniques in automatically fixing ill-typed OCaml programs.
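
The validation step described in the abstract, keeping only those patches that behave like the user-intended fix, can be illustrated with a minimal sketch. This is not Mentat's implementation: the paper targets OCaml and uses QuickCheck, whereas the sketch below is written in Python with the standard `random` module, and every name in it (`reference_sum`, `good_patch`, `bad_patch`, `agrees_with_reference`, `select_patches`) is hypothetical.

```python
import random
from typing import Callable, Dict, List

IntListFn = Callable[[List[int]], int]


def reference_sum(xs: List[int]) -> int:
    """Stand-in for the user-intended fixed program (the oracle)."""
    return sum(xs)


def good_patch(xs: List[int]) -> int:
    """Hypothetical candidate patch that behaves like the oracle."""
    total = 0
    for x in xs:
        total += x
    return total


def bad_patch(xs: List[int]) -> int:
    """Hypothetical candidate patch that is well-typed but wrong:
    it silently drops the first element."""
    return sum(xs[1:])


def agrees_with_reference(candidate: IntListFn, reference: IntListFn,
                          trials: int = 200, seed: int = 0) -> bool:
    """Compare candidate and reference on fixed and random inputs."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    inputs: List[List[int]] = [[], [1, 2, 3]]  # a few fixed cases first
    for _ in range(trials):
        length = rng.randint(0, 10)
        inputs.append([rng.randint(-100, 100) for _ in range(length)])
    return all(candidate(xs) == reference(xs) for xs in inputs)


def select_patches(candidates: Dict[str, IntListFn],
                   reference: IntListFn) -> List[str]:
    """Return the names of candidates that match the reference."""
    return [name for name, fn in candidates.items()
            if agrees_with_reference(fn, reference)]


if __name__ == "__main__":
    patches = {"good_patch": good_patch, "bad_patch": bad_patch}
    print(select_patches(patches, reference_sum))  # ['good_patch']
```

Agreement on sampled inputs is evidence of equivalence rather than proof, so a patch accepted this way may still differ from the intended fix on inputs that were never generated.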

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software evolution

Information & Contributors

Information

Published In

SLE 2023: Proceedings of the 16th ACM SIGPLAN International Conference on Software Language Engineering

October 2023

231 pages

ISBN:9798400703966

DOI:10.1145/3623476

General Chair: João Saraiva (University of Minho, Portugal)
Program Chairs: Thomas Degueule (CNRS, France / LaBRI, France), Elizabeth Scott (Royal Holloway University of London, UK)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Funding Sources

Fundação para a Ciência e a Tecnologia
Haslab/INESC TEC

Conference

SLE '23

Sponsor:

SIGPLAN

SLE '23: 16th ACM SIGPLAN International Conference on Software Language Engineering

October 23 - 24, 2023

Cascais, Portugal

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 306
Downloads (Last 12 months): 306
Downloads (Last 6 weeks): 16

Reflects downloads up to 12 Sep 2024

Citations

Cited By

Sengul C, Neykova R, Destefanis G (2024). Software engineering education in the era of conversational AI: current trends and future directions. Frontiers in Artificial Intelligence, 7. Online publication date: 29-Aug-2024
https://doi.org/10.3389/frai.2024.1436350
Koutcheme C, Dainese N, Sarsa S, Hellas A, Leinonen J, Denny P, Monga M, Lonati V, Barendsen E, Sheard J, Paterson J (2024). Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 52-58. Online publication date: 3-Jul-2024
https://dl.acm.org/doi/10.1145/3649217.3653612
Santos S, Saraiva J, Ribeiro F, Huyen P, Tan S, Mechtaev S, Khurshid S (2024). Large Language Models in Automated Repair of Haskell Type Errors. Proceedings of the 5th ACM/IEEE International Workshop on Automated Program Repair, 42-45. Online publication date: 20-Apr-2024
https://dl.acm.org/doi/10.1145/3643788.3648012
Wang J, Huang Y, Chen C, Liu Z, Wang S, Wang Q (2024). Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Transactions on Software Engineering, 50:4, 911-936. Online publication date: 20-Feb-2024
https://dl.acm.org/doi/10.1109/TSE.2024.3368208
Guo Y, Patsakis C, Hu Q, Tang Q, Casino F (2024). Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection. Computer Security – ESORICS 2024, 271-289. Online publication date: 5-Sep-2024
https://doi.org/10.1007/978-3-031-70879-4_14
