short-paper

An empirical evaluation of GitHub Copilot's code suggestions

Authors:

Sarah Nadi

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories

Pages 1 - 5

https://doi.org/10.1145/3524842.3528470

Published: 17 October 2022

Abstract

GitHub and OpenAI recently launched Copilot, an "AI pair programmer" that utilizes the power of Natural Language Processing, Static Analysis, Code Synthesis, and Artificial Intelligence. Given a natural language description of the target functionality, Copilot can generate corresponding code in several programming languages. In this paper, we perform an empirical study to evaluate the correctness and understandability of Copilot's suggested code. We use 33 LeetCode questions to create queries for Copilot in four different programming languages. We evaluate the correctness of the corresponding 132 Copilot solutions by running LeetCode's provided tests, and evaluate understandability using SonarQube's cyclomatic complexity and cognitive complexity metrics. We find that Copilot's Java suggestions have the highest correctness score (57%) while JavaScript is the lowest (27%). Overall, Copilot's suggestions have low complexity with no notable differences between the programming languages. We also find some potential Copilot shortcomings, such as generating code that can be further simplified and code that relies on undefined helper methods.
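To make the cyclomatic complexity metric mentioned above more concrete, the following is a minimal sketch of a McCabe-style count for a Python snippet: one base path plus one for each decision point. It is an illustration only. It is not SonarQube's implementation and not the authors' tooling, and the function name `cyclomatic_complexity`, the chosen set of branch nodes, and the sample solution to the Integer Break problem are all assumptions made for this example (the sample is hand-written, not Copilot output).

```python
import ast

# Node types treated as decision points in this simplified count (an assumption;
# real tools such as SonarQube apply their own, language-specific rules).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate the McCabe cyclomatic complexity of a Python snippet."""
    tree = ast.parse(source)
    complexity = 1  # a single straight-line path through the code
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths
            complexity += len(node.values) - 1
    return complexity


# Hypothetical, hand-written solution used only as input for the metric.
SAMPLE = '''
def integer_break(n):
    if n <= 3:
        return n - 1
    product = 1
    while n > 4:
        product *= 3
        n -= 3
    return product * n
'''

print(cyclomatic_complexity(SAMPLE))  # prints 3: base path + one if + one while
```

A low value such as this one is what the abstract means by "low complexity": few independent paths through the code. Cognitive complexity, the second metric, additionally penalizes nesting and is not covered by this sketch.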

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability
  2. Software notations and tools
    1. Development frameworks and environments
      1. Integrated and visual development environments

Published In

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories

May 2022

815 pages

ISBN:9781450393034

DOI:10.1145/3524842

General Chair: David Lo, Singapore Management University, Singapore

Program Chairs: Shane McIntosh, University of Waterloo, Canada; Nicole Novielli, University of Bari, Italy

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Canada Research Chairs Program

Conference

MSR '22

Sponsor:

SIGSOFT

MSR '22: 19th International Conference on Mining Software Repositories

May 23 - 24, 2022

Pittsburgh, Pennsylvania
