short-paper

An empirical evaluation of GitHub Copilot's code suggestions

Published: 17 October 2022

    Abstract

    GitHub and OpenAI recently launched Copilot, an "AI pair programmer" that utilizes the power of Natural Language Processing, Static Analysis, Code Synthesis, and Artificial Intelligence. Given a natural language description of the target functionality, Copilot can generate corresponding code in several programming languages. In this paper, we perform an empirical study to evaluate the correctness and understandability of Copilot's suggested code. We use 33 LeetCode questions to create queries for Copilot in four different programming languages. We evaluate the correctness of the corresponding 132 Copilot solutions by running LeetCode's provided tests, and evaluate understandability using SonarQube's cyclomatic complexity and cognitive complexity metrics. We find that Copilot's Java suggestions have the highest correctness score (57%) while JavaScript is the lowest (27%). Overall, Copilot's suggestions have low complexity with no notable differences between the programming languages. We also find some potential Copilot shortcomings, such as generating code that can be further simplified and code that relies on undefined helper methods.
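To make the understandability metric concrete, the following is a minimal sketch of a McCabe-style cyclomatic complexity count (one plus the number of decision points). It is an illustration only: it is not SonarQube's implementation, whose counting rules differ in detail, and it uses Python's `ast` module rather than any tooling named in the paper. The function name `cyclomatic_complexity` and the hand-written `SAMPLE` snippet are hypothetical and are not Copilot output.

```python
import ast

# Node types that each add one decision point in this simplified count.
# Assumption: this set is a rough approximation, not SonarQube's rule set.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


# Hand-written sample solution (hypothetical, for illustration only).
SAMPLE = """
def integer_break(n):
    if n <= 3:
        return n - 1
    product = 1
    while n > 4:
        product *= 3
        n -= 3
    return product * n
"""

print(cyclomatic_complexity(SAMPLE))  # one `if` and one `while` -> 3
```

A low score on a metric of this kind is what the abstract refers to as "low complexity". Cognitive complexity additionally penalizes nesting, which this sketch does not attempt to model.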



    Information

    Published In

    MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
    May 2022
    815 pages
    ISBN: 9781450393034
    DOI: 10.1145/3524842
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. GitHub copilot
    2. codex
    3. empirical evaluation
    4. program synthesis

    Qualifiers

    • Short-paper

    Funding Sources

    • Canada Research Chairs Program

    Conference

    MSR '22

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 1,332
    • Downloads (last 6 weeks): 55
    Reflects downloads up to 14 Aug 2024

    Citations

    Cited By

    • (2024) Uma análise do uso de ferramentas de geração de código por alunos de computação. Anais do IV Simpósio Brasileiro de Educação em Computação (EDUCOMP 2024), 63-71. DOI: 10.5753/educomp.2024.237427. Online publication date: 22-Apr-2024
    • (2024) The Current State of Generative Artificial Intelligence Tools for Accessibility in Product Development. Nafath 9:26. DOI: 10.54455/MCN2605. Online publication date: 30-Jul-2024
    • (2024) Framework for evaluating code generation ability of large language models. ETRI Journal 46:1, 106-117. DOI: 10.4218/etrij.2023-0357. Online publication date: 14-Feb-2024
    • (2024) Cognitive Apprenticeship and Artificial Intelligence Coding Assistants. Navigating Computer Science Education in the 21st Century, 261-281. DOI: 10.4018/979-8-3693-1066-3.ch013. Online publication date: 26-Feb-2024
    • (2024) Program Code Generation with Generative AIs. Algorithms 17:2, 62. DOI: 10.3390/a17020062. Online publication date: 31-Jan-2024
    • (2024) Multi-line AI-Assisted Code Authoring. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 150-160. DOI: 10.1145/3663529.3663836. Online publication date: 10-Jul-2024
    • (2024) Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice. Proceedings of the ACM on Software Engineering 1:FSE, 1819-1840. DOI: 10.1145/3660788. Online publication date: 12-Jul-2024
    • (2024) Designing Accessible Content Creation Support with Blind and Low Vision Creators. ACM SIGACCESS Accessibility and Computing, 1-1. DOI: 10.1145/3654768.3654775. Online publication date: 26-Mar-2024
    • (2024) Navigating the Complexity of Generative AI Adoption in Software Engineering. ACM Transactions on Software Engineering and Methodology 33:5, 1-50. DOI: 10.1145/3652154. Online publication date: 4-Jun-2024
    • (2024) An Assessment of ML-based Sentiment Analysis for Intelligent Web Filtering. Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, 80-87. DOI: 10.1145/3652037.3652039. Online publication date: 26-Jun-2024
