Research article | Open access
DOI: 10.1145/3558489.3559072

Assessing the quality of GitHub Copilot’s code generation

Published: 09 November 2022

  • Abstract

    The introduction of GitHub’s new code generation tool, GitHub Copilot, seems to be the first well-established instance of an AI pair programmer. GitHub Copilot has access to a large number of open-source projects, enabling it to draw on more extensive code in various programming languages than other code generation tools. Although the initial and informal assessments are promising, a systematic evaluation is needed to explore the limits and benefits of GitHub Copilot. The main objective of this study is to assess the quality of the code generated by GitHub Copilot. We also aim to evaluate the impact of the quality and variety of input parameters fed to GitHub Copilot. To achieve this aim, we created an experimental setup for evaluating the generated code in terms of validity, correctness, and efficiency. Our results suggest that GitHub Copilot was able to generate valid code with a 91.5% success rate. In terms of code correctness, out of 164 problems, 47 (28.7%) were generated correctly, 84 (51.2%) partially correctly, and 33 (20.1%) incorrectly. Our empirical analysis shows that GitHub Copilot is a promising tool based on the results we obtained; however, further and more comprehensive assessment is needed in the future.
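    The validity/correctness classification described in the abstract can be sketched as a minimal test harness. Everything below — the `assess_validity` and `assess_correctness` helpers, the parse-based validity criterion, and the example task — is a hypothetical illustration of the general idea, not the authors' actual experimental setup:

```python
import ast

def assess_validity(source: str) -> bool:
    """A generated solution counts as *valid* if it parses and defines
    a function (an assumed criterion, for illustration only)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(isinstance(node, ast.FunctionDef) for node in tree.body)

def assess_correctness(source: str, entry: str, cases) -> str:
    """Classify a solution as 'correct', 'partially correct', or
    'incorrect' by the fraction of test cases it passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)          # run the generated code
        func = namespace[entry]
        passed = sum(1 for args, want in cases if func(*args) == want)
    except Exception:
        return "incorrect"
    if passed == len(cases):
        return "correct"
    return "partially correct" if passed > 0 else "incorrect"

# Example: a generated solution that handles only non-negative input
generated = "def absolute(x):\n    return x if x > 0 else 0\n"
cases = [((3,), 3), ((-4,), 4), ((0,), 0)]
print(assess_validity(generated))                       # → True
print(assess_correctness(generated, "absolute", cases)) # → partially correct
```

    Under this scheme, a solution that fails to parse is invalid, one that passes every test case is correct, and one that passes only some (like the example, which mishandles negative input) is partially correct — mirroring the three outcome categories reported in the abstract.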




    Published In

    PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering
    November 2022
    101 pages
    ISBN:9781450398602
    DOI:10.1145/3558489
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. AI pair programmer
    2. GitHub Copilot
    3. code completion
    4. code generation
    5. empirical study

    Conference

    PROMISE '22

    Acceptance Rates

    Overall Acceptance Rate 98 of 213 submissions, 46%

    Article Metrics

    • Downloads (Last 12 months)3,299
    • Downloads (Last 6 weeks)257
    Reflects downloads up to 14 Aug 2024

    Cited By
    • (2024) The Current State of Generative Artificial Intelligence Tools for Accessibility in Product Development. Nafath, 9:26. https://doi.org/10.54455/MCN2605 (30-Jul-2024)
    • (2024) Framework for evaluating code generation ability of large language models. ETRI Journal, 46:1, 106–117. https://doi.org/10.4218/etrij.2023-0357 (14-Feb-2024)
    • (2024) Program Code Generation with Generative AIs. Algorithms, 17:2, 62. https://doi.org/10.3390/a17020062 (31-Jan-2024)
    • (2024) Towards AI for Software Systems. Proceedings of the 1st ACM International Conference on AI-Powered Software, 79–84. https://doi.org/10.1145/3664646.3664767 (10-Jul-2024)
    • (2024) Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction, 8:EICS, 1–29. https://doi.org/10.1145/3661145 (17-Jun-2024)
    • (2024) Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code? Proceedings of the ACM on Software Engineering, 1:FSE, 2261–2284. https://doi.org/10.1145/3660807 (12-Jul-2024)
    • (2024) State Reconciliation Defects in Infrastructure as Code. Proceedings of the ACM on Software Engineering, 1:FSE, 1865–1888. https://doi.org/10.1145/3660790 (12-Jul-2024)
    • (2024) How Do Information Technology Professionals Use Generative Artificial Intelligence? Proceedings of the 20th Brazilian Symposium on Information Systems, 1–9. https://doi.org/10.1145/3658321.3658367 (23-May-2024)
    • (2024) An Assessment of ML-based Sentiment Analysis for Intelligent Web Filtering. Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, 80–87. https://doi.org/10.1145/3652037.3652039 (26-Jun-2024)
    • (2024) Performance, Workload, Emotion, and Self-Efficacy of Novice Programmers Using AI Code Generation. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 290–296. https://doi.org/10.1145/3649217.3653615 (3-Jul-2024)
