
GitHub Copilot AI pair programmer: Asset or Liability?

Published: 01 September 2023

    Abstract

    Automatic program synthesis is a long-standing dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies have evaluated the correctness of Copilot’s solutions and reported its issues, more empirical evaluation is necessary to understand how developers can benefit from it effectively. In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot’s proposed solutions with those of human programmers on a set of programming tasks. For the former, we assess the performance and functionality of Copilot in solving selected fundamental problems in computer science, such as sorting and implementing data structures. For the latter, we use a dataset of programming problems with human-provided solutions. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems; however, some of these solutions are buggy or non-reproducible. Moreover, Copilot has difficulty combining multiple methods to generate a single solution. Comparing Copilot to humans, our results show that humans produce a higher ratio of correct solutions than Copilot, while the buggy solutions Copilot generates require less effort to repair. Based on our findings, Copilot can become an asset when used by expert developers in software projects, since its suggestions are comparable to humans’ contributions in terms of quality. However, it can become a liability when used by novice developers, who may lack the expertise to filter out its buggy or non-optimal solutions.
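
    To make the first task concrete, the sketch below shows one minimal way a correctness and runtime check for a generated sorting routine could be set up in Python. It is only an illustration under assumed names (candidate_sort is a hypothetical stand-in for a Copilot-suggested implementation, and the oracle check against Python’s built-in sorted() is our own scaffolding), not the authors’ actual evaluation harness.

    import random
    import time

    def candidate_sort(xs):
        """Hypothetical stand-in for a Copilot-suggested solution (here: merge sort)."""
        if len(xs) <= 1:
            return list(xs)
        mid = len(xs) // 2
        left, right = candidate_sort(xs[:mid]), candidate_sort(xs[mid:])
        merged, i, j = [], 0, 0
        # Merge the two sorted halves in order.
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

    def is_correct(sort_fn, trials=100, size=500):
        """Compare the candidate against the trusted sorted() oracle on random inputs."""
        for _ in range(trials):
            data = [random.randint(-10**6, 10**6) for _ in range(size)]
            if sort_fn(data) != sorted(data):
                return False
        return True

    def runtime_seconds(sort_fn, size=10**5):
        """Rough wall-clock runtime on one large random input."""
        data = [random.random() for _ in range(size)]
        start = time.perf_counter()
        sort_fn(data)
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("correct:", is_correct(candidate_sort))
        print("runtime (s):", round(runtime_seconds(candidate_sort), 4))

    Checking a candidate’s output against a trusted oracle on many random inputs is a common way to flag buggy suggestions before measuring efficiency.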


    Highlights

    We investigate the quality of the code Copilot generates as an AI pair programmer.
    Copilot provides efficient solutions, but some are buggy and/or non-reproducible.
    Copilot’s solutions are buggier than humans’, but require less effort to fix.
    Copilot’s suggestions are comparable to humans’ contributions in terms of quality.
    Copilot can become an asset for experts, but a liability for novice developers.




    Information

    Published In

    Journal of Systems and Software, Volume 203, Issue C
    Sep 2023
    439 pages

    Publisher

    Elsevier Science Inc.

    United States

    Publication History

    Published: 01 September 2023

    Author Tags

    1. Code completion
    2. Language model
    3. GitHub Copilot
    4. Testing

    Qualifiers

    • Research-article



    Cited By

    • (2024) The Impact of AI-Pair Programmers on Code Quality and Developer Satisfaction: Evidence from TiMi studio. In: Proceedings of the 2024 International Conference on Generative Artificial Intelligence and Information Security, pp. 201–205. https://doi.org/10.1145/3665348.3665383 (10 May 2024)
    • (2024) Towards AI for Software Systems. In: Proceedings of the 1st ACM International Conference on AI-Powered Software, pp. 79–84. https://doi.org/10.1145/3664646.3664767 (10 Jul 2024)
    • (2024) Unveiling the Potential of a Conversational Agent in Developer Support: Insights from Mozilla’s PDF.js Project. In: Proceedings of the 1st ACM International Conference on AI-Powered Software, pp. 10–18. https://doi.org/10.1145/3664646.3664758 (10 Jul 2024)
    • (2024) Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction 8 (EICS), pp. 1–29. https://doi.org/10.1145/3661145 (17 Jun 2024)
    • (2024) Generating and Reviewing Programming Codes with Large Language Models: A Systematic Mapping Study. In: Proceedings of the 20th Brazilian Symposium on Information Systems, pp. 1–10. https://doi.org/10.1145/3658271.3658342 (20 May 2024)
    • (2024) Comparing Feedback from Large Language Models and Instructors: Teaching Computer Science at Scale. In: Proceedings of the Eleventh ACM Conference on Learning @ Scale, pp. 335–339. https://doi.org/10.1145/3657604.3664660 (9 Jul 2024)
    • (2024) Exploring the Profile of University Assessments Flagged as Containing AI-Generated Material. ACM Inroads 15 (2), pp. 39–47. https://doi.org/10.1145/3656478 (10 May 2024)
    • (2024) Navigating the Complexity of Generative AI Adoption in Software Engineering. ACM Transactions on Software Engineering and Methodology 33 (5), pp. 1–50. https://doi.org/10.1145/3652154 (4 Jun 2024)
    • (2024) Is Attention All You Need? Toward a Conceptual Model for Social Awareness in Large Language Models. In: Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, pp. 69–73. https://doi.org/10.1145/3650105.3652294 (14 Apr 2024)
    • (2024) Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study. In: Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, pp. 91–102. https://doi.org/10.1145/3650105.3652289 (14 Apr 2024)
