Research article · DOI: 10.1145/3512290.3528700

Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming

Published: 08 July 2022

Abstract

    GitHub Copilot, an extension for the Visual Studio Code development environment powered by the large-scale language model Codex, makes automatic program synthesis available to software developers. While this model has been extensively studied in the deep learning community, it has not yet been compared with genetic programming, which is likewise known for its performance in automatic program synthesis. In this paper, we evaluate GitHub Copilot on standard program synthesis benchmark problems and compare the achieved results with those reported in the genetic programming literature. In addition, we discuss the practical performance of both approaches. We find that the two approaches perform quite similarly on the benchmark problems; however, compared with GitHub Copilot, program synthesis approaches based on genetic programming are not yet mature enough to support programmers in practical software development. Genetic programming usually requires a large number of expensive hand-labeled training cases and takes too much time to generate solutions. Furthermore, source code generated by genetic programming approaches is often bloated and difficult to understand. For future work on program synthesis with genetic programming, we suggest that researchers focus on improving execution time, readability, and usability.
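
    The benchmark suites used in this line of work specify each problem as a set of hand-labeled input/output training cases rather than a formal specification, which is what makes the cases expensive to produce. A minimal sketch of what such cases look like, using the "Smallest" task (return the minimum of four integers) as an illustration; the data layout and helper names below are our own, not any suite's actual API:

    ```python
    # A benchmark-style synthesis problem given as labeled input/output cases.
    # Task (illustrative): return the smallest of four integers.
    # Each case pairs an input tuple with its hand-labeled expected output.
    TRAINING_CASES = [
        ((0, 0, 0, 0), 0),
        ((5, 2, 9, 1), 1),
        ((-3, 7, 4, -3), -3),
        ((10, 10, 3, 8), 3),
    ]

    def passes_all(candidate, cases):
        """Check a synthesized candidate program against every training case."""
        return all(candidate(*inputs) == expected for inputs, expected in cases)

    # A correct program of the kind a synthesizer is expected to find:
    solution = lambda a, b, c, d: min(a, b, c, d)
    ```

    A synthesizer (whether Copilot or a genetic programming system) is judged by how many such cases its generated program satisfies; `passes_all` here is a stand-in for that scoring step.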



    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022, 1472 pages
    ISBN: 9781450392372
    DOI: 10.1145/3512290

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. GitHub Copilot
    2. Codex
    3. genetic programming
    4. large-scale language models
    5. program synthesis
    6. software engineering


    Conference

    GECCO '22
    Overall acceptance rate: 1,669 of 4,410 submissions, 38%


