Research Article · Open Access
DOI: 10.1145/3520312.3534864

Productivity assessment of neural code completion

Published: 13 June 2022
Abstract

Neural code synthesis has reached a point where snippet generation is accurate enough to be considered for integration into human software development workflows. Commercial products aim to increase programmers’ productivity, yet they cannot measure it directly. In this case study, we asked users of GitHub Copilot about its impact on their productivity and sought a reflection of their perception in directly measurable user data. We find that the rate at which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers’ perception of productivity.
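
The distinction the abstract draws between acceptance rate and persistence metrics can be made concrete with a small sketch. The telemetry schema below, including the 30-second persistence window, is a hypothetical illustration for exposition, not the paper's actual measurement pipeline:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CompletionEvent:
    """One code-completion suggestion shown to a developer (hypothetical schema)."""
    shown_chars: int                            # length of the suggestion, in characters
    accepted: bool                              # whether the developer accepted it
    surviving_chars_30s: Optional[int] = None   # characters still in the buffer 30s after acceptance

def acceptance_rate(events: List[CompletionEvent]) -> float:
    """Fraction of shown suggestions that were accepted."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)

def persistence_30s(events: List[CompletionEvent]) -> float:
    """Of all accepted characters, the fraction still present 30 seconds later."""
    accepted = [e for e in events if e.accepted and e.surviving_chars_30s is not None]
    shown = sum(e.shown_chars for e in accepted)
    surviving = sum(e.surviving_chars_30s for e in accepted)
    return surviving / shown if shown else 0.0

# Example: two of three shown suggestions accepted; some accepted text later edited away.
events = [
    CompletionEvent(shown_chars=80, accepted=True, surviving_chars_30s=60),
    CompletionEvent(shown_chars=40, accepted=False),
    CompletionEvent(shown_chars=50, accepted=True, surviving_chars_30s=50),
]
print(f"acceptance rate: {acceptance_rate(events):.2f}")   # 2/3 ≈ 0.67
print(f"persistence@30s: {persistence_30s(events):.2f}")   # 110/130 ≈ 0.85
```

Under this sketch, a completion that is accepted but then largely deleted raises the acceptance rate while lowering persistence, which is why the two kinds of metric can diverge in the way the study reports.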



Published In

MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming
June 2022
79 pages
ISBN: 9781450392730
DOI: 10.1145/3520312

This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. code completion
    2. code synthesis
    3. neural networks
    4. productivity

    Qualifiers

    • Research-article

    Conference

    MAPS '22

Article Metrics

• Downloads (last 12 months): 4,312
• Downloads (last 6 weeks): 379

Cited By
• (2024) Non-Expert Programmers in the Generative AI Future. Proceedings of the 3rd Annual Meeting of the Symposium on Human-Computer Interaction for Work, 1-19. https://doi.org/10.1145/3663384.3663393. Online publication date: 25-Jun-2024.
• (2024) Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction, 8(EICS), 1-29. https://doi.org/10.1145/3661145. Online publication date: 17-Jun-2024.
• (2024) “It would work for me too”: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools. ACM Transactions on Interactive Intelligent Systems, 14(2), 1-39. https://doi.org/10.1145/3651990. Online publication date: 9-Mar-2024.
• (2024) Using GitHub Copilot for Test Generation in Python: An Empirical Study. Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024), 45-55. https://doi.org/10.1145/3644032.3644443. Online publication date: 15-Apr-2024.
• (2024) Quality Assessment of ChatGPT Generated Code and their Use by Developers. Proceedings of the 21st International Conference on Mining Software Repositories, 152-156. https://doi.org/10.1145/3643991.3645071. Online publication date: 15-Apr-2024.
• (2024) Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories, 571-583. https://doi.org/10.1145/3643991.3644918. Online publication date: 15-Apr-2024.
• (2024) Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 190-201. https://doi.org/10.1145/3643916.3644424. Online publication date: 15-Apr-2024.
• (2024) "You're on a bicycle with a little motor": Benefits and Challenges of Using AI Code Assistants. Proceedings of the 2024 IEEE/ACM 17th International Conference on Cooperative and Human Aspects of Software Engineering, 144-152. https://doi.org/10.1145/3641822.3641882. Online publication date: 14-Apr-2024.
• (2024) How much SPACE do metrics have in GenAI assisted software development? Proceedings of the 17th Innovations in Software Engineering Conference, 1-5. https://doi.org/10.1145/3641399.3641419. Online publication date: 22-Feb-2024.
• (2024) Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration. Proceedings of the 29th International Conference on Intelligent User Interfaces, 370-384. https://doi.org/10.1145/3640543.3645198. Online publication date: 18-Mar-2024.
