ICSE-SEIP '22 · short paper · DOI: 10.1145/3510457.3513061

Improving code autocompletion with transfer learning

Published: 17 October 2022

  • Abstract

    Software language models have achieved promising results in predicting code completions, and several industry studies have described successful IDE integrations. Recently, accuracy in autocompletion prediction improved by 12.8% [2] through training on a real-world dataset collected from programmers' IDE activity. But what if the number of examples of IDE autocompletion in the target programming language is inadequate for model training? In this paper, we highlight practical reasons for this inadequacy and make a call to action: use transfer learning to overcome the issue.
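    The idea the abstract appeals to can be illustrated with a toy sketch (not the paper's actual method, which uses neural software language models): pretrain a next-token model on a data-rich source language, then continue training on a small target-language corpus so that shared patterns transfer. All corpora and names below are hypothetical.

    ```python
    # Toy "transfer learning" for completion: a bigram next-token model
    # pretrained on a large source-language corpus, then fine-tuned by
    # continuing to accumulate counts on a tiny target-language corpus.
    from collections import defaultdict

    class BigramCompleter:
        def __init__(self):
            # counts[prev][next] = number of times `next` followed `prev`
            self.counts = defaultdict(lambda: defaultdict(int))

        def train(self, corpus):
            for line in corpus:
                toks = line.split()
                for a, b in zip(toks, toks[1:]):
                    self.counts[a][b] += 1

        def complete(self, prev):
            nxt = self.counts.get(prev)
            if not nxt:
                return None  # token never seen as a left context
            return max(nxt, key=nxt.get)

    # "Pretraining" on a data-rich source language (hypothetical snippets).
    model = BigramCompleter()
    model.train(["for i in range ( n )", "def f ( x ) :"])

    # "Fine-tuning" on a handful of target-language examples; patterns
    # like "( x" transfer from pretraining and are reinforced here.
    model.train(["function f ( x ) {"])

    print(model.complete("("))  # "x": seen in both source and target corpora
    ```

    The same shape applies to the neural setting: the pretrained weights (here, the accumulated counts) give the model a useful prior when target-language examples are scarce.
    
    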

    References

    [1] Gareth Ari Aye and Gail E. Kaiser. 2020. Sequence Model Design for Code Completion in the Modern IDE. arXiv:2004.05249 [cs.SE].
    [2] Gareth Ari Aye, Seohyun Kim, and Hongyu Li. 2020. Learning Autocompletion from Real-World Datasets. arXiv:2011.04542 [cs.SE].
    [3] Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. 2018. Generative Code Modeling with Graphs. arXiv:1805.08490.
    [4] Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from Examples to Improve Code Completion Systems. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (Amsterdam, The Netherlands) (ESEC/FSE '09). ACM, New York, NY, USA, 213--222.
    [5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
    [6] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv:2002.08155 [cs.CL].
    [7] Vincent J. Hellendoorn, Sebastian Proksch, Harald C. Gall, and Alberto Bacchelli. 2019. When Code Completion Fails: A Case Study on Real-World Completions (ICSE '19). IEEE Press, 960--970.
    [8] Seohyun Kim, Jinman Zhao, Yuchi Tian, and Satish Chandra. 2020. Code Prediction by Feeding Trees to Transformers. arXiv:2003.13848 [cs.SE].
    [9] Jian Li, Yue Wang, Michael R. Lyu, and Irwin King. 2018. Code Completion with Neural Attention and Pointer Networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (July 2018).
    [10] Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, and Gabriele Bavota. 2021. Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks. arXiv:2102.02017 [cs.SE].
    [11] Gail C. Murphy, Mik Kersten, and Leah Findlater. 2006. How Are Java Software Developers Using the Eclipse IDE? IEEE Software 23, 4 (July 2006), 76--83.
    [12] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019).
    [13] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http://jmlr.org/papers/v21/20-074.html

    Cited By

    • (2024) Multi-line AI-Assisted Code Authoring. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 150--160. 10.1145/3663529.3663836. Online publication date: 10-Jul-2024.
    • (2024) An Analysis of the Costs and Benefits of Autocomplete in IDEs. Proceedings of the ACM on Software Engineering 1 (FSE), 1284--1306. 10.1145/3660765. Online publication date: 12-Jul-2024.
    • (2024) In-IDE Human-AI Experience in the Era of Large Language Models: A Literature Review. Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments, 95--100. 10.1145/3643796.3648463. Online publication date: 20-Apr-2024.
    • (2024) AI-Assisted Code Authoring at Scale: Fine-Tuning, Deploying, and Mixed Methods Evaluation. Proceedings of the ACM on Software Engineering 1 (FSE), 1066--1085. 10.1145/3643774. Online publication date: 12-Jul-2024.
    • (2023) Large Language Models for Software Engineering: Survey and Open Problems. 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), 31--53. 10.1109/ICSE-FoSE59343.2023.00008. Online publication date: 14-May-2023.
    • (2023) Completing Function Documentation Comments Using Structural Information. Empirical Software Engineering 28, 4. 10.1007/s10664-022-10284-6. Online publication date: 23-May-2023.


    Published In

    ICSE-SEIP '22: Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice
    May 2022
    371 pages
    ISBN:9781450392266
    DOI:10.1145/3510457
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. code completion
    2. integrated development environments
    3. machine learning
    4. naturalness
    5. neural networks
    6. software language models
    7. software tools

    Qualifiers

    • Short-paper

    Conference

    ICSE '22