ICSE-SEIP '22 · short paper · DOI: 10.1145/3510457.3513061

Improving code autocompletion with transfer learning

Published: 17 October 2022

  • Abstract

    Software language models have achieved promising results in predicting code completions, and several industry studies have described successful IDE integrations. Recently, accuracy in autocompletion prediction improved by 12.8% [2] through training on a real-world dataset collected from programmers' IDE activity. But what if the number of examples of IDE autocompletion in the target programming language is inadequate for model training? In this paper, we highlight practical reasons for this inadequacy and make a call to action: use transfer learning to overcome the issue.
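    The idea the abstract appeals to can be illustrated with a toy sketch (not the paper's actual method, which uses neural software language models): pretrain a next-token model on a data-rich source language, then continue training on a small target-language corpus so that shared patterns transfer. All corpora and names below are hypothetical.

    ```python
    # Toy "transfer learning" for completion: a bigram next-token model
    # pretrained on a large source-language corpus, then fine-tuned by
    # continuing to accumulate counts on a tiny target-language corpus.
    from collections import defaultdict

    class BigramCompleter:
        def __init__(self):
            # counts[prev][next] = number of times `next` followed `prev`
            self.counts = defaultdict(lambda: defaultdict(int))

        def train(self, corpus):
            for line in corpus:
                toks = line.split()
                for a, b in zip(toks, toks[1:]):
                    self.counts[a][b] += 1

        def complete(self, prev):
            nxt = self.counts.get(prev)
            if not nxt:
                return None  # token never seen as a left context
            return max(nxt, key=nxt.get)

    # "Pretraining" on a data-rich source language (hypothetical snippets).
    model = BigramCompleter()
    model.train(["for i in range ( n )", "def f ( x ) :"])

    # "Fine-tuning" on a handful of target-language examples; patterns
    # like "( x" transfer from pretraining and are reinforced here.
    model.train(["function f ( x ) {"])

    print(model.complete("("))  # "x": seen in both source and target corpora
    ```

    The same shape applies to the neural setting: the pretrained weights (here, the accumulated counts) give the model a useful prior when target-language examples are scarce.
    
    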

    References

    [1] Gareth Ari Aye and Gail E. Kaiser. 2020. Sequence Model Design for Code Completion in the Modern IDE. arXiv:2004.05249 [cs.SE].
    [2] Gareth Ari Aye, Seohyun Kim, and Hongyu Li. 2020. Learning Autocompletion from Real-World Datasets. arXiv:2011.04542 [cs.SE].
    [3] Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. 2018. Generative Code Modeling with Graphs. arXiv:1805.08490.
    [4] Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from Examples to Improve Code Completion Systems. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (Amsterdam, The Netherlands) (ESEC/FSE '09). ACM, New York, NY, USA, 213--222.
    [5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
    [6] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv:2002.08155 [cs.CL].
    [7] Vincent J. Hellendoorn, Sebastian Proksch, Harald C. Gall, and Alberto Bacchelli. 2019. When Code Completion Fails: A Case Study on Real-World Completions (ICSE '19). IEEE Press, 960--970.
    [8] Seohyun Kim, Jinman Zhao, Yuchi Tian, and Satish Chandra. 2020. Code Prediction by Feeding Trees to Transformers. arXiv:2003.13848 [cs.SE].
    [9] Jian Li, Yue Wang, Michael R. Lyu, and Irwin King. 2018. Code Completion with Neural Attention and Pointer Networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (July 2018).
    [10] Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, and Gabriele Bavota. 2021. Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks. arXiv:2102.02017 [cs.SE].
    [11] Gail C. Murphy, Mik Kersten, and Leah Findlater. 2006. How Are Java Software Developers Using the Eclipse IDE? IEEE Software 23, 4 (July 2006), 76--83.
    [12] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019).
    [13] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http://jmlr.org/papers/v21/20-074.html

    Cited By

    • (2024) Multi-line AI-Assisted Code Authoring. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 150--160. 10.1145/3663529.3663836. Online publication date: 10-Jul-2024.
    • (2024) An Analysis of the Costs and Benefits of Autocomplete in IDEs. Proceedings of the ACM on Software Engineering 1 (FSE), 1284--1306. 10.1145/3660765. Online publication date: 12-Jul-2024.
    • (2024) In-IDE Human-AI Experience in the Era of Large Language Models: A Literature Review. Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments, 95--100. 10.1145/3643796.3648463. Online publication date: 20-Apr-2024.
    • (2024) AI-Assisted Code Authoring at Scale: Fine-Tuning, Deploying, and Mixed Methods Evaluation. Proceedings of the ACM on Software Engineering 1 (FSE), 1066--1085. 10.1145/3643774. Online publication date: 12-Jul-2024.
    • (2023) Large Language Models for Software Engineering: Survey and Open Problems. 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), 31--53. 10.1109/ICSE-FoSE59343.2023.00008. Online publication date: 14-May-2023.
    • (2023) Completing Function Documentation Comments Using Structural Information. Empirical Software Engineering 28, 4. 10.1007/s10664-022-10284-6. Online publication date: 23-May-2023.


    Published In

    ICSE-SEIP '22: Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice
    May 2022
    371 pages
    ISBN:9781450392266
    DOI:10.1145/3510457
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. code completion
    2. integrated development environments
    3. machine learning
    4. naturalness
    5. neural networks
    6. software language models
    7. software tools

    Qualifiers

    • Short-paper

    Conference

    ICSE '22