Research article · EASE Conference Proceedings · DOI: 10.1145/3463274.3463332

CCMC: Code Completion with a Memory Mechanism and a Copy Mechanism

Published: 21 June 2021

Abstract

Code completion tools are increasingly important in modern software development. Recently, statistical language modeling techniques have achieved great success on the code completion task. However, two major issues severely limit the performance of neural language models (NLMs) for code completion: (a) long-range dependencies are common in program source code, and (b) the rate of new and rare vocabulary is much higher in code than in natural language. To address these challenges, we propose CCMC: code completion with a memory mechanism and a copy mechanism. To capture long-range dependencies in program source code, we employ Transformer-XL as our base model. To exploit locally repeated terms in source code, we integrate a pointer network into the base model and design CopyMask, inspired by the masked multi-head attention in the Transformer decoder, to improve training efficiency. To combine Transformer-XL's long-range dependency modeling with the pointer network's ability to copy input tokens to the output, we design a memory mechanism and a copy mechanism. The memory mechanism lets our model uniformly manage the context used by Transformer-XL and the pointer network; the copy mechanism lets our model either generate a within-vocabulary token or copy an out-of-vocabulary (OOV) token from the input. Experiments on a real-world dataset demonstrate the effectiveness of CCMC on the code completion task.
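To make the copy mechanism concrete, here is a minimal sketch in the style of a pointer-generator network: the model mixes a generation distribution over a fixed vocabulary with a copy distribution over context tokens, so an OOV token that appears in the context can still be predicted. The function name, shapes, and PyTorch framing are illustrative assumptions, not the authors' implementation (CopyMask and the Transformer-XL memory are omitted).

```python
# Hypothetical sketch of a copy mechanism (pointer-generator style);
# not the CCMC implementation.
import torch
import torch.nn.functional as F

def copy_mechanism_step(hidden,        # (batch, d_model) decoder state at step t
                        vocab_logits,  # (batch, vocab_size) generation logits
                        attn_scores,   # (batch, ctx_len) scores over context tokens
                        ctx_ids,       # (batch, ctx_len) extended-vocab ids of context tokens
                        gate_proj,     # nn.Linear(d_model, 1): copy/generate gate (assumed)
                        extended_size): # vocab_size + number of OOV slots
    batch = hidden.size(0)
    # Gate deciding whether to generate from the vocabulary or copy from context.
    p_gen = torch.sigmoid(gate_proj(hidden))           # (batch, 1)
    p_vocab = F.softmax(vocab_logits, dim=-1)          # (batch, vocab_size)
    p_copy = F.softmax(attn_scores, dim=-1)            # (batch, ctx_len)

    # Final distribution over the extended vocabulary (in-vocab + OOV slots).
    out = torch.zeros(batch, extended_size, device=hidden.device)
    out[:, :p_vocab.size(1)] = p_gen * p_vocab
    # Add copy probability mass at the extended-vocab ids of the context tokens,
    # so an OOV token seen in the context remains predictable.
    out.scatter_add_(1, ctx_ids, (1.0 - p_gen) * p_copy)
    return out

# Example usage with hypothetical sizes:
# gate = torch.nn.Linear(512, 1)
# dist = copy_mechanism_step(h, logits, scores, ids, gate, extended_size=50000 + 100)
```

In this framing, the memory mechanism would additionally supply the cached Transformer-XL segments as part of the context the pointer attends over; that bookkeeping is left out of the sketch.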


Cited By

  • (2024) AI-Assisted Programming Tasks Using Code Embeddings and Transformers. Electronics 13(4), 767. https://doi.org/10.3390/electronics13040767 (published online 15 Feb 2024)
  • (2024) A Code Completion Approach Combining Pointer Network and Transformer-XL Network. Collaborative Computing: Networking, Applications and Worksharing, 303–322. https://doi.org/10.1007/978-3-031-54521-4_17 (published online 23 Feb 2024)



Published In

EASE '21: Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering
June 2021
417 pages
ISBN: 9781450390538
DOI: 10.1145/3463274

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Code completion
  2. Memory
  3. Pointer network
  4. Transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE 2021

Acceptance Rates

Overall acceptance rate: 71 of 232 submissions (31%)

Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Sep 2024.
