DOI: 10.1145/3387940.3391489
research-article
Open access

Improving Code Recommendations by Combining Neural and Classical Machine Learning Approaches

Published: 25 September 2020

Abstract

Code recommendation systems for software engineering are designed to accelerate the development of large software projects. A classic example is code completion, or next-token prediction, as offered by modern integrated development environments. Dynamically typed languages such as Python are a particularly challenging case for such systems because little type information is available at editing time. Recently, researchers have proposed machine learning approaches to address this challenge. In particular, the Probabilistic Higher Order Grammar (PHOG) technique (Bielik et al., ICML 2016) uses a grammar-based approach with a classical machine learning schema to exploit local context. The method by Li et al. (IJCAI 2018) uses deep learning, specifically a recurrent neural network coupled with a pointer network. We compare these two approaches quantitatively on a large corpus of Python files from GitHub. We also propose a combination of both approaches, in which a neural network decides which schema to use for each prediction. The proposed method achieves slightly better accuracy than either system alone. This demonstrates the potential of ensemble-like methods for code completion and recommendation tasks in dynamically typed languages.
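
The per-prediction selection idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: `make_bigram_predictor` is a toy stand-in for a count-based model such as PHOG, `pointer_predictor` is a toy stand-in for a pointer-style model that copies from local context, and `repetition_selector` is a hand-written rule standing in for the neural network that the paper trains to choose between the two.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Predictor = Callable[[Sequence[str]], str]
Selector = Callable[[Sequence[str]], int]

UNKNOWN = "<unk>"  # hypothetical placeholder returned when a model has no guess


def make_bigram_predictor(corpus: Sequence[Sequence[str]]) -> Predictor:
    """Toy count-based model: predict the most frequent follower of the last token."""
    followers: Dict[str, Counter] = {}
    for tokens in corpus:
        for prev, nxt in zip(tokens, tokens[1:]):
            followers.setdefault(prev, Counter())[nxt] += 1

    def predict(context: Sequence[str]) -> str:
        if not context or context[-1] not in followers:
            return UNKNOWN
        return followers[context[-1]].most_common(1)[0][0]

    return predict


def pointer_predictor(context: Sequence[str]) -> str:
    """Toy copy model: find the previous occurrence of the last token
    in the context and copy the token that followed it."""
    if not context:
        return UNKNOWN
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):  # scan backwards, skipping the final position
        if context[i] == last:
            return context[i + 1]
    return UNKNOWN


def repetition_selector(context: Sequence[str]) -> int:
    """Hand-written stand-in for a learned selector: choose the copy model (1)
    when the last token already appeared earlier, else the count model (0)."""
    return 1 if context and context[-1] in list(context[:-1]) else 0


def combine(predictors: List[Predictor], selector: Selector) -> Predictor:
    """Route each prediction to the single model chosen by the selector."""
    def predict(context: Sequence[str]) -> str:
        return predictors[selector(context)](context)

    return predict


corpus = [["import", "os"], ["import", "sys"], ["import", "os"]]
combined = combine([make_bigram_predictor(corpus), pointer_predictor], repetition_selector)
print(combined(["import"]))                    # count model is used: prints "os"
print(combined(["my_var", "=", "1", "my_var"]))  # copy model is used: prints "="
```

The design point the sketch tries to capture is that the selector picks exactly one model per prediction rather than averaging their outputs, which matches the abstract's description of a network deciding "which schema to use for each prediction".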

References

[1] M. Robillard, R. Walker, and T. Zimmermann, "Recommendation systems for software engineering," IEEE Software, vol. 27, pp. 80-86, July 2010.
[2] M. Allamanis, E. T. Barr, P. Devanbu, and C. Sutton, "A survey of machine learning for big code and naturalness," ACM Comput. Surv., vol. 51, pp. 81:1-81:37, July 2018.
[3] C. Liu, X. Wang, R. Shin, J. E. Gonzalez, and D. Song, "Neural code completion," 2017.
[4] V. Raychev, P. Bielik, and M. Vechev, "Probabilistic model for code with decision trees," in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, (New York, NY, USA), pp. 731-747, ACM, 2016.
[5] J. Li, Y. Wang, M. R. Lyu, and I. King, "Code completion with neural attention and pointer networks," in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pp. 4159-4165, ijcai.org, 2018.
[6] P. Bielik, V. Raychev, and M. Vechev, "PHOG: Probabilistic model for code," in Proceedings of The 33rd International Conference on Machine Learning, vol. 48 of Proceedings of Machine Learning Research, (New York, New York, USA), pp. 2933-2942, PMLR, 20-22 Jun 2016.
[7] O. Vinyals, M. Fortunato, and N. Jaitly, "Pointer networks," in Advances in Neural Information Processing Systems 28, pp. 2692-2700, Curran Associates, Inc., 2015.
[8] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[9] Y. Gal and Z. Ghahramani, "A theoretically grounded application of dropout in recurrent neural networks," in Advances in Neural Information Processing Systems 29, pp. 1019-1027, Curran Associates, Inc., 2016.
[10] M. Brockschmidt, M. Allamanis, A. L. Gaunt, and O. Polozov, "Generative Code Modeling with Graphs," in International Conference on Learning Representations (ICLR 2019), Apr. 2019.
[11] U. Alon, R. Sadaka, O. Levy, and E. Yahav, "Structural Language Models of Code," arXiv:1910.00577, Feb. 2020.
[12] P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, "Natural language processing: an introduction," Journal of the American Medical Informatics Association, vol. 18, pp. 544-551, Sept. 2011.
[13] A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, "On the naturalness of software," in 2012 34th International Conference on Software Engineering (ICSE), pp. 837-847, June 2012.
[14] Z. Tu, Z. Su, and P. Devanbu, "On the localness of software," in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, (New York, NY, USA), pp. 269-280, ACM, 2014.
[15] T. Mikolov, W.-t. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (Atlanta, Georgia), pp. 746-751, Association for Computational Linguistics, June 2013.
[16] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[17] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013.
[18] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[19] T. Gvero and V. Kuncak, "Interactive synthesis using free-form queries," in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2, pp. 689-692, May 2015.


Published In

ICSEW'20: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops
June 2020
831 pages
ISBN: 9781450379632
DOI: 10.1145/3387940
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. code recommendations
  2. machine learning
  3. neural networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSE '20
ICSE '20: 42nd International Conference on Software Engineering
June 27 - July 19, 2020
Seoul, Republic of Korea


Cited By

  • (2024) A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024
  • (2024) Rethinking AI code generation: a one-shot correction approach based on user feedback. Automated Software Engineering 31:2. DOI: 10.1007/s10515-024-00451-y. Online publication date: 12-Jul-2024
  • (2023) Evaluating the Usability and Functionality of Intelligent Source Code Completion Assistants: A Comprehensive Review. Applied Sciences 13:24 (13061). DOI: 10.3390/app132413061. Online publication date: 7-Dec-2023
  • (2023) Big Code Search: A Bibliography. ACM Computing Surveys 56:1 (1-49). DOI: 10.1145/3604905. Online publication date: 26-Aug-2023
  • (2022) A methodology for refined evaluation of neural code completion approaches. Data Mining and Knowledge Discovery 37:1 (167-204). DOI: 10.1007/s10618-022-00866-9. Online publication date: 1-Nov-2022
  • (2021) A Literature Review of Using Machine Learning in Software Development Life Cycle Stages. IEEE Access 9 (140896-140920). DOI: 10.1109/ACCESS.2021.3119746. Online publication date: 2021
