research-article

A neural framework for retrieval and summarization of source code

Authors:

Minghui ZhouAuthors Info & Claims

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

Pages 826 - 831

https://doi.org/10.1145/3238147.3240471

Published: 03 September 2018 Publication History

Abstract

Code retrieval and summarization are two tasks often employed by software developers to reuse code that spreads over online repositories. In this paper, we present a neural framework that allows bidirectional mapping between source code and natural language to improve these two tasks. Our framework, BVAE, is designed to have two Variational AutoEncoders (VAEs) to model bimodal data: C-VAE for source code and L-VAE for natural language. Both VAEs are trained jointly to reconstruct their input as much as possible with regularization that captures the closeness between the latent variables of code and description. BVAE could learn semantic vector representations for both code and description and generate completely new descriptions for arbitrary code snippets. We design two instance models of BVAE for retrieval and summarization tasks respectively and evaluate their performance on a benchmark which involves two programming languages: C# and SQL. Experiments demonstrate BVAE’s potential on the two tasks.

References

[1]

Miltos Allamanis, Daniel Tarlow, Andrew Gordon, and Yi Wei. 2015. Bimodal modelling of source code and natural language. In ICML. 2123–2132.

Digital Library

[2]

Sushil Bajracharya, Joel Ossher, and Cristina Lopes. 2014. Sourcerer: An infrastructure for large-scale collection and analysis of open-source code. Science of Computer Programming 79 (2014), 241–259.

Digital Library

[3]

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. 65–72.

[4]

Jeffrey S Beis and David G Lowe. 1997. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In CVPR. IEEE, 1000–1006.

Digital Library

[5]

Stephen Blott and Roger Weber. 1997. A simple vector-approximation file for similarity search in high-dimensional vector spaces. ESPRIT Technical Report TR19, ca (1997).

[6]

Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating Sentences from a Continuous Space. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. 10–21.

[7]

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).

[8]

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 249–256.

[9]

Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. In Proceedings of the 40th International Conference on Software Engineering. ACM, 933–944.

Digital Library

[10]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.

Digital Library

[11]

Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep code comment generation. In ICPC.

Digital Library

[12]

Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing Source Code with Transferred API Knowledge. In IJCAI. 2269–2275.

[13]

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In ACL, Vol. 1. 2073– 2083.

[14]

Iman Keivanloo, Juergen Rilling, and Ying Zou. 2014. Spotting working code examples. In ICSE. ACM, 664–675.

Digital Library

[15]

Diederik P Kingma and Max Welling. 2014. Auto-encoding variational bayes. In ICLR.

[16]

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions. ACL, 177–180.

Digital Library

[17]

Georgia Koutrika, Alkis Simitsis, and Yannis E Ioannidis. 2010. Explaining structured queries in natural language. In ICDE. IEEE, 333–344.

[18]

Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. In EMNLP. 1412âĂŞ– 1421.

[19]

Paul W McBurney and Collin McMillan. 2016. Automatic source code summarization of context for java methods. IEEE Transactions on Software Engineering 42, 2 (2016), 103–119.

Digital Library

[20]

Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K Vijay-Shanker. 2013. Automatic generation of natural language summaries for java classes. In ICPC. IEEE, 23–32.

[21]

Dana Movshovitz-Attias and William W. Cohen. 2013. Natural language models for predicting programming comments. In ACL. 35–40.

[22]

Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, i don’t speak SPARQL: translating SPARQL queries into natural language. In WWW. ACM, 977–988.

Digital Library

[23]

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL. ACL, 311–318.

Digital Library

[24]

Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In ICML.

Digital Library

[25]

Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In EMNLP. 379âĂŞ–389.

[26]

Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K Vijay-Shanker. 2010. Towards automatically generating summary comments for java methods. In Proceedings of the IEEE/ACM international conference on Automated software engineering. ACM, 43–52.

Digital Library

[27]

Giriprasad Sridhara, Lori Pollock, and K Vijay-Shanker. 2011. Automatically detecting and describing high level actions within methods. In ICSE. ACM, 101– 110.

Digital Library

[28]

Quan Wang and Suya You. 2006. Fast similarity search for high-dimensional dataset. In Multimedia, 2006. ISM’06. Eighth IEEE International Symposium on. IEEE, 799–804.

Digital Library

[29]

Wei Wang, Beng Chin Ooi, Xiaoyan Yang, Dongxiang Zhang, and Yueting Zhuang. 2014. Effective multi-modal retrieval based on stacked auto-encoders. Proceedings of the VLDB Endowment 7, 8 (2014), 649–660.

Digital Library

Cited By

Bibi NMaqbool ARana TAfzal FKhan A(2024)C2B: A Semantic Source Code Retrieval Model Using CodeT5 and Bi-LSTMApplied Sciences10.3390/app1413579514:13(5795)Online publication date: 2-Jul-2024
https://doi.org/10.3390/app14135795
Ding XHuang YChen XBian J(2024)Adversarial Attack and Robustness Improvement on Code SummarizationProceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering10.1145/3661167.3661173(17-27)Online publication date: 18-Jun-2024
https://dl.acm.org/doi/10.1145/3661167.3661173
Sun WFang CGe YHu YChen YZhang QGe XLiu YChen Z(2024)A Survey of Source Code Search: A 3-Dimensional PerspectiveACM Transactions on Software Engineering and Methodology10.1145/365634133:6(1-51)Online publication date: 28-Jun-2024
https://dl.acm.org/doi/10.1145/3656341
Show More Cited By

Index Terms

A neural framework for retrieval and summarization of source code
1. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering

Recommendations

Retrieval-based neural source code summarization
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering

Source code summarization aims to automatically generate concise summaries of source code in natural language texts, in order to help developers better understand and maintain source code. Traditional work generates a source code summary by utilizing ...
Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning
WWW '20: Proceedings of The Web Conference 2020

Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural language and ...
An Extractive-and-Abstractive Framework for Source Code Summarization
(Source) Code summarization aims to automatically generate summaries/comments for given code snippets in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

September 2018

955 pages

ISBN:9781450359375

DOI:10.1145/3238147

General Chair:
Marianne Huchard
University of Montpellier, France
,
Program Chairs:
Christian Kästner
Carnegie Mellon University, USA
,
Gordon Fraser
University of Passau, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence
CNRS: Centre National De La Rechercue Scientifique
SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
National key research and development program of China
National Basic Research Program of China

Conference

ASE '18

Sponsor:

SIGAI
CNRS
SIGSOFT
IEEE-CS

ASE '18: 33rd ACM/IEEE International Conference on Automated Software Engineering

September 3 - 7, 2018

Montpellier, France

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

71
Total Citations
View Citations
1,149
Total Downloads

Downloads (Last 12 months)44
Downloads (Last 6 weeks)6

Reflects downloads up to 01 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Bibi NMaqbool ARana TAfzal FKhan A(2024)C2B: A Semantic Source Code Retrieval Model Using CodeT5 and Bi-LSTMApplied Sciences10.3390/app1413579514:13(5795)Online publication date: 2-Jul-2024
https://doi.org/10.3390/app14135795
Ding XHuang YChen XBian J(2024)Adversarial Attack and Robustness Improvement on Code SummarizationProceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering10.1145/3661167.3661173(17-27)Online publication date: 18-Jun-2024
https://dl.acm.org/doi/10.1145/3661167.3661173
Sun WFang CGe YHu YChen YZhang QGe XLiu YChen Z(2024)A Survey of Source Code Search: A 3-Dimensional PerspectiveACM Transactions on Software Engineering and Methodology10.1145/365634133:6(1-51)Online publication date: 28-Jun-2024
https://dl.acm.org/doi/10.1145/3656341
Gao KHe RXie BZhou M(2024)Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and DisengagementACM Transactions on Software Engineering and Methodology10.1145/364033633:4(1-27)Online publication date: 10-Jan-2024
https://dl.acm.org/doi/10.1145/3640336
Zhou ZLi MYu HFan GYang PHuang Z(2024)Learning to Generate Structured Code Summaries From Hybrid Code ContextIEEE Transactions on Software Engineering10.1109/TSE.2024.343956250:10(2512-2528)Online publication date: Oct-2024
https://doi.org/10.1109/TSE.2024.3439562
Sharma TKechagia MGeorgiou STiwari RVats IMoazen HSarro F(2024)A survey on machine learning techniques applied to source codeJournal of Systems and Software10.1016/j.jss.2023.111934209:COnline publication date: 14-Mar-2024
https://dl.acm.org/doi/10.1016/j.jss.2023.111934
Zhang XHou XQiao XSong W(2024)A review of automatic source code summarizationEmpirical Software Engineering10.1007/s10664-024-10553-629:6Online publication date: 7-Oct-2024
https://doi.org/10.1007/s10664-024-10553-6
Liu XLiu JHu HLiu Y(2024)Enrich Code Search Query Semantics with Raw DescriptionsCollaborative Computing: Networking, Applications and Worksharing10.1007/978-3-031-54521-4_16(284-302)Online publication date: 23-Feb-2024
https://doi.org/10.1007/978-3-031-54521-4_16
Wong MGuo SHang CHo STan C(2023)Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A ReviewEntropy10.3390/e2506088825:6(888)Online publication date: 1-Jun-2023
https://doi.org/10.3390/e25060888
Xie YLin JDong HZhang LWu Z(2023)Survey of Code Search Based on Deep LearningACM Transactions on Software Engineering and Methodology10.1145/362816133:2(1-42)Online publication date: 23-Dec-2023
https://dl.acm.org/doi/10.1145/3628161
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents