Natural language to code: How far are we?

S Wang, M Geng, B Lin, Z Sun, M Wen, Y Liu… - Proceedings of the 31st …, 2023 - dl.acm.org
Proceedings of the 31st ACM Joint European Software Engineering Conference …, 2023
A longstanding dream in software engineering research is to devise effective approaches for automating development tasks based on developers' informally specified intentions. Such intentions are generally expressed as natural language descriptions. In recent literature, a number of approaches have been proposed to automate tasks such as code search and even code generation based on natural language inputs. While these approaches vary in their technical designs, their objective is the same: transforming a developer's intention into source code. The literature, however, lacks a comprehensive understanding of the effectiveness of existing techniques and of their complementarity to each other. We propose to fill this gap through a large-scale empirical study in which we systematically evaluate natural language to code techniques. Specifically, we consider six state-of-the-art techniques targeting code search and four targeting code generation. Through extensive evaluations on a dataset of 22K+ natural language queries, our study reveals the following major findings: (1) code search techniques based on model pre-training are so far the most effective, while code generation techniques can also provide promising results; (2) complementarity widely exists among the existing techniques; and (3) combining the ten techniques can improve performance by 35% compared with the most effective standalone technique. Finally, we propose a post-processing strategy to automatically integrate different techniques based on their generated code. Experimental results show that our devised strategy is both effective and extensible.