research-article
Open access

In-IDE Code Generation from Natural Language: Promise and Challenges

Published: 04 March 2022

Abstract

A great part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concepts into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have been evaluated primarily on retrieval accuracy or on the overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking, “At the current state of technology, does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?” To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality, and we orchestrate virtual environments to enable collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 14 Python programming tasks of 7 varieties, ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points whose resolution could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants and demonstrates when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the way for future empirical studies on this topic, as well as development of better code generation models.

Information

Published In

ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 2
April 2022
789 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3492439
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2022
Accepted: 01 September 2021
Revised: 01 July 2021
Received: 01 January 2021
Published in TOSEM Volume 31, Issue 2

Author Tags

  1. Natural language programming assistant
  2. code generation
  3. code retrieval
  4. empirical study

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • NSF
