DOI: 10.1145/3491102.3501870
research-article

Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models

Published: 29 April 2022
    Abstract

    In this paper, we present a natural language code synthesis tool, GenLine, backed by 1) a large generative language model and 2) a set of task-specific prompts that create or change code. To understand the user experience of natural language code synthesis with these new types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s “syntax,” despite their input being natural language. Participants also struggled to form an accurate mental model of the types of requests the model can reliably translate and developed a set of strategies to debug model input. From these findings, we discuss design implications for future natural language code synthesis tools built using large generative language models.

    Supplementary Material

    MP4 File (3491102.3501870-video-figure.mp4)
    Video Figure
    MP4 File (3491102.3501870-video-preview.mp4)
    Video Preview
    MP4 File (3491102.3501870-talk-video.mp4)
    Talk Video


    Published In

    CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
    April 2022
    10459 pages
    ISBN:9781450391573
    DOI:10.1145/3491102
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. code synthesis
    2. generative language models
    3. prompt programming

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '22: CHI Conference on Human Factors in Computing Systems
    April 29 - May 5, 2022
    New Orleans, LA, USA

    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 621
    • Downloads (last 6 weeks): 26
    Reflects downloads up to 14 Aug 2024

    Cited By

    • (2024) Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction 8:EICS (1-29). https://doi.org/10.1145/3661145. Online publication date: 17-Jun-2024
    • (2024) “It would work for me too”: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools. ACM Transactions on Interactive Intelligent Systems 14:2 (1-39). https://doi.org/10.1145/3651990. Online publication date: 15-May-2024
    • (2024) How much SPACE do metrics have in GenAI assisted software development? Proceedings of the 17th Innovations in Software Engineering Conference (1-5). https://doi.org/10.1145/3641399.3641419. Online publication date: 22-Feb-2024
    • (2024) ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles. Proceedings of the 29th International Conference on Intelligent User Interfaces (853-868). https://doi.org/10.1145/3640543.3645144. Online publication date: 18-Mar-2024
    • (2024) The Emerging Artifacts of Centralized Open-Code. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (1971-1983). https://doi.org/10.1145/3630106.3659019. Online publication date: 3-Jun-2024
    • (2024) Computing Education in the Era of Generative AI. Communications of the ACM 67:2 (56-67). https://doi.org/10.1145/3624720. Online publication date: 25-Jan-2024
    • (2024) A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (1-11). https://doi.org/10.1145/3613905.3650786. Online publication date: 11-May-2024
    • (2024) The Metacognitive Demands and Opportunities of Generative AI. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-24). https://doi.org/10.1145/3613904.3642902. Online publication date: 11-May-2024
    • (2024) Learning Agent-based Modeling with LLM Companions: Experiences of Novices and Experts Using ChatGPT & NetLogo Chat. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-18). https://doi.org/10.1145/3613904.3642377. Online publication date: 11-May-2024
    • (2024) CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-21). https://doi.org/10.1145/3613904.3642212. Online publication date: 11-May-2024
