skip to main content
10.5555/3618408.3619552guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
research-article

Can large language models reason about program invariants?

Published: 23 July 2023 Publication History
  • Get Citation Alerts
  • Abstract

    Identifying invariants is an important program analysis task with applications towards program understanding, bug finding, vulnerability analysis, and formal verification. Existing tools for identifying program invariants rely on dynamic analysis, requiring traces collected from multiple executions in order to produce reliable invariants. We study the application of large language models to invariant prediction, finding that models trained on source code and fine-tuned for invariant generation can perform invariant prediction as static rather than dynamic analysis. Using a scratch-pad approach where invariants are predicted sequentially through a program gives the best performance, finding invariants statically of quality comparable to those obtained by a dynamic analysis tool with access to five program traces.

    References

    [1]
    Anand, S., Burke, E. K., Chen, T. Y., Clark, J., Cohen, M. B., Grieskamp, W., Harman, M., Harrold, M. J., McMinn, P., Bertolino, A., Jenny Li, J., and Zhu, H. An orchestrated survey of methodologies for automated software test case generation. J. Syst. Softw., 86(8):1978-2001, August 2013.
    [2]
    Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., and Sutton, C. Program synthesis with large language models. August 2021.
    [3]
    Bieber, D., Sutton, C., Larochelle, H., and Tarlow, D. Learning to execute programs with instruction pointer attention graph neural networks. Advances in Neural Information Processing Systems, 33:8626-8637, 2020.
    [4]
    Bieber, D., Goel, R., Zheng, D., Larochelle, H., and Tarlow, D. Static prediction of runtime errors by learning to execute programs with external resource descriptions. March 2022.
    [5]
    Brockschmidt, M., Chen, Y., Cook, B., Kohli, P., and Tarlow, D. Learning to decipher the heap for program verification. In ICML Workshop on Constructive Machine Learning (CML), 2015.
    [6]
    Brockschmidt, M., Chen, Y., Kohli, P., Krishna, S., and Tarlow, D. Learning shape analysis. In Static Analysis Symposium (SAS), 2017.
    [7]
    Cadar, C., Dunbar, D., Engler, D. R., et al. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, volume 8, pp. 209-224, 2008.
    [8]
    Chen, M., Tworek, J., Jun, H., Yuan, Q., Ponde, H., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F., Cummings, D., Plappert, M., Chantzis, F., Barnes, E., Herbert-Voss, A., Guss, W., Nichol, A., Babuschkin, I., Balaji, S., Jain, S., Carr, A., Leike, J., Achiam, J., Misra, V., Morikawa, E., Radford, A., Knight, M., Brundage, M., Murati, M., Mayer, K., Welinder, P., McGrew, B., Amodei, D., McCandlish, S., Sutskever, I., and Zaremba, W. Evaluating large language models trained on code. July 2021.
    [9]
    Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., et al. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
    [10]
    Claessen, K. and Hughes, J. QuickCheck: a lightweight tool for random testing of haskell programs. In Proceedings of the fifth ACM SIGPLAN international conference on Functional programming, ICFP '00, pp. 268-279, New York, NY, USA, September 2000. Association for Computing Machinery.
    [11]
    Cousot, P. and Cousot, R. Abstract interpretation. In Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL '77, New York, New York, USA, 1977. ACM Press.
    [12]
    Dijkstra, E. W. Guarded commands, nondeterminacy and formal derivation of programs. Commun. ACM, 18(8): 453-457, 1975.
    [13]
    Ellis, K., Nye, M. I., Pu, Y., Sosa, F., Tenenbaum, J. B., and Solar-Lezama, A. Write, execute, assess: Program synthesis with a REPL. Advances in Neural Information Processing Systems {(NeurIPS)}, 2019.
    [14]
    Ernst, M. D., Perkins, J. H., Guo, P. J., McCamant, S., Pacheco, C., Tschantz, M. S., and Xiao, C. The daikon system for dynamic detection of likely invariants. Science of computer programming, 69(1-3):35-45, 2007.
    [15]
    Flanagan, C. and Leino, K. R. M. Houdini, an annotation assistant for esc/java. In FME 2001: Formal Methods for Increasing Software Productivity: International Symposium of Formal Methods Europe Berlin, Germany, March 12-16, 2001 Proceedings, pp. 500-517. Springer, 2001.
    [16]
    Galeotti, J. P., Furia, C. A., May, E., Fraser, G., and Zeller, A. Inferring loop invariants by mutation, dynamic analysis, and static checking. IEEE Transactions on Software Engineering, 41(10):1019-1037, 2015.
    [17]
    Garg, P., Löding, C., Madhusudan, P., and Neider, D. ICE: A robust framework for learning invariants. In Computer Aided Verification, Lecture notes in computer science, pp. 69-87. Springer International Publishing, Cham, 2014.
    [18]
    Garg, P., Neider, D., Madhusudan, P., and Roth, D. Learning invariants using decision trees and implication counter-examples. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16, pp. 499-512, New York, NY, USA, January 2016. Association for Computing Machinery.
    [19]
    Hellendoorn, V. J., Devanbu, P. T., Polozov, O., and Marron, M. Are my invariants valid? a learning approach. March 2019.
    [20]
    Hoare, C. A. R. An axiomatic basis for computer programming. Commun. ACM, 12(10):576-580, October 1969.
    [21]
    Kaplan, J., McCandlish, S., Henighan, T. J., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. Scaling laws for neural language models. ArXiv, abs/2001.08361, 2020.
    [22]
    Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., Eccles, T., Keeling, J., Gimeno, F., Dal Lago, A., Hubert, T., Choy, P., de Masson d'Autume, C., Babuschkin, I., Chen, X., Huang, P.-S., Welbl, J., Gowal, S., Cherepanov, A., Molloy, J., Mankowitz, D., Sutherland Robson, E., Kohli, P., de Freitas, N., Kavukcuoglu, K., and Vinyals, O. Competition-level code generation with alphacode. arXiv preprint arXiv:2203.07814, 2022.
    [23]
    Liu, L., Sheridan, D., Tuohy, W., and Vasudevan, S. A technique for test coverage closure using GoldMine. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 31(5): 790-803, May 2012.
    [24]
    Nguyen, T., Antonopoulos, T., Ruef, A., and Hicks, M. Counterexample-guided approach to finding numerical invariants. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 605-615, 2017.
    [25]
    Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., et al. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114, 2021a.
    [26]
    Nye, M., Pu, Y., Bowers, M., Andreas, J., Tenenbaum, J. B., and Solar-Lezama, A. Representing partial programs with blended abstract semantics. In International Conference on Learning Representations (ICLR), 2021b.
    [27]
    Odena, A. and Sutton, C. Learning to represent programs with property signatures. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rylHspEKPr.
    [28]
    Pacheco, C. and Ernst, M. D. Eclat: Automatic generation and classification of test inputs. In ECOOP 2005 - Object-Oriented Programming, pp. 504-527. Springer Berlin Heidelberg, 2005.
    [29]
    Pacheco, C., Lahiri, S. K., Ernst, M. D., and Ball, T. Feedback-directed random test generation. In ICSE 2007, Proceedings of the 29th International Conference on Software Engineering, pp. 75-84, Minneapolis, MN, USA, May 2007.
    [30]
    Padhi, S., Sharma, R., and Millstein, T. Loopinvgen: A loop invariant generator based on precondition inference. arXiv preprint arXiv:1707.02029, 2017.
    [31]
    Petersen, M. K. A. An evaluation of daikon: A dynamic invariant detector.
    [32]
    Polikarpova, N., Ciupa, I., and Meyer, B. A comparative study of programmer-written and automatically inferred contracts. In Proceedings of the eighteenth international symposium on Software testing and analysis, ISSTA '09, pp. 93-104, New York, NY, USA, July 2009. Association for Computing Machinery.
    [33]
    Ryan, G., Wong, J., Yao, J., Gu, R., and Jana, S. Cln2inv: learning loop invariants with continuous logic networks. arXiv preprint arXiv:1909.11542, 2019.
    [34]
    Sagdeo, P., Athavale, V., Kowshik, S., and Vasudevan, S. PRECIS: Inferring invariants using program path guided clustering. In 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pp. 532-535, November 2011.
    [35]
    Sharma, R. and Aiken, A. From invariant checking to invariant inference using randomized search. Formal Methods in System Design, 48(3):235-256, 2016.
    [36]
    Sharma, R., Nori, A. V., and Aiken, A. Interpolants as classifiers. In International Conference on Computer Aided Verification, 2012.
    [37]
    Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Liang, P., and Nori, A. V. A data driven approach for algebraic loop invariants. In Programming Languages and Systems, pp. 574-592. Springer Berlin Heidelberg, 2013a.
    [38]
    Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Liang, P., and Nori, A. V. A data driven approach for algebraic loop invariants. In European Symposium on Programming, pp. 574-592. Springer, 2013b.
    [39]
    Sharma, R., Gupta, S., Hariharan, B., Aiken, A., and Nori, A. V. Verification as learning geometric concepts. In Static Analysis, Lecture notes in computer science, pp. 388-411. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013c.
    [40]
    Shi, K., Dai, H., Ellis, K., and Sutton, C. CrossBeam: Learning to search in Bottom-Up program synthesis. In International Conference on Learning Representations (ICLR), 2022.
    [41]
    Si, X., Dai, H., Raghothaman, M., Naik, M., and Song, L. Learning loop invariants for program verification. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pp. 7762-7773, Red Hook, NY, USA, December 2018. Curran Associates Inc.
    [42]
    Si, X., Naik, A., Dai, H., Naik, M., and Song, L. Code2Inv: A deep learning framework for program verification. In Computer Aided Verification, Lecture notes in computer science, pp. 151-164. Springer International Publishing, Cham, 2020.
    [43]
    Tabachnyk, M. and Nikolov, S. Ml-enhanced code completion improves developer productivity. 2022. URL https://ai.googleblog.com/2022/07/ml-enhanced-code-completion-improves.html.
    [44]
    Zaremba, W. and Sutskever, I. Learning to execute. 2014.
    [45]
    Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., and Aftandilian, E. Productivity assessment of neural code completion. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, MAPS 2022, pp. 21-29, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450392730.
    [46]
    Zohar, A. and Wolf, L. Automatic program synthesis of long programs with a learned garbage collector. September 2018.

    Cited By

    View all
    • (2024)Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated ApproachProceedings of the ACM on Programming Languages10.1145/36498288:OOPSLA1(474-499)Online publication date: 29-Apr-2024

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    ICML'23: Proceedings of the 40th International Conference on Machine Learning
    July 2023
    43479 pages

    Publisher

    JMLR.org

    Publication History

    Published: 23 July 2023

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 14 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated ApproachProceedings of the ACM on Programming Languages10.1145/36498288:OOPSLA1(474-499)Online publication date: 29-Apr-2024

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media