DOI: 10.1145/3520312.3534869
research-article

Syntax-guided program reduction for understanding neural code intelligence models

Published: 13 June 2022

    Abstract

    Neural code intelligence (CI) models are opaque black boxes that offer little insight into the features they use to make predictions. This opacity may lead to distrust in their predictions and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify key features in input programs and thereby improve the transparency of CI models. However, this approach is syntax-unaware: it does not consider the grammar of the programming language.
    In this paper, we apply a syntax-guided program reduction technique that considers the grammar of the input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided program reduction technique is faster and yields smaller sets of key tokens in the reduced programs. We also show that the key tokens can be used to generate adversarial examples for up to 65% of the input programs.
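
The contrast between syntax-unaware and syntax-guided reduction can be illustrated with a minimal sketch. It is not the authors' implementation: `predict` is a hypothetical toy stand-in for a CI model, `reduce_units` is a simplified greedy variant of delta debugging, the inputs are Python snippets rather than the programs used in the paper, and the syntax-guided variant is approximated by removing whole statements only. The idea it shows is that reduction keeps deleting parts of the input as long as the model's prediction is preserved, and that choosing syntactic units avoids querying unparseable candidates.

```python
import ast


def predict(source):
    """Toy stand-in for a CI model (hypothetical): label a snippet by the calls it contains.

    Returns None when the snippet does not parse, mimicking a model whose
    front end cannot process syntactically invalid input.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return None
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return "swap" if "swap" in called else "other"


def reduce_units(units, render, target):
    """Greedy, ddmin-style reduction over a list of units (simplified sketch).

    Repeatedly drops chunks of units while the prediction on the rendered
    candidate stays equal to `target`. Returns the reduced units, the number
    of model queries, and how many queried candidates failed to parse.
    """
    queries = invalid = 0
    chunk = max(len(units) // 2, 1)
    while True:
        changed = False
        i = 0
        while i < len(units):
            candidate = units[:i] + units[i + chunk:]
            if candidate:  # never query the empty program
                queries += 1
                label = predict(render(candidate))
                invalid += label is None
                if label == target:
                    units = candidate  # keep the smaller program, retry same position
                    changed = True
                    continue
            i += chunk
        if chunk == 1 and not changed:
            return units, queries, invalid
        chunk = max(chunk // 2, 1)


statements = ["a = 1", "b = 2", "swap(a, b)", "print(a)"]
target = predict("\n".join(statements))

# Syntax-unaware: units are raw tokens, so some candidates are not valid programs.
tokens = "a = 1 ; b = 2 ; swap ( a , b ) ; print ( a )".split()
tok_result, tok_queries, tok_invalid = reduce_units(tokens, " ".join, target)

# Syntax-guided (simplified): units are whole statements, so every candidate parses.
stmt_result, stmt_queries, stmt_invalid = reduce_units(statements, "\n".join, target)

print("token-level:    ", " ".join(tok_result), tok_queries, tok_invalid)
print("statement-level:", "\n".join(stmt_result), stmt_queries, stmt_invalid)
```

In this toy run the statement-level reduction keeps only `swap(a, b)` and never queries an unparseable candidate, whereas the token-level run spends part of its queries on candidates that fail to parse. A real syntax-guided reducer works on grammar nodes at every level of the parse tree, not only on statements, so the sizes of the reduced programs here should not be read as reproducing the paper's results.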

    Information & Contributors

    Information

    Published In

    MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming
    June 2022
    79 pages
    ISBN: 9781450392730
    DOI: 10.1145/3520312
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Feature Engineering
    2. Interpretability
    3. Neural Models of Source Code
    4. Program Reduction
    5. Transparency

    Qualifiers

    • Research-article

    Conference

    MAPS '22

    Cited By

    • (2024) Beyond Fidelity: Explaining Vulnerability Localization of Learning-Based Detectors. ACM Transactions on Software Engineering and Methodology 33:5 (1-33). https://doi.org/10.1145/3641543. Online publication date: 4-Jun-2024
    • (2023) Structural-semantics Guided Program Simplification for Understanding Neural Code Intelligence Models. Proceedings of the 14th Asia-Pacific Symposium on Internetware (1-11). https://doi.org/10.1145/3609437.3609438. Online publication date: 4-Aug-2023
    • (2023) Type Batched Program Reduction. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (398-410). https://doi.org/10.1145/3597926.3598065. Online publication date: 12-Jul-2023
    • (2023) Study of Distractors in Neural Models of Code. 2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE) (1-7). https://doi.org/10.1109/InteNSE59150.2023.00005. Online publication date: May-2023
    • (2023) Interpreting CodeBERT for Semantic Code Clone Detection. 2023 30th Asia-Pacific Software Engineering Conference (APSEC) (229-238). https://doi.org/10.1109/APSEC60848.2023.00033. Online publication date: 4-Dec-2023
    • (2023) Memorization and generalization in neural code intelligence models. Information and Software Technology 153:C. https://doi.org/10.1016/j.infsof.2022.107066. Online publication date: 1-Jan-2023
    • (2022) Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code. 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (843-848). https://doi.org/10.1109/ICMLA55696.2022.00140. Online publication date: Dec-2022
