
SecurityEval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques

Published: 09 November 2022

Abstract

    Automated source code generation is currently a popular machine-learning task. It can help software developers write functionally correct code from a given context. However, just like human developers, a code generation model can produce vulnerable code, which developers may unknowingly adopt. For this reason, evaluating the security of a code generation model is essential. In this paper, we describe SecurityEval, an evaluation dataset built for this purpose. It contains 130 samples covering 75 vulnerability types, each mapped to the Common Weakness Enumeration (CWE). We also demonstrate using our dataset to evaluate one open-source code generation model (InCoder) and one closed-source model (GitHub Copilot).
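    To make the idea concrete, a SecurityEval-style sample pairs a CWE identifier with a code-completion prompt that a model finishes; the model's completion is then checked for the associated weakness. The sketch below is illustrative only: the field names, sample ids, and prompts are hypothetical and do not reproduce the dataset's actual schema.

    ```python
    # Hypothetical sketch of SecurityEval-style samples: each entry maps a
    # CWE identifier to a partial-code prompt for a code generation model.
    # Field names and ids are illustrative, not the dataset's real schema.
    samples = [
        {
            "id": "CWE-089_sample-01",  # SQL injection (CWE-89)
            "prompt": (
                "import sqlite3\n"
                "def get_user(db_path, username):\n"
                "    '''Return the row for the given username.'''\n"
            ),
        },
        {
            "id": "CWE-798_sample-01",  # hard-coded credentials (CWE-798)
            "prompt": (
                "def connect_to_service():\n"
                "    '''Open an authenticated session to the service.'''\n"
            ),
        },
    ]

    def cwe_of(sample):
        """Extract the CWE number from an id like 'CWE-089_sample-01'."""
        return int(sample["id"].split("_")[0].split("-")[1])

    # Distinct CWE types covered by this (toy) sample set.
    print(sorted({cwe_of(s) for s in samples}))  # prints [89, 798]
    ```

    In an evaluation loop, each prompt would be fed to the model under test and the completion passed to a static analyzer (e.g., a tool such as Bandit or CodeQL) configured for the sample's CWE.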





    Published In

    MSR4P&S 2022: Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security
    November 2022, 33 pages
    ISBN: 9781450394574
    DOI: 10.1145/3549035

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. code generation
    2. common weakness enumeration
    3. dataset
    4. security

    Qualifiers

    • Short-paper

    Conference

    MSR4P&S '22


    Cited By

    • (2024) CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 684-709. https://doi.org/10.1109/SaTML59370.2024.00040
    • (2024) Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers. 2024 IEEE Security and Privacy Workshops (SPW), 87-94. https://doi.org/10.1109/SPW63631.2024.00014
    • (2024) Assessing the Security of GitHub Copilot's Generated Code - A Targeted Replication Study. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 435-444. https://doi.org/10.1109/SANER60148.2024.00051
    • (2024) The Recent Trends of Research on GitHub Copilot: A Systematic Review. Computing and Informatics, 355-366. https://doi.org/10.1007/978-981-99-9589-9_27
    • (2023) Large Language Models for Code: Security Hardening and Adversarial Testing. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 1865-1879. https://doi.org/10.1145/3576915.3623175
    • (2023) MSR4P&S 2022 Workshop Summary. ACM SIGSOFT Software Engineering Notes 48(1), 97-100. https://doi.org/10.1145/3573074.3573100
    • (2023) PwnPilot: Reflections on Trusting Trust in the Age of Large Language Models and AI Code Assistants. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), 2457-2464. https://doi.org/10.1109/CSCE60160.2023.00396
