
SecurityEval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques

Published: 09 November 2022

Abstract

    Automated source code generation is currently a popular machine-learning task. It can help software developers write functionally correct code from a given context. However, just like human developers, a code generation model can produce vulnerable code, which developers may unknowingly adopt. For this reason, evaluating the security of a code generation model is essential. In this paper, we describe SecurityEval, an evaluation dataset built for this purpose. It contains 130 samples covering 75 vulnerability types, each mapped to the Common Weakness Enumeration (CWE). We also demonstrate using our dataset to evaluate one open-source code generation model (InCoder) and one closed-source model (GitHub Copilot).
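    To make the idea concrete, a SecurityEval-style sample pairs a CWE identifier with a code-completion prompt that a model finishes; the model's completion is then checked for the associated weakness. The sketch below is illustrative only: the field names, sample ids, and prompts are hypothetical and do not reproduce the dataset's actual schema.

    ```python
    # Hypothetical sketch of SecurityEval-style samples: each entry maps a
    # CWE identifier to a partial-code prompt for a code generation model.
    # Field names and ids are illustrative, not the dataset's real schema.
    samples = [
        {
            "id": "CWE-089_sample-01",  # SQL injection (CWE-89)
            "prompt": (
                "import sqlite3\n"
                "def get_user(db_path, username):\n"
                "    '''Return the row for the given username.'''\n"
            ),
        },
        {
            "id": "CWE-798_sample-01",  # hard-coded credentials (CWE-798)
            "prompt": (
                "def connect_to_service():\n"
                "    '''Open an authenticated session to the service.'''\n"
            ),
        },
    ]

    def cwe_of(sample):
        """Extract the CWE number from an id like 'CWE-089_sample-01'."""
        return int(sample["id"].split("_")[0].split("-")[1])

    # Distinct CWE types covered by this (toy) sample set.
    print(sorted({cwe_of(s) for s in samples}))  # prints [89, 798]
    ```

    In an evaluation loop, each prompt would be fed to the model under test and the completion passed to a static analyzer (e.g., a tool such as Bandit or CodeQL) configured for the sample's CWE.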





    Published In

    MSR4P&S 2022: Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security
    November 2022, 33 pages
    ISBN: 9781450394574
    DOI: 10.1145/3549035

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. code generation
    2. common weakness enumeration
    3. dataset
    4. security

    Qualifiers

    • Short-paper

    Conference

    MSR4P&S '22


    Cited By

    • (2024) CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 684-709. https://doi.org/10.1109/SaTML59370.2024.00040
    • (2024) Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers. 2024 IEEE Security and Privacy Workshops (SPW), 87-94. https://doi.org/10.1109/SPW63631.2024.00014
    • (2024) Assessing the Security of GitHub Copilot's Generated Code - A Targeted Replication Study. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 435-444. https://doi.org/10.1109/SANER60148.2024.00051
    • (2024) The Recent Trends of Research on GitHub Copilot: A Systematic Review. Computing and Informatics, 355-366. https://doi.org/10.1007/978-981-99-9589-9_27
    • (2023) Large Language Models for Code: Security Hardening and Adversarial Testing. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 1865-1879. https://doi.org/10.1145/3576915.3623175
    • (2023) MSR4P&S 2022 Workshop Summary. ACM SIGSOFT Software Engineering Notes 48(1), 97-100. https://doi.org/10.1145/3573074.3573100
    • (2023) PwnPilot: Reflections on Trusting Trust in the Age of Large Language Models and AI Code Assistants. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), 2457-2464. https://doi.org/10.1109/CSCE60160.2023.00396
