
Explaining poor performance of text-based machine learning models for vulnerability detection

Empirical Software Engineering

Abstract

As software vulnerabilities increase in severity, machine learning models are being adopted to combat this threat, and research in this area has introduced a variety of approaches. Although models may differ in performance, there is an overall lack of explainability in understanding how a model learns and predicts. Furthermore, recent research suggests that models perform poorly when they detect vulnerabilities by interpreting source code as text, known as “text-based” models. To help explain this poor performance, we explore the dimensions of explainability. Building on recent studies of text-based models, we experiment with removing features that overlap between the training and testing datasets, which we deem “cross-cutting”. We conduct scenario experiments in which such “cross-cutting” data is removed and model performance is reassessed. Based on the results, we examine how removing these “cross-cutting” features affects model performance. Our results show that removing “cross-cutting” features may improve model performance in general, leading to explainable dimensions regarding data dependency and agnostic models. Overall, we conclude that model performance can be improved, and that explainable aspects of such models can be identified, through empirical analysis of model performance.
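
To make the experimental setup concrete, the sketch below illustrates the core idea of the “cross-cutting” removal scenarios: tokens that occur in both the training and testing splits are dropped from the feature space before a text-based model is retrained and re-evaluated with average precision. This is a minimal, hypothetical reconstruction assuming a bag-of-words representation and scikit-learn estimators; the function name evaluate, the Random Forest settings, and the tokenization are illustrative assumptions, not the study's exact pipeline.

    # Minimal sketch (hypothetical reconstruction, not the study's exact pipeline):
    # drop "cross-cutting" tokens shared by the training and testing splits,
    # then re-evaluate a text-based model with average precision.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import average_precision_score

    def evaluate(train_code, y_train, test_code, y_test, drop_cross_cutting=True):
        # Treat each source code sample as plain text (bag of whitespace-separated tokens).
        vectorizer = CountVectorizer(token_pattern=r"\S+", lowercase=False)
        X_train = vectorizer.fit_transform(train_code)
        X_test = vectorizer.transform(test_code)

        if drop_cross_cutting:
            # "Cross-cutting" features: tokens appearing in both splits.
            # Drop their columns from both feature matrices (assumes at least
            # one non-overlapping token remains).
            train_tokens = {t for doc in train_code for t in doc.split()}
            test_tokens = {t for doc in test_code for t in doc.split()}
            overlap = train_tokens & test_tokens
            keep = [i for t, i in vectorizer.vocabulary_.items() if t not in overlap]
            X_train, X_test = X_train[:, keep], X_test[:, keep]

        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # predicted probability of "vulnerable"
        return average_precision_score(y_test, scores)

Comparing the returned score with and without drop_cross_cutting mirrors, in simplified form, the scenario experiments whose results are reported in the appendix tables.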


Data Availability

The datasets generated and/or analyzed during the current study are available in the “explainability_data” repository, https://github.com/krn65/explainability_data

Notes

  1. https://cve.mitre.org/

  2. https://nvd.nist.gov/

  3. https://github.com/krn65/explainability_data

  4. https://python.org

  5. https://scikit-learn.org/stable/

  6. https://keras.io

  7. https://tensorflow.org

  8. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html
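
The performance results in this article are reported as average precision scores (APS), computed with the scikit-learn function referenced in note 8 above. A minimal usage example follows; the labels and scores are purely illustrative, not values from the study.

    from sklearn.metrics import average_precision_score

    # Illustrative values: 1 = vulnerable sample, 0 = non-vulnerable sample.
    y_true = [0, 0, 1, 1, 0, 1]
    y_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]  # model-predicted probabilities

    # Summarizes the precision-recall curve as a single score in [0, 1].
    print(average_precision_score(y_true, y_scores))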


Author information

Corresponding author

Correspondence to Kollin Napier.

Ethics declarations

Conflicts of Interest

The authors of this manuscript have no conflicts of interest.

Additional information

Communicated by: Yuan Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Additional Data and Results

Table 25 Average precision scores from Random Forest (RF) model when removing “cross-cutting” features from NC training data for projects
Table 26 Average precision scores from Linear Support Vector Classification (LSVC) model when removing “cross-cutting” features from NC training data for projects
Fig. 3 Correlation of overlap % of features and APS % in different scenarios. (a) Testing within projects. (b) Testing across projects. (c) Testing within CWE vulnerability types. (d) Testing across CWE vulnerability types

Fig. 4 Average cross-testing scores for RF models trained on projects

Table 27 Average precision scores from Multi-layer Perceptron (MLP) model when removing “cross-cutting” features from NC training data for projects
Table 28 Average precision scores from Bidirectional Long Short-Term Memory (BiLSTM) model when removing “cross-cutting” features from NC training data for projects
Table 29 Average precision scores from Random Forest (RF) model when removing “cross-cutting” features from NC training data for CWE vulnerability types
Table 30 Average precision scores from Linear Support Vector Classification (LSVC) model when removing “cross-cutting” features from NC training data for CWE vulnerability types
Table 31 Average precision scores from Multi-layer Perceptron (MLP) model when removing “cross-cutting” features from NC training data for CWE vulnerability types
Table 32 Average precision scores from Bidirectional Long Short-Term Memory (BiLSTM) model when removing “cross-cutting” features from NC training data for CWE vulnerability types
Table 33 Average precision scores from Random Forest (RF) model when removing “cross-cutting” features from NC training data and FC testing data for projects
Table 34 Average precision scores from Linear Support Vector Classification (LSVC) model when removing “cross-cutting” features from NC training data and FC testing data for projects
Table 35 Average precision scores from Multi-layer Perceptron (MLP) model when removing “cross-cutting” features from NC training data and FC testing data for projects
Table 36 Average precision scores from Bidirectional Long Short-Term Memory (BiLSTM) model when removing “cross-cutting” features from NC training data and FC testing data for projects
Fig. 5 Average cross-testing scores for LSVC models trained on projects

Fig. 6 Average cross-testing scores for MLP models trained on projects

Fig. 7 Average cross-testing scores for BiLSTM models trained on projects

Table 37 Average precision scores from Random Forest (RF) model when removing “cross-cutting” features from NC training data and FC testing data for CWE vulnerability types
Table 38 Average precision scores from Linear Support Vector Classification (LSVC) model when removing “cross-cutting” features from NC training data and FC testing data for CWE vulnerability types
Table 39 Average precision scores from Multi-layer Perceptron (MLP) model when removing “cross-cutting” features from NC training data and FC testing data for CWE vulnerability types
Table 40 Average precision scores from Bidirectional Long Short-Term Memory (BiLSTM) model when removing “cross-cutting” features from NC training data and FC testing data for CWE vulnerability types
Fig. 8 Average cross-testing scores for RF models trained on CWE vulnerability types

Fig. 9 Average cross-testing scores for LSVC models trained on CWE vulnerability types

Fig. 10 Average cross-testing scores for MLP models trained on CWE vulnerability types

Fig. 11 Average cross-testing scores for BiLSTM models trained on CWE vulnerability types

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Napier, K., Bhowmik, T. & Chen, Z. Explaining poor performance of text-based machine learning models for vulnerability detection. Empir Software Eng 29, 113 (2024). https://doi.org/10.1007/s10664-024-10519-8
