Abstract
As software vulnerabilities grow in severity, machine learning models are increasingly being adopted to detect them, and research in this area has introduced a variety of approaches. Although these models differ in performance, they share an overall lack of explainability: it is unclear how a model learns and predicts. Furthermore, recent research suggests that models that interpret source code as text, known as “text-based” models, perform poorly at detecting vulnerabilities. To help explain this poor performance, we explore the dimensions of explainability. Building on recent studies of text-based models, we experiment with removing overlapping features present in both the training and testing datasets, which we deem “cross-cutting”. We conduct scenario experiments that remove such “cross-cutting” data, reassess model performance, and examine how the removal affects the results. Our results show that removing “cross-cutting” features can improve model performance in general, pointing to explainable dimensions regarding data dependency and model agnosticism. Overall, we conclude that model performance can be improved, and that explainable aspects of such models can be identified, through empirical analysis of the models’ performance.
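To make the remove-and-reassess setup concrete, the following is a minimal sketch of one such scenario, under the assumption that “cross-cutting” data are code samples whose token sets occur in both the training and testing splits. The tokenization, feature representation, classifier, and all names below are illustrative choices, not the paper’s actual method; the study’s own feature definitions and models may differ.

```python
# Minimal sketch of a "cross-cutting" removal scenario (illustrative only).
# Assumptions not taken from the paper: whitespace tokenization, bag-of-words
# features, and a logistic regression classifier as the "text-based" model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def drop_cross_cutting(test_code, test_labels, train_code):
    """Drop test samples whose token set also occurs in the training split."""
    train_keys = {frozenset(src.split()) for src in train_code}
    kept = [(src, y) for src, y in zip(test_code, test_labels)
            if frozenset(src.split()) not in train_keys]
    return [src for src, _ in kept], [y for _, y in kept]

def evaluate(train_code, y_train, test_code, y_test):
    """Train a simple text-based model and report F1 on the test set."""
    vec = CountVectorizer(token_pattern=r"\S+")
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_code), y_train)
    return f1_score(y_test, clf.predict(vec.transform(test_code)))

# Hypothetical usage with user-supplied corpora and labels:
# baseline = evaluate(train_code, y_train, test_code, y_test)
# code2, y2 = drop_cross_cutting(test_code, y_test, train_code)
# ablated = evaluate(train_code, y_train, code2, y2)
```

Comparing `baseline` against `ablated` mirrors the before-and-after performance comparison the abstract describes.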
Data Availability
The datasets generated during and/or analyzed during the current study are available in the “explainability_data” repository, https://github.com/krn65/explainability_data
Ethics declarations
Conflicts of Interest
The authors of this manuscript have no conflicts of interest.
Additional information
Communicated by: Yuan Zhang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Additional Data and Results
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Napier, K., Bhowmik, T. & Chen, Z. Explaining poor performance of text-based machine learning models for vulnerability detection. Empir Software Eng 29, 113 (2024). https://doi.org/10.1007/s10664-024-10519-8