research-article

Empirical analysis of security vulnerabilities in Python packages

Published: 25 March 2023

Abstract

Software ecosystems play an important role in modern software development, providing an open platform of reusable packages that speed up and facilitate development tasks. However, this level of code reusability also makes security vulnerabilities much harder to discover, as software systems depend on an ever-growing number of packages. Recently, security vulnerabilities in npm, the ecosystem of Node.js packages, have been studied in the literature. As different software ecosystems embody different programming languages and particularities, we argue that it is also important to study other popular programming languages to build stronger empirical evidence about vulnerabilities in software ecosystems. In this paper, we present an empirical study of 1,396 vulnerability reports affecting 698 Python packages in the Python ecosystem (PyPi). In particular, we study the propagation and life span of security vulnerabilities, accounting for how long they take to be discovered and fixed. In addition, vulnerabilities in packages may affect software projects that depend on them (dependent projects), making them vulnerable too. We study a set of 2,224 GitHub Python projects to better understand the prevalence of vulnerabilities in their dependencies and how long it takes to update them. Our findings show that the number of vulnerabilities discovered in Python packages is increasing over time, and that vulnerabilities take more than 3 years to be discovered. A large portion of these vulnerabilities (40.86%) are fixed only after being publicly announced, giving attackers ample time to exploit them. Moreover, we find that more than half of the dependent projects rely on at least one vulnerable package, and that they take a considerably long time (7 months) to update to a non-vulnerable version. We find similarities in some characteristics of vulnerabilities in PyPi and npm, as well as divergences that can be attributed to specific PyPi policies. Building on our findings, we provide a series of implications that can help secure software ecosystems by improving the process of discovering, fixing, and managing package vulnerabilities.

Information & Contributors

Information

Published In

Empirical Software Engineering, Volume 28, Issue 3
May 2023
845 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 25 March 2023
Accepted: 14 December 2022

Author Tags

  1. Python
  2. PyPi
  3. Packages
  4. Vulnerabilities
  5. Empirical studies

Qualifiers

  • Research-article

Cited By

  • (2024) Vulnerabilities and Security Patches Detection in OSS: A Survey. ACM Computing Surveys 57:1 (1-37). DOI: 10.1145/3694782. Online publication date: 9-Sep-2024
  • (2024) Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (345-350). DOI: 10.1145/3661167.3661231. Online publication date: 18-Jun-2024
  • (2024) Dependency-Induced Waste in Continuous Integration: An Empirical Study of Unused Dependencies in the npm Ecosystem. Proceedings of the ACM on Software Engineering 1:FSE (2632-2655). DOI: 10.1145/3660823. Online publication date: 12-Jul-2024
  • (2024) Bloat beneath Python’s Scales: A Fine-Grained Inter-Project Dependency Analysis. Proceedings of the ACM on Software Engineering 1:FSE (2584-2607). DOI: 10.1145/3660821. Online publication date: 12-Jul-2024
  • (2024) The role of library versions in Developer-ChatGPT conversations. Proceedings of the 21st International Conference on Mining Software Repositories (172-176). DOI: 10.1145/3643991.3645075. Online publication date: 15-Apr-2024
  • (2024) Empirical Analysis of Vulnerabilities Life Cycle in Golang Ecosystem. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639230. Online publication date: 20-May-2024
  • (2023) An Empirical Study of Malicious Code In PyPI Ecosystem. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (166-177). DOI: 10.1109/ASE56229.2023.00135. Online publication date: 11-Nov-2023
  • (2023) Vulnerability diffusions in software product networks. Journal of Operations Management 69:8 (1342-1370). DOI: 10.1002/joom.1270. Online publication date: 16-Jul-2023
