Abstract
Modern software development is a complex engineering process in which developer code coexists with a growing number of external open-source components. Even though these components facilitate sharing and reusing code, along with other benefits related to maintenance and code quality, they are often the seeds of vulnerabilities in the software supply chain, leading to attacks with severe consequences. Indeed, one common attack strategy is to exploit or inject security flaws in new versions of dependency packages. It is thus important to keep the dependencies of a software project up to date. Unfortunately, several prior studies have highlighted that developers largely struggle to keep track of dependency package updates and are slow to incorporate security patches. Automated dependency-update bots have therefore been proposed to mitigate the emergence and impact of vulnerabilities in open-source projects. In our study, we focus on Dependabot, a dependency management bot that has recently gained popularity on GitHub. It allows developers to keep watch over project dependencies and reduces the effort of monitoring the safety of the software supply chain. We performed a large empirical study on dependency updates and security pull requests to understand: (1) the degree of and reasons for Dependabot’s popularity; (2) the patterns in developers’ practices and techniques for dealing with vulnerabilities in dependencies; (3) the management of security pull requests (PRs), the threat lifetime, and the fix delay; and (4) the factors that significantly correlate with the acceptance and fast merging of security PRs. To that end, we collected a dataset of 9,916,318 pull request-related issues from 1,743,035 GitHub projects spanning more than 10 programming languages.
In addition to a comprehensive quantitative analysis, we performed a manual qualitative analysis on a representative sample of the dataset, and we substantiated our findings with a survey of developers who use dependency management tools. Our study shows that Dependabot accounts for more than 65% of dependency management activity, mainly due to its efficiency, accessibility, adaptivity, and availability of support. We also found that developers handle dependency vulnerabilities in different ways, but mainly rely on automated PR generation to upgrade vulnerable dependencies. Interestingly, both Dependabot’s and developers’ security PRs have high acceptance rates, and automation accelerates their management, so that fixes are applied in less than one day. However, the threat of dependency vulnerabilities remains hidden for 512 days on average, and patches are disclosed after 362 days, due to the reliance on the manual effort of security experts. Finally, project characteristics, the size of PR changes, and developer and dependency features appear to be highly correlated with the acceptance and fast merging of security PRs.
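The abstract reports that security PRs are typically merged in less than one day but does not say exactly how that delay is measured. As a rough illustration only, the Python sketch below assumes the delay is the time between a PR's `created_at` and `merged_at` timestamps (the field names used in GitHub REST API pull request objects) and that Dependabot PRs can be recognized by the author logins `dependabot[bot]` and `dependabot-preview[bot]`. The function names and sample records are hypothetical; this is not the authors' pipeline, whose source code is in the GitHub-Miner repository.

```python
from datetime import datetime
from statistics import mean, median

# Assumption: Dependabot PRs are identified solely by these author logins.
DEPENDABOT_LOGINS = {"dependabot[bot]", "dependabot-preview[bot]"}


def parse_ts(ts: str) -> datetime:
    """Parse a GitHub-style ISO 8601 timestamp such as '2021-03-01T12:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def merge_delays_in_days(pull_requests):
    """Return merge delays (in days) for merged PRs opened by Dependabot.

    Hypothetical helper: each PR is a dict shaped like a GitHub REST API
    pull request object (only user.login, created_at, merged_at are read).
    """
    delays = []
    for pr in pull_requests:
        if pr["user"]["login"] not in DEPENDABOT_LOGINS:
            continue  # skip PRs opened by humans or other bots
        if pr.get("merged_at") is None:
            continue  # PR is still open or was closed without merging
        delta = parse_ts(pr["merged_at"]) - parse_ts(pr["created_at"])
        delays.append(delta.total_seconds() / 86400)  # seconds -> days
    return delays


# Hypothetical sample records for illustration.
prs = [
    {"user": {"login": "dependabot[bot]"},
     "created_at": "2021-03-01T00:00:00Z", "merged_at": "2021-03-01T12:00:00Z"},
    {"user": {"login": "dependabot[bot]"},
     "created_at": "2021-03-02T00:00:00Z", "merged_at": "2021-03-04T00:00:00Z"},
    {"user": {"login": "alice"},
     "created_at": "2021-03-01T00:00:00Z", "merged_at": "2021-03-10T00:00:00Z"},
    {"user": {"login": "dependabot[bot]"},
     "created_at": "2021-03-05T00:00:00Z", "merged_at": None},
]

delays = merge_delays_in_days(prs)
print(delays)                         # [0.5, 2.0]
print(mean(delays), median(delays))   # 1.25 1.25
```

Reporting both the mean and the median is a deliberate choice in this sketch: merge delays tend to be skewed by a few long-lived PRs, so the two statistics can differ considerably on real data.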
Materials and/or Code Availability
The datasets that were generated and analyzed in the current study are stored within a Zenodo repository: https://doi.org/10.5281/zenodo.7801356 under Creative Commons Attribution 4.0 International License (CC-BY 4.0). The artifacts and the source code are also hosted on a GitHub repository: https://github.com/HocineREBT/GitHub-Miner.
Acknowledgements
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 949014).
Funding
This study was supported by (1) the Natural Sciences and Engineering Research Council of Canada (NSERC), and by (2) the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 949014).
Author information
Contributions
Conceptualization: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
Data curation: Hocine Rebatchi
Formal analysis: Hocine Rebatchi
Investigation: Hocine Rebatchi
Methodology: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
Resources: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
Software: Hocine Rebatchi
Supervision: Tégawendé F. Bissyandé, Naouel Moha
Visualization: Hocine Rebatchi
Writing – original draft: Hocine Rebatchi
Approval of the final version of the manuscript: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
Ethics declarations
Consent
All the authors give their consent to submit this work.
Data
Our datasets support our results and comply with the field standards.
Ethics Approval
Our manuscript has not been submitted to another journal for simultaneous consideration, and our work is original. The authors also declare that this manuscript follows the best scientific standards, in particular with respect to the acknowledgment of prior work, honesty in the presentation of results, and focus on the demonstrability of the statements. This manuscript and the work that led to it do not raise any specific ethical issues. As such, it was considered unnecessary to seek formal approval from our institution’s Ethics Committee specifically for this work.
Competing Interests
The authors declare that they have no known competing financial or non-financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by: Igor Steinmacher.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Open & Axial Coding Techniques
In this section, we provide an example illustrating how we applied open coding and axial coding in the manual analysis performed to answer RQ2. Figure 13 shows an example of the results of the open and axial coding analyses, with the derived quotes and codes.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rebatchi, H., Bissyandé, T.F. & Moha, N. Dependabot and security pull requests: large empirical study. Empir Software Eng 29, 128 (2024). https://doi.org/10.1007/s10664-024-10523-y
DOI: https://doi.org/10.1007/s10664-024-10523-y