Dependabot and security pull requests: large empirical study

Empirical Software Engineering

Abstract

Modern software development is a complex engineering process in which developer code coexists with an ever-larger number of external open-source components. Although these components facilitate code sharing and reuse, along with other benefits related to maintenance and code quality, they are often the seeds of vulnerabilities in the software supply chain, leading to attacks with severe consequences. Indeed, one common attack strategy is to exploit or inject security flaws in new versions of dependency packages. It is thus important to keep the dependencies of a software project up to date. Unfortunately, several prior studies have highlighted that, to a large extent, developers struggle to keep track of dependency package updates and do not quickly incorporate security patches. Automated dependency-update bots have therefore been proposed to mitigate the impact and the emergence of vulnerabilities in open-source projects. In our study, we focus on Dependabot, a dependency management bot that has recently gained popularity on GitHub. It allows developers to keep watch over project dependencies and reduces the effort of monitoring the safety of the software supply chain. We performed a large empirical study on dependency updates and security pull requests to understand: (1) the degree of and reasons for Dependabot’s popularity; (2) the patterns in developers’ practices and techniques for dealing with vulnerabilities in dependencies; (3) the management of security pull requests (PRs), the threat lifetime, and the fix delay; and (4) the factors that significantly correlate with the acceptance of security PRs and with fast merges. To that end, we collected a dataset of 9,916,318 pull request-related issues made in 1,743,035 projects on GitHub, covering more than 10 different programming languages.
In addition to the comprehensive quantitative analysis, we performed a manual qualitative analysis on a representative sample of the dataset, and we substantiated our findings by sending a survey to developers who use dependency management tools. Our study shows that Dependabot accounts for more than 65% of dependency management activity, mainly due to its efficiency, accessibility, adaptivity, and availability of support. We also found that developers handle dependency vulnerabilities differently, but mainly rely on automated PR generation to upgrade vulnerable dependencies. Interestingly, both Dependabot’s and developers’ security PRs are highly accepted, and automation accelerates their handling, so that fixes are applied in less than one day. However, the threat of dependency vulnerabilities remains hidden for 512 days on average, and patches are disclosed after 362 days, due to the reliance on the manual effort of security experts. Finally, project characteristics, the amount of PR changes, as well as developer and dependency features seem to be highly correlated with the acceptance and fast merging of security PRs.
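To make two of the quantities discussed above more concrete, the following minimal Python sketch computes an acceptance rate and a time-to-merge for a handful of pull requests. It is an illustration only, not the study's actual pipeline: the record layout, the field names (`author`, `created`, `merged`), the sample values, and the helper functions `acceptance_rate` and `merge_delays_hours` are all hypothetical, and treating "accepted" as "merged" and "fix delay" as the time from PR creation to merge is an assumption made for this example.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; the field names and values are illustrative
# assumptions, not the schema or data of the study's dataset.
prs = [
    {"author": "dependabot[bot]", "created": "2022-03-01T10:00:00", "merged": "2022-03-01T16:00:00"},
    {"author": "dependabot[bot]", "created": "2022-03-02T09:00:00", "merged": None},
    {"author": "alice", "created": "2022-03-03T08:00:00", "merged": "2022-03-05T08:00:00"},
    {"author": "dependabot[bot]", "created": "2022-03-04T12:00:00", "merged": "2022-03-04T14:00:00"},
]


def acceptance_rate(records):
    """Fraction of PRs that were merged (assumed proxy for acceptance)."""
    if not records:
        return 0.0
    return sum(r["merged"] is not None for r in records) / len(records)


def merge_delays_hours(records):
    """Hours between creation and merge, for merged PRs only."""
    delays = []
    for r in records:
        if r["merged"] is None:
            continue  # unmerged PRs have no merge delay
        created = datetime.fromisoformat(r["created"])
        merged = datetime.fromisoformat(r["merged"])
        delays.append((merged - created).total_seconds() / 3600)
    return delays


# Select the PRs opened by the bot account (login assumed for this example).
bot_prs = [r for r in prs if r["author"] == "dependabot[bot]"]
print("bot acceptance rate:", acceptance_rate(bot_prs))
print("bot median time-to-merge (h):", median(merge_delays_hours(bot_prs)))
```

The median is used here rather than the mean because merge delays are typically skewed by a few long-lived PRs; this is a design choice of the sketch, not a statement about the statistics reported in the study.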

Materials and/or Code Availability

The datasets that were generated and analyzed in the current study are stored within a Zenodo repository: https://doi.org/10.5281/zenodo.7801356 under Creative Commons Attribution 4.0 International License (CC-BY 4.0). The artifacts and the source code are also hosted on a GitHub repository: https://github.com/HocineREBT/GitHub-Miner.

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 949014).

Funding

This study was supported by (1) the Natural Sciences and Engineering Research Council of Canada (NSERC), and by (2) the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 949014).

Author information

Contributions

  • Conceptualization: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
  • Data curation: Hocine Rebatchi
  • Formal analysis: Hocine Rebatchi
  • Investigation: Hocine Rebatchi
  • Methodology: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
  • Resources: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha
  • Software: Hocine Rebatchi
  • Supervision: Tégawendé F. Bissyandé, Naouel Moha
  • Visualization: Hocine Rebatchi
  • Writing – original draft: Hocine Rebatchi
  • Approval of the final version of the manuscript: Hocine Rebatchi, Tégawendé F. Bissyandé, Naouel Moha

Corresponding author

Correspondence to Hocine Rebatchi.

Ethics declarations

Consent

All the authors give their consent to submit this work.

Data

Our datasets support our results and comply with the field standards.

Ethics Approval

Our manuscript is not submitted to another journal for simultaneous consideration, and our work is original. The authors also declare that this manuscript follows the best scientific standards, in particular with respect to acknowledgment of prior work, honesty in the presentation of results, and focus on the demonstrability of the statements. This manuscript and the work that led to it do not raise any specific ethical issue. As such, it was considered unnecessary to seek formal approval from our institution's Ethics Committee specifically for this work.

Competing Interests

The authors declare that they have no known competing financial or non-financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by: Igor Steinmacher.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Open & Axial Coding Techniques

In this section, we provide an example illustrating how we applied open coding and axial coding in the manual analysis conducted to answer RQ2. Figure 13 shows example results of the open and axial coding analyses, with the derived quotes and codes.

Fig. 13 Example of open coding and axial coding analyses

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rebatchi, H., Bissyandé, T.F. & Moha, N. Dependabot and security pull requests: large empirical study. Empir Software Eng 29, 128 (2024). https://doi.org/10.1007/s10664-024-10523-y

  • DOI: https://doi.org/10.1007/s10664-024-10523-y
