DOI: 10.1145/2597073.2597122
Article

A dataset for pull-based development research

Published: 31 May 2014

Abstract

Pull requests form a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 350,000 pull requests, including some of the largest users of pull requests on GitHub. In this paper, we describe how the project selection was done, we analyze the selected features, and we present a machine learning tool set for the R statistics environment.

Published In

MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories
May 2014
427 pages
ISBN: 9781450328630
DOI: 10.1145/2597073
  • General Chair: Premkumar Devanbu
  • Program Chairs: Sung Kim, Martin Pinzger

Sponsors

In-Cooperation

  • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed software development
  2. empirical software engineering
  3. pull request
  4. pull-based development

Qualifiers

  • Article

Conference

ICSE '14