
Predicting pull request completion time: a case study on large scale cloud services

Published: 12 August 2019

Abstract

Effort estimation models have long been studied in software engineering research. They help organizations and individuals plan and track the progress of software projects and individual tasks, and thereby plan delivery milestones better. There is a large body of work on effort estimation at the project level, but little at the level of an individual check-in (pull request). In this paper we present a methodology that produces effort estimates for individual developer check-ins, which are displayed to developers to help them track their work items. The cloud development infrastructure pervasive in companies has enabled us to deploy our pull request lifetime prediction system to several thousand developers across multiple software families. We observe from our deployment that the system conservatively saves 44.61% of developer time by accelerating pull requests to completion.
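The abstract does not describe the prediction model itself. As a hedged illustration only, the sketch below shows what a minimal pull-request completion-time estimator could look like: it buckets historical pull requests by change size and predicts the median completion time of the matching bucket. All function names, bucket thresholds, and data are invented for this sketch and are not the authors' actual system.

```python
from statistics import median

# Illustrative sketch (not the paper's model): estimate a pull request's
# completion time from historical PRs, bucketed by lines changed.

def size_bucket(lines_changed: int) -> str:
    # Bucket thresholds are arbitrary assumptions for this example.
    if lines_changed <= 50:
        return "small"
    if lines_changed <= 500:
        return "medium"
    return "large"

def train(history):
    """history: list of (lines_changed, hours_to_complete) tuples."""
    buckets = {}
    for lines, hours in history:
        buckets.setdefault(size_bucket(lines), []).append(hours)
    # One median completion time per size bucket.
    return {bucket: median(hours) for bucket, hours in buckets.items()}

def predict(model, lines_changed: int) -> float:
    bucket = size_bucket(lines_changed)
    # Fall back to the overall median if the bucket was never seen.
    return model.get(bucket, median(model.values()))

# Synthetic history: (lines changed, hours to completion).
history = [(10, 4), (30, 6), (200, 24), (400, 30), (900, 72)]
model = train(history)
print(predict(model, 25))   # small bucket -> 5.0
print(predict(model, 300))  # medium bucket -> 27.0
```

A production system like the one deployed in the paper would use many more signals (reviewer load, file ownership, time of day, etc.) and a learned model rather than bucket medians; this sketch only shows the shape of the problem.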




Published In

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019, 1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Case Studies
    2. Effort Estimation
    3. Empirical Studies
    4. Prediction
    5. Software Metrics

    Qualifiers

    • Research-article

    Conference

    ESEC/FSE '19

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%


Cited By

    • (2024) Comparative Study of Reinforcement Learning in GitHub Pull Request Outcome Predictions. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 489–500. DOI: 10.1109/SANER60148.2024.00057
    • (2024) Software development metrics: to VR or not to VR. Empirical Software Engineering, 29(2). DOI: 10.1007/s10664-023-10435-3
    • (2023) Dynamic Prediction of Delays in Software Projects using Delay Patterns and Bayesian Modeling. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1012–1023. DOI: 10.1145/3611643.3616328
    • (2023) Nudge: Accelerating Overdue Pull Requests toward Completion. ACM Transactions on Software Engineering and Methodology, 32(2):1–30. DOI: 10.1145/3544791
    • (2023) Understanding the NPM Dependencies Ecosystem of a Project Using Virtual Reality. 2023 IEEE Working Conference on Software Visualization (VISSOFT), pages 84–94. DOI: 10.1109/VISSOFT60811.2023.00019
    • (2023) To Follow or Not to Follow: Understanding Issue/Pull-Request Templates on GitHub. IEEE Transactions on Software Engineering, 49(4):2530–2544. DOI: 10.1109/TSE.2022.3224053
    • (2023) Pull Request Decisions Explained: An Empirical Overview. IEEE Transactions on Software Engineering, 49(2):849–871. DOI: 10.1109/TSE.2022.3165056
    • (2023) Evaluating Learning-to-Rank Models for Prioritizing Code Review Requests using Process Simulation. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 461–472. DOI: 10.1109/SANER56733.2023.00050
    • (2023) Bug characterization in machine learning-based systems. Empirical Software Engineering, 29(1). DOI: 10.1007/s10664-023-10400-0
    • (2023) More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests. Empirical Software Engineering, 28(5). DOI: 10.1007/s10664-023-10336-5
