DOI: 10.5555/2820518.2820601

A data set for social diversity studies of GitHub teams

Published: 16 May 2015

Abstract

Like any other team-oriented activity, the software development process is affected by social diversity in the programmer teams. The effect of team diversity can be significant, but also complex, especially in decentralized teams. Discerning the precise contribution of diversity to teams' effectiveness requires quantitative studies of large data sets.
Here we present, for the first time, a large data set of social diversity attributes of programmers in GitHub teams. Using alias resolution, location data, and gender inference techniques, we collected a team social diversity data set of 23,493 GitHub projects. We illustrate how the data set can be used in practice with a series of case studies, and we hope its availability will foster more interest in studying diversity issues in software teams.


Published In

MSR '15: Proceedings of the 12th Working Conference on Mining Software Repositories
May 2015
542 pages
ISBN: 9780769555942

Publisher

IEEE Press

Qualifiers

  • Research-article

Conference

ICSE '15