DOI: 10.1145/1806799.1806821
Research article

Codebook: discovering and exploiting relationships in software repositories

Published: 01 May 2010

Abstract

Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help.
Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework's flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show effectiveness in addressing multiple inter-team coordination problems.
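To make the "single data structure, single algorithm" idea concrete, here is a minimal sketch of regular language reachability over a small labeled graph of people and artifacts. It is an illustration under stated assumptions, not the Codebook implementation: the node names, the edge labels (`authored`, `modifies`, `depends_on`, `opened`, `fixed_by`), and the automaton encoding are all hypothetical, since the abstract does not specify the schema or query syntax. A node counts as reachable when some path from the start node spells a label sequence accepted by the automaton, which can be found with a breadth-first search over (node, automaton state) pairs.

```python
from collections import deque

# Hypothetical graph of people and work artifacts: (source, edge label, target).
EDGES = [
    ("alice", "authored", "commit:101"),
    ("commit:101", "modifies", "file:parser.cs"),
    ("file:parser.cs", "depends_on", "file:lexer.cs"),
    ("bob", "opened", "bug:7"),
    ("bug:7", "fixed_by", "commit:101"),
    ("carol", "authored", "commit:102"),
    ("commit:102", "modifies", "file:lexer.cs"),
]

# Hypothetical query as a deterministic automaton over edge labels.
# It accepts the regular language: authored modifies depends_on*
AUTHORED_MODIFIES = {
    "start": 0,
    "accepting": {2},
    "transitions": {
        0: {"authored": 1},
        1: {"modifies": 2},
        2: {"depends_on": 2},
    },
}


def build_adjacency(edges):
    """Index edges by source node for fast traversal."""
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))
    return adjacency


def reachable(adjacency, start_node, automaton):
    """Return nodes reachable from start_node along a path whose label
    sequence is accepted by the automaton (BFS over the product graph)."""
    transitions = automaton["transitions"]
    first = (start_node, automaton["start"])
    seen = {first}
    queue = deque([first])
    results = set()
    while queue:
        node, state = queue.popleft()
        if state in automaton["accepting"]:
            results.add(node)
        for label, target in adjacency.get(node, []):
            next_state = transitions.get(state, {}).get(label)
            if next_state is None:
                continue  # this edge label is not allowed in this state
            pair = (target, next_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return results


adjacency = build_adjacency(EDGES)
files_touched_by_alice = reachable(adjacency, "alice", AUTHORED_MODIFIES)
```

Tracking visited (node, state) pairs rather than nodes alone keeps the search finite even when the graph or the automaton contains cycles, and a different question only requires a different automaton over the same graph.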



Published In

ICSE '10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
May 2010
627 pages
ISBN:9781605587196
DOI:10.1145/1806799

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. inter-team coordination
  2. knowledge management
  3. mining software repositories
  4. regular expression
  5. regular language reachability
  6. social networking

Qualifiers

  • Research-article

Conference

ICSE '10

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Cited By

  • (2024) Meta-Manager: A Tool for Collecting and Exploring Meta Information about Code. Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 1-17. DOI: 10.1145/3613904.3642676. Online publication date: 11-May-2024
  • (2024) How does parenthood affect an ICT practitioner's work? A survey study with fathers. Empirical Software Engineering, 29:6. DOI: 10.1007/s10664-024-10534-9. Online publication date: 19-Aug-2024
  • (2023) Vehicular Abandoned Object Detection Based on VANET and Edge AI in Road Scenes. IEEE Transactions on Intelligent Transportation Systems, 24:12, pages 14254-14266. DOI: 10.1109/TITS.2023.3296508. Online publication date: Dec-2023
  • (2022) Nalanda: a socio-technical graph platform for building software analytics tools at enterprise scale. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1246-1256. DOI: 10.1145/3540250.3558949. Online publication date: 7-Nov-2022
  • (2022) Dominoes: An Interactive Exploratory Data Analysis Tool for Software Relationships. IEEE Transactions on Software Engineering, 48:2, pages 377-396. DOI: 10.1109/TSE.2020.2988241. Online publication date: 1-Feb-2022
  • (2022) Sources of software development task friction. Empirical Software Engineering, 27:7. DOI: 10.1007/s10664-022-10187-6. Online publication date: 1-Dec-2022
  • (2022) Social Community Evolution Analysis and Visualization in Open Source Software Projects. Web Information Systems Engineering – WISE 2022, pages 38-45. DOI: 10.1007/978-3-031-20891-1_4. Online publication date: 7-Nov-2022
  • (2022) Improving the detection of community smells through socio-technical and sentiment analysis. Journal of Software: Evolution and Process, 35:6. DOI: 10.1002/smr.2505. Online publication date: 2-Sep-2022
  • (2021) IIAG: a data-driven and theory-inspired approach for advising how to interact with new remote collaborators in OSS teams. Automated Software Engineering, 28:2. DOI: 10.1007/s10515-021-00283-0. Online publication date: 24-May-2021
  • (2020) A Qualitative Study of Dependency Management and Its Security Implications. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1513-1531. DOI: 10.1145/3372297.3417232. Online publication date: 30-Oct-2020
