research-article

Documenting database usages and schema constraints in database-centric applications

Authors:

Mario Linares-Vásquez,

Christopher Vendome,

Denys Poshyvanyk

ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis

Pages 270 - 281

https://doi.org/10.1145/2931037.2931072

Published: 18 July 2016

Abstract

Database-centric applications (DCAs) usually rely on database operations over a large number of tables and attributes. Understanding how database tables and attributes are used to implement features in DCAs, along with the constraints related to these usages, is an important component of any DCA’s maintenance. However, manually documenting database-related operations and their asynchronously evolving constraints in constantly changing source code is a hard and time-consuming problem. In this paper, we present a novel approach, namely DBScribe, aimed at automatically generating always up-to-date natural language descriptions of database operations and schema constraints in source code methods. DBScribe statically analyzes the code and database schema to detect database usages and then propagates these usages and schema constraints through the call-chains implementing database-related features. Finally, each method in these call-chains is automatically documented based on the underlying database usages and constraints.

We evaluated DBScribe in a study with 52 participants analyzing generated documentation for database-related methods in five open-source DCAs. Additionally, we evaluated the descriptions generated by DBScribe on two commercial DCAs involving original developers. The results of the studies involving open-source and commercial DCAs demonstrate that the generated descriptions are accurate and useful for understanding database usages and constraints, in particular during maintenance tasks.
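The pipeline outlined in the abstract (detect usages, propagate them through call-chains, then describe each method) can be illustrated with a small sketch. This is not DBScribe's implementation: the call graph, the method names, the table names, the constraint strings, and the functions `propagate_usages` and `describe` are all hypothetical, and the usages are assumed to have already been extracted by some static analysis step.

```python
# Illustrative sketch only; every name and data value below is an assumption.

# Hypothetical call graph: caller -> list of callees.
CALL_GRAPH = {
    "EnrollmentUI.submit": ["EnrollmentService.enroll"],
    "EnrollmentService.enroll": ["StudentDao.find", "EnrollmentDao.insert"],
    "StudentDao.find": [],
    "EnrollmentDao.insert": [],
}

# Hypothetical usages found directly in each method: (operation, table).
LOCAL_USAGES = {
    "StudentDao.find": {("SELECT", "student")},
    "EnrollmentDao.insert": {("INSERT", "enrollment")},
}

# Hypothetical constraints read from the schema, keyed by table.
SCHEMA_CONSTRAINTS = {
    "enrollment": [
        "student_id must reference an existing student row",
        "course_id cannot be null",
    ],
}


def propagate_usages(call_graph, local_usages):
    """Return, per method, the usages it performs directly or via callees.

    A fixed-point iteration is used so that recursive call-chains converge.
    """
    methods = set(call_graph) | set(local_usages)
    for callees in call_graph.values():
        methods.update(callees)
    usages = {m: set(local_usages.get(m, ())) for m in methods}

    changed = True
    while changed:
        changed = False
        for caller, callees in call_graph.items():
            for callee in callees:
                missing = usages[callee] - usages[caller]
                if missing:
                    usages[caller] |= missing
                    changed = True
    return usages


def describe(method, usages, local_usages, constraints):
    """Build plain-English lines describing one method's database usages."""
    lines = []
    for op, table in sorted(usages[method]):
        direct = (op, table) in local_usages.get(method, ())
        how = "directly" if direct else "via its callees"
        lines.append(f"{method} performs {op} on table '{table}' {how}.")
        # Attach schema constraints only to operations that write data.
        if op in ("INSERT", "UPDATE"):
            for rule in constraints.get(table, ()):
                lines.append(f"  Constraint on '{table}': {rule}")
    return lines


if __name__ == "__main__":
    all_usages = propagate_usages(CALL_GRAPH, LOCAL_USAGES)
    for name in CALL_GRAPH:
        print("\n".join(describe(name, all_usages, LOCAL_USAGES,
                                 SCHEMA_CONSTRAINTS)))
```

In this sketch, a method at the top of a call-chain inherits the usages and constraints of the data-access methods it reaches, so its description reflects database behavior that is not visible in its own body.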

Index Terms

Documenting database usages and schema constraints in database-centric applications
1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation
  2. Software notations and tools
    1. Software maintenance tools

Published In

ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis

July 2016

452 pages

ISBN:9781450343909

DOI:10.1145/2931037

General Chair: Andreas Zeller, Saarland University, Germany

Program Chair: Abhik Roychoudhury, National University of Singapore, Singapore

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISSTA '16

Sponsor:

SIGSOFT

ISSTA '16: International Symposium on Software Testing and Analysis

July 18 - 20, 2016

Saarbrücken, Germany

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
