Research article

Scalable and systematic detection of buggy inconsistencies in source code

Published: 17 October 2010

Abstract

Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs.
DejaVu operates in two phases. Given a target code base, a parallel inconsistent clone analysis first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible buggy change analysis framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy.
On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.
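The two-phase pipeline the abstract describes can be illustrated with a toy sketch. This is not the paper's implementation (DejaVu uses a parallel clone analysis over a large code base; the real system is far more sophisticated): here, phase 1 naively pairs up token sequences that are similar but not identical, and phase 2 diffs each pair to isolate the individual inconsistent changes. All function names and thresholds are illustrative.

```python
# Toy sketch of a two-phase inconsistency detector (illustrative only,
# NOT the DejaVu implementation described in the paper).
import difflib
import itertools


def tokenize(fragment):
    """Crude lexer: split a code fragment into whitespace-separated tokens."""
    return fragment.split()


def inconsistent_clone_groups(fragments, lo=0.8, hi=1.0):
    """Phase 1: pair up fragments that are similar but not identical.

    A pair is an 'inconsistent clone' when its token similarity falls in
    [lo, hi) -- close enough to be a clone, but not an exact copy.
    """
    groups = []
    for a, b in itertools.combinations(fragments, 2):
        sim = difflib.SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()
        if lo <= sim < hi:
            groups.append((a, b))
    return groups


def inconsistent_changes(a, b):
    """Phase 2: isolate the fine-grained token-level inconsistencies.

    Each result is a (tokens_in_a, tokens_in_b) pair where the two
    fragments diverge; a downstream analysis would classify each as
    benign (e.g. a renamed identifier) or buggy.
    """
    ta, tb = tokenize(a), tokenize(b)
    sm = difflib.SequenceMatcher(None, ta, tb)
    return [(ta[i1:i2], tb[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes()
            if op != "equal"]


if __name__ == "__main__":
    fragments = [
        "if ( len > MAX ) return -1 ;",
        "if ( len >= MAX ) return -1 ;",
        "for ( i = 0 ; i < n ; i ++ ) sum += a [ i ] ;",
    ]
    for a, b in inconsistent_clone_groups(fragments):
        print(inconsistent_changes(a, b))  # e.g. ([' > '], [' >= '])-style divergences
```

On the sample input, phase 1 pairs the two `if` statements (similar but not identical) and ignores the unrelated loop; phase 2 then pinpoints the single `>` vs. `>=` divergence, the kind of small syntactic inconsistency that both indicates a problem and suggests its fix.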




Published In

ACM SIGPLAN Notices, Volume 45, Issue 10 (OOPSLA '10)
October 2010, 957 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1932682

OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2010, 984 pages
ISBN: 9781450302036
DOI: 10.1145/1869459
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published in SIGPLAN Volume 45, Issue 10

Author Tags

  1. bug detection
  2. clone detection
  3. static analysis

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months)19
  • Downloads (Last 6 weeks)5
Reflects downloads up to 10 Nov 2024

Cited By

  • (2021) A Systematic Literature Review on Bad Smells–5 W's: Which, When, What, Who, Where. IEEE Transactions on Software Engineering 47(1), pages 17-66. DOI: 10.1109/TSE.2018.2880977. Published: 1-Jan-2021
  • (2018) Benchmarks for software clone detection: A ten-year retrospective. 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 26-37. DOI: 10.1109/SANER.2018.8330194. Published: Mar-2018
  • (2017) A Study of Security Vulnerabilities on Docker Hub. Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pages 269-280. DOI: 10.1145/3029806.3029832. Published: 22-Mar-2017
  • (2016) RID. ACM SIGARCH Computer Architecture News 44(2), pages 531-544. DOI: 10.1145/2980024.2872389. Published: 25-Mar-2016
  • (2016) RID. ACM SIGOPS Operating Systems Review 50(2), pages 531-544. DOI: 10.1145/2954680.2872389. Published: 25-Mar-2016
  • (2016) RID. ACM SIGPLAN Notices 51(4), pages 531-544. DOI: 10.1145/2954679.2872389. Published: 25-Mar-2016
  • (2016) RID. Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pages 531-544. DOI: 10.1145/2872362.2872389. Published: 25-Mar-2016
  • (2016) CLORIFI. Concurrency and Computation: Practice & Experience 28(6), pages 1900-1917. DOI: 10.1002/cpe.3532. Published: 25-Apr-2016
  • (2014) Achieving accuracy and scalability simultaneously in detecting application clones on Android markets. Proceedings of the 36th International Conference on Software Engineering, pages 175-186. DOI: 10.1145/2568225.2568286. Published: 31-May-2014
  • (2024) Effective Bug Detection with Unused Definitions. Proceedings of the Nineteenth European Conference on Computer Systems, pages 720-735. DOI: 10.1145/3627703.3629576. Published: 22-Apr-2024
