skip to main content
10.1145/2318202.2318207acmotherconferencesArticle/Chapter ViewAbstractPublication PagesecoopConference Proceedingsconference-collections
research-article

A type system for regular expressions

Published: 12 June 2012 Publication History
  • Get Citation Alerts
  • Abstract

    Regular expressions are used to match and extract text. It is easy for developers to make syntactic mistakes when writing regular expressions, because regular expressions are often complex and different across programming languages. Such errors result in exceptions at run time, and there is currently no static support for preventing them.
    This paper describes practical experience designing and using a type system for regular expressions. This type system validates regular expression syntax and capturing group usage at compile time instead of at run time---ensuring the absence of PatternSyntaxExceptions from invalid syntax and IndexOutOfBoundsExceptions from accessing invalid capturing groups.
    Our implementation is publicly available and supports the full Java language. In an evaluation on five open-source Java applications (480kLOC), the type system was easy to use, required less than one annotation per two thousand lines, and found 56 previously-unknown bugs.

    References

    [1]
    Apache Chukwa website. http://incubator.apache.org/chukwa/.
    [2]
    Apache Lucene website. http://lucene.apache.org/.
    [3]
    V. Benzaken, G. Castagna, and A. Frisch. CDuce: an XML-centric general-purpose language. In ICFP, pages 51--63, 2003.
    [4]
    N. Broberg, A. Farre, and J. Svenningsson. Regular expression patterns. In ICFP, pages 67--78, 2004.
    [5]
    R. Bubel, R. Hähnle, and U. Geilmann. A formalisation of Java strings for program specification and verification. In SEFM, pages 90--105, 2011.
    [6]
    G. Castagna, D. Colazzo, and A. Frisch. Error mining for regular expression patterns. In Theoretical Computer Science, volume 3701 of LNCS, pages 160--172. Springer, 2005.
    [7]
    Checker Framework website. http://types.cs.washington.edu/checker-framework/.
    [8]
    Checkstyle website. http://checkstyle.sourceforge.net/.
    [9]
    T. Copeland. PMD Applied. Centennial Books, Nov. 2005.
    [10]
    Daikon website. http://groups.csail.mit.edu/pag/daikon/.
    [11]
    W. Dietl, S. Dietzel, M. D. Ernst, K. Muşlu, and T. Schiller. Building and using pluggable type-checkers. In ICSE, pages 681--690, May 2011.
    [12]
    M. D. Ernst. Type Annotations specification (JSR 308). http://types.cs.washington.edu/jsr308/, Oct. 2011.
    [13]
    M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE TSE, 27(2):99--123, Feb. 2001.
    [14]
    M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Programming, 69(1-3):35--45, Dec. 2007.
    [15]
    FindBugs website. http://findbugs.sourceforge.net/.
    [16]
    D. Fisher and O. Shivers. Static analysis for syntax objects. In ICFP, pages 111--121, 2006.
    [17]
    Hammurapi website. http://www.hammurapi.biz/.
    [18]
    H. Hosoya and B. C. Pierce. XDuce: A typed XML processing language (preliminary report). In Selected papers from the Third International Workshop WebDB 2000 on The World Wide Web and Databases, pages 226--244, 2001.
    [19]
    D. Hovemeyer and W. Pugh. Finding bugs is easy. In OOPSLA Companion, pages 132--136, Oct. 2004.
    [20]
    D. Hovemeyer and W. Pugh. Status report on JSR-305: Annotations for software defect detection. In OOPSLA Companion, pages 799--800, Oct. 2007.
    [21]
    JLint website. http://jlint.sourceforge.net/.
    [22]
    A. Kieżun, V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. HAMPI: A solver for string constraints. In ISSTA, pages 105--116, July 2009.
    [23]
    A. Kieżun, P. J. Guo, K. Jayaraman, and M. D. Ernst. Automatic creation of SQL injection and cross-site scripting attacks. In ICSE, pages 199--209, May 2009.
    [24]
    Macker website. http://innig.net/macker/.
    [25]
    G. Mainland. Why it's nice to be quoted: quasiquoting for Haskell. In ACM SIGPLAN workshop on Haskell, pages 73--82, 2007.
    [26]
    Y. Minamide. Static approximation of dynamically generated web pages. In World Wide Web, pages 432--441, 2005.
    [27]
    M. M. Papi, M. Ali, T. L. Correa Jr., J. H. Perkins, and M. D. Ernst. Practical pluggable types for Java. In ISSTA, pages 201--212, July 2008.
    [28]
    Plume-lib website. http://plume-lib.googlecode.com/.
    [29]
    PMD website. http://pmd.sourceforge.net/.
    [30]
    D. Spinellis and P. Louridas. A framework for the static verification of API calls. Journal of Systems and Software, 80(7): 1156--1168, 2007.
    [31]
    T. Tateishi, M. Pistoia, and O. Tripp. Path- and index-sensitive string analysis based on monadic second-order logic. In ISSTA, pages 166--176, 2011.
    [32]
    G. Wassermann, C. Gould, Z. Su, and P. Devanbu. Static checking of dynamically generated queries in database applications. ACM Trans. Softw. Eng. Methodol., 16(4), 2007.
    [33]
    A. Wilk and W. Drabent. A Prototype of a Descriptive Type System for Xcerpt. In Principles and Practice of Semantic Web Reasoning, volume 4187 of LNCS, pages 262--275. Springer, 2006.
    [34]
    H. Xi and F. Pfenning. Dependent types in practical programming. In POPL, pages 214--227, Jan. 1999.
    [35]
    YaCy website. http://yacy.net/en/.
    [36]
    F. Yu, T. Bultan, M. Cova, and O. Ibarra. Symbolic string verification: An automata-based approach. In Model Checking Software, volume 5156 of LNCS, pages 306--324. Springer, 2008.

    Cited By

    View all
    • (2024)Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer ForumsProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644424(190-201)Online publication date: 15-Apr-2024
    • (2024)Typing Requirement Model as CoroutinesIEEE Access10.1109/ACCESS.2024.335211512(8449-8460)Online publication date: 2024
    • (2023)InfeRE: Step-by-Step Regex Generation via Chain of Inference2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00111(1505-1515)Online publication date: 11-Sep-2023
    • Show More Cited By

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    FTfJP '12: Proceedings of the 14th Workshop on Formal Techniques for Java-like Programs
    June 2012
    53 pages
    ISBN:9781450312721
    DOI:10.1145/2318202
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • AITO: Assoc Internationale por les Technologies Objects

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 June 2012

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. case studies
    2. checker framework
    3. regex
    4. regexp
    5. regular expressions
    6. type system

    Qualifiers

    • Research-article

    Conference

    ECOOP'12
    Sponsor:
    • AITO

    Acceptance Rates

    Overall Acceptance Rate 51 of 75 submissions, 68%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)21
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 14 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer ForumsProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644424(190-201)Online publication date: 15-Apr-2024
    • (2024)Typing Requirement Model as CoroutinesIEEE Access10.1109/ACCESS.2024.335211512(8449-8460)Online publication date: 2024
    • (2023)InfeRE: Step-by-Step Regex Generation via Chain of Inference2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00111(1505-1515)Online publication date: 11-Sep-2023
    • (2023)Deducing Matching Strings for Real-World Regular ExpressionsDependable Software Engineering. Theories, Tools, and Applications10.1007/978-981-99-8664-4_19(331-350)Online publication date: 15-Dec-2023
    • (2022)Type-safe regular expressionsProceedings of the Scala Symposium10.1145/3550198.3550425(1-8)Online publication date: 6-Jun-2022
    • (2021)Interpretable Program SynthesisProceedings of the 2021 CHI Conference on Human Factors in Computing Systems10.1145/3411764.3445646(1-16)Online publication date: 6-May-2021
    • (2021)Arext: Automatic Regular Expression Testing Tool Based on Generating Strings With Full Coverage2021 13th International Conference on Knowledge and Systems Engineering (KSE)10.1109/KSE53942.2021.9648604(1-6)Online publication date: 10-Nov-2021
    • (2021)TransRegexProceedings of the 43rd International Conference on Software Engineering10.1109/ICSE43902.2021.00111(1210-1222)Online publication date: 22-May-2021
    • (2021)Ensuring the Correctness of Regular Expressions: A ReviewInternational Journal of Automation and Computing10.1007/s11633-021-1301-4Online publication date: 4-Jun-2021
    • (2021)Demystifying regular expression bugsEmpirical Software Engineering10.1007/s10664-021-10033-127:1Online publication date: 5-Nov-2021
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media