skip to main content
10.1145/3236024.3236027acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article
Public Access

The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale

Published: 26 October 2018 Publication History
  • Get Citation Alerts
  • Abstract

    Regular expressions (regexes) are a popular and powerful means of automatically manipulating text. Regexes are also an understudied denial of service vector (ReDoS). If a regex has super-linear worst-case complexity, an attacker may be able to trigger this complexity, exhausting the victim’s CPU resources and causing denial of service. Existing research has shown how to detect these superlinear regexes, and practitioners have identified super-linear regex anti-pattern heuristics that may lead to such complexity.
    In this paper, we empirically study three major aspects of ReDoS that have hitherto been unexplored: the incidence of super-linear regexes in practice, how they can be prevented, and how they can be repaired. In the ecosystems of two of the most popular programming languages — JavaScript and Python – we detected thousands of super-linear regexes affecting over 10,000 modules across diverse application domains. We also found that the conventional wisdom for super-linear regex anti-patterns has few false negatives but many false positives; these anti-patterns appear to be necessary, but not sufficient, signals of super-linear behavior. Finally, we found that when faced with a super-linear regex, developers favor revising it over truncating input or developing a custom parser, regardless of whether they had been shown examples of all three fix strategies. These findings motivate further research into ReDoS, since many modules are vulnerable to it and existing mechanisms to avoid it are insufficient. We believe that ReDoS vulnerabilities are a larger threat in practice than might have been guessed.

    References

    [1]
    Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab. 2017. Why Do Developers Use Trivial Packages? An Empirical Case Study on npm. In Foundations of Software Engineering (FSE). 1145/3106237.3106267
    [2]
    Alasdair Allan. 2012. Learning iOS Programming: From Xcode to App Store. O’Reilly Media.
    [3]
    Adam Baldwin. 2016.
    [4]
    Regular Expression Denial of Service affecting Express.js. http://web.archive.org/ web/20170116160113/https://medium.com/node-security/ regular-expression-denial-of-service-affecting-express-js-9c397c164c43.
    [5]
    Martin Berglund, Frank Drewes, and Brink Van Der Merwe. 2014. Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching. EPTCS: Automata and Formal Languages 2014 151 (2014), 109–123. org/10.4204/EPTCS.151.7
    [6]
    Carl Chapman and Kathryn T Stolee. 2016. Exploring regular expression usage and context in Python. In International Symposium on Software Testing and Analysis (ISSTA).
    [7]
    Carl Chapman, Peipei Wang, and Kathryn T Stolee. 2017. Exploring Regular Expression Comprehension. In Automated Software Engineering (ASE).
    [8]
    Checkmarx. 2016. The Node.js Highway - Attacks are at Full Throttle. In BlackHat USA. https://www.youtube.com/watch?v=-HzCUZDLXTc
    [9]
    Russ Cox. 2007. Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...). https://swtch.com/~rsc/regexp/regexp1.html
    [10]
    Scott Crosby. 2003. Denial of service through regular expressions. USENIX Security work in progress report (2003).
    [11]
    Scott A Crosby and Dan S Wallach. 2003. Denial of Service via Algorithmic Complexity Attacks. In USENIX Security.
    [12]
    James C Davis, Eric R Williamson, and Dongyoon Lee. 2018. A Sense of Time for JavaScript and Node.js: First-Class Timeouts as a Cure for Event Handler Poisoning. In USENIX Security Symposium (USENIX Security).
    [13]
    Erik DeBill. 2010. Module Counts. http://web.archive.org/web/20180114183225/ http://www.modulecounts.com/.
    [14]
    Stack Exchange. 2016. Outage Postmortem. http://web.archive. org/web/20180801005940/http://stackstatus.net/post/147710624694/ outage-postmortem-july-20-2016.
    [15]
    Maximiliano Firtman. 2013. Programming the Mobile Web: Reaching Users on iPhone, Android, BlackBerry, Windows Phone, and more. O’Reilly Media.
    [16]
    Michael Fitzgerald. 2012. Introducing regular expressions. O’Reilly Media, Inc.
    [17]
    Jeffrey EF Friedl. 2002. Mastering regular expressions. O’Reilly Media, Inc.
    [18]
    Patrice Godefroid. 1995. Partial-Order Methods for the Verification of Concurrent Systems. Ph.D. Dissertation. University of Liege. 3-540-60761-7
    [19]
    Jan Goyvaerts. 2006. Regular Expressions: The Complete Tutorial. Lulu Press.
    [20]
    Jan Goyvaerts and Steven Levithan. 2012. Regular expressions cookbook. O’Reilly Media, Inc.
    [21]
    James Kirrage, Asiri Rathnayake, and Hayo Thielecke. 2013. Static Analysis for Regular Expression Denial-of-Service Attacks. In International Conference on Network and System Security (NSS). 35–148.
    [22]
    Konstantinos Manikas and Klaus Marius Hansen. 2013. Software ecosystems-A systematic literature review. Journal of Systems and Software 86, 5 (2013), 1294–1306.
    [23]
    A Ojamaa and K Duuna. 2012. Assessing the security of Node.js platform. In 7th International Conference for Internet Technology and Secured Transactions (ICITST).
    [24]
    Theoolos Petsios, Jason Zhao, Angelos D Keromytis, and Suman Jana. 2017. SlowFuzz: Automated Domain-Independent Detection of Algorithmic Complexity Vulnerabilities. In Computer and Communications Security (CCS).
    [25]
    Asiri Rathnayake and Hayo Thielecke. 2014. Static Analysis for Regular Expression Exponential Runtime via Substructural Logics. Technical Report.
    [26]
    Randy J. Ray and Pavel Kulchenko. 2002. Programming Web Services with Perl. O’Reilly Media.
    [27]
    Alex Roichman and Adar Weidman. 2009. VAC - ReDoS: Regular Expression Denial Of Service. Open Web Application Security Project (OWASP) (2009).
    [28]
    Yuju Shen, Yanyan Jiang, Chang Xu, Ping Yu, Xiaoxing Ma, and Jian Lu. 2018. ReScue: Crafting Regular Expression DoS Attacks. In Automated Software Engineering (ASE).
    [29]
    Michael Sipser. 2006. Introduction to the Theory of Computation. Vol. 2. Thomson Course Technology Boston.
    [30]
    Snyk.io. 2018. Vulnerability DB. http://web.archive.org/web/20180801010155/ https://snyk.io/vuln.
    [31]
    Henry Spencer. 1994. A regular-expression matcher. In Software solutions in C. 35–71.
    [32]
    Cristian-Alexandru Staicu and Michael Pradel. 2018. Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers. In USENIX Security Symposium (USENIX Security). https://www.npmjs.com/package/safe-regexhttp: //mp.binaervarianz.de/ReDoS_TR_Dec2017.pdf
    [33]
    substack. 2013. safe-regex. https://web.archive.org/web/20180801003748/https: //github.com/substack/safe-regex.
    [34]
    Bryan Sullivan. 2010. Security Briefs - Regular Expression Denial of Service Attacks and Defenses. Technical Report. https://msdn.microsoft.com/en-us/magazine/ ff646973.aspx
    [35]
    Ken Thompson. 1968. Regular Expression Search Algorithm. Communications of the ACM (CACM) (1968).
    [36]
    https://www.fing.edu.uy/inco/cursos/intropln/material/ p419-thompson.pdf
    [37]
    Brink Van Der Merwe, Nicolaas Weideman, and Martin Berglund. 2017. Turning Evil Regexes Harmless. In SAICSIT.
    [38]
    Peipei Wang and Kathryn T Stolee. 2018. How well are regular expressions tested in the wild?. In Foundations of Software Engineering (FSE).
    [39]
    Nicolaas Weideman, Brink van der Merwe, Martin Berglund, and Bruce Watson. 2016. Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9705. 322–334.
    [40]
    Nicolaas Hendrik Weideman. 2017. Static Analysis of Regular Expressions. MS. Stellenbosch University.
    [41]
    Wikipedia contributors. 2018. .NET Framework version history — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=.NET_ Framework_version_history&oldid=851937435. {Online; accessed 1-August- 2018}.
    [42]
    Wikipedia contributors. 2018. Regular expression — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Regular_expression& oldid=852858998. {Online; accessed 1-August-2018}.

    Cited By

    View all
    • (2024)Analyzing and Discovering Spatial Algorithm Complexity Vulnerabilities in RecursionApplied Sciences10.3390/app1405185514:5(1855)Online publication date: 23-Feb-2024
    • (2024)A Coq Mechanization of JavaScript Regular Expression SemanticsProceedings of the ACM on Programming Languages10.1145/36746668:ICFP(1003-1031)Online publication date: 15-Aug-2024
    • (2024)Static Analysis for Checking the Disambiguation Robustness of Regular ExpressionsProceedings of the ACM on Programming Languages10.1145/36564618:PLDI(2073-2097)Online publication date: 20-Jun-2024
    • Show More Cited By

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    October 2018
    987 pages
    ISBN:9781450355735
    DOI:10.1145/3236024
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 October 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. ReDoS
    2. Regular expressions
    3. catastrophic backtracking
    4. empirical software engineering
    5. mining software repositories

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    ESEC/FSE '18
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)293
    • Downloads (Last 6 weeks)36
    Reflects downloads up to 14 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Analyzing and Discovering Spatial Algorithm Complexity Vulnerabilities in RecursionApplied Sciences10.3390/app1405185514:5(1855)Online publication date: 23-Feb-2024
    • (2024)A Coq Mechanization of JavaScript Regular Expression SemanticsProceedings of the ACM on Programming Languages10.1145/36746668:ICFP(1003-1031)Online publication date: 15-Aug-2024
    • (2024)Static Analysis for Checking the Disambiguation Robustness of Regular ExpressionsProceedings of the ACM on Programming Languages10.1145/36564618:PLDI(2073-2097)Online publication date: 20-Jun-2024
    • (2024)Linear Matching of JavaScript Regular ExpressionsProceedings of the ACM on Programming Languages10.1145/36564318:PLDI(1336-1360)Online publication date: 20-Jun-2024
    • (2024)Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer ForumsProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644424(190-201)Online publication date: 15-Apr-2024
    • (2024)Re(gEx|DoS)Eval: Evaluating Generated Regular Expressions and their Proneness to DoS AttacksProceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results10.1145/3639476.3639757(52-56)Online publication date: 14-Apr-2024
    • (2024)OneSpace: Detecting cross-language clones by learning a common embedding spaceJournal of Systems and Software10.1016/j.jss.2023.111911208(111911)Online publication date: Feb-2024
    • (2024)Structure and design of multimodal dataset for automatic regex synthesis methods in Roman UrduInternational Journal of Data Science and Analytics10.1007/s41060-024-00612-yOnline publication date: 23-Jul-2024
    • (2024)On the Decidability of Infix Inclusion ProblemTheory of Computing Systems10.1007/s00224-023-10160-w68:3(301-321)Online publication date: 13-Jan-2024
    • (2023)Bilingual problemsProceedings of the 32nd USENIX Conference on Security Symposium10.5555/3620237.3620580(6133-6150)Online publication date: 9-Aug-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media