skip to main content
10.1145/3238147.3238159acmconferencesArticle/Chapter ViewAbstractPublication PagesaseConference Proceedingsconference-collections
research-article

ReScue: crafting regular expression DoS attacks

Published: 03 September 2018 Publication History
  • Get Citation Alerts
  • Abstract

    Regular expression (regex) with modern extensions is one of the most popular string processing tools. However, poorly-designed regexes can yield exponentially many matching steps, and lead to regex Denial-of-Service (ReDoS) attacks under well-conceived string inputs. This paper presents Rescue, a three-phase gray-box analytical technique, to automatically generate ReDoS strings to highlight vulnerabilities of given regexes. Rescue systematically seeds (by a genetic search), incubates (by another genetic search), and finally pumps (by a regex-dedicated algorithm) for generating strings with maximized search time. We implemenmted the Rescue tool and evaluated it against 29,088 practical regexes in real-world projects. The evaluation results show that Rescue found 49% more attack strings compared with the best existing technique, and applying Rescue to popular GitHub projects discovered ten previously unknown ReDoS vulnerabilities.

    References

    [1]
    Michela Becchi and Srihari Cadambi. 2007. Memory-efficient regular expression search using state merging. In Proceedings of the 26th IEEE International Conference on Computer Communications (INFOCOM ’07). 1064–1072.
    [2]
    1109/INFCOM.2007.128
    [3]
    Udi Ben-Porat, Anat Bremler-Barr, Hanoch Levy, and Bernhard Plattner. 2012. On the vulnerability of hardware hash tables to sophisticated attacks. In International Conference on Research in Networking (Networking ’12). 135–148.
    [4]
    Martin Berglund, Frank Drewes, and Brink Van Der Merwe. 2014. Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching. In Proceedings of the 14th International Conference on Automata and Formal Languages (AFL ’14). 109–123.
    [5]
    Martin Berglund and Brink van der Merwe. 2017. On the semantics of regular expression parsing in the wild. Theoretical Computer Science 679 (May 2017), 69–82.
    [6]
    Anne Brüggemann-Klein. 1993. Regular expressions into finite automata. Theoretical Computer Science 120, 2 (Nov. 1993), 197–213. 0304-3975(93)90287-4
    [7]
    Cezar Câmpeanu, Kai Salomaa, and Sheng Yu. 2003. A formal study of practical regular expressions. International Journal of Foundations of Computer Science 14, 06 (Nov. 2003), 1007–1018.
    [8]
    Carl Chapman and Kathryn T Stolee. 2016. Exploring regular expression usage and context in Python. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA ’16). 282–293. 2931037.2931073
    [9]
    Noam Chomsky. 1956. Three models for the description of language. IRE Transactions on Information Theory 2, 3 (Oct. 1956), 113–124. TIT.1956.1056813
    [10]
    John Clarke, Jose Javier Dolado, Mark Harman, Rob Hierons, Bryan Jones, Mary Lumkin, Brian Mitchell, Spiros Mancoridis, Kearton Rees, Marc Roper, et al. 2003. Reformulating software engineering as a search problem. IEE Proceedings - Software 150, 3 (June 2003), 161–175.
    [11]
    Brendan Cody-Kenny, Michael Fenton, Adrian Ronayne, Eoghan Considine, Thomas McGuire, and Michael O’Neill. 2017. A search for improved performance in regular expressions. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’17). 1280–1287.
    [12]
    Erik Corry, Christian P Hansen, and Lasse R H Nielsen. 2009. Irregexp, Google Chrome’s New Regexp Implementation. https://blog.chromium.org/2009/02/ irregexp-google-chromes-new-regexp.html
    [13]
    Russ Cox. 2007. Regular expression matching can be simple and fast (but is slow in Java, Perl, Php, Python, Ruby, ...). Technical Report.
    [14]
    Scott A Crosby and Dan S Wallach. 2003. Denial of Service via Algorithmic Complexity Attacks. In Proceedings of the 12th USENIX Security Symposium (USENIX Security ’03). 29–44. https://dl.acm.org/citation.cfm?id=1251356
    [15]
    Yuan-Shun Dai, Min Xie, Kim-Leng Poh, and Bo Yang. 2003. Optimal testingresource allocation with genetic algorithm for modular software systems. Journal of Systems and Software 66, 1 (April 2003), 47–55. S0164-1212(02)00062-6
    [16]
    Python Software Foundation. 2018. regex (PyPI). https://pypi.org/project/regex/
    [17]
    Jeffrey E F Friedl. 2006. Mastering Regular Expressions: Understand Your Data and Be More Productive (3th ed.).
    [18]
    Patrice Godefroid, Michael Y Levin, David A Molnar, et al. 2008. Automated whitebox fuzz testing. In Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS ’08). 151–166. http://www.isoc.org/isoc/ conferences/ndss/08/papers/10_automated_whitebox_fuzz.pdf
    [19]
    Jan Goyvaerts. 2018. Popular Tools, Utilities and Programming Languages That Support Regular Expressions. https://www.regular-expressions.info/tools.html
    [20]
    Jan Goyvaerts. 2018. Runaway Regular Expressions: Catastrophic Backtracking. https://www.regular-expressions.info/catastrophic.html
    [21]
    Mark Harman and Bryan F Jones. 2001. Search-based software engineering. Information and Software Technology 43, 14 (Dec. 2001), 833–839.
    [22]
    Mark Harman, S Afshin Mansouri, and Yuanyuan Zhang. 2012. Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys (CSUR) 45, 1, Article 11 (Nov. 2012), 61 pages. 2379776.2379787
    [23]
    Philip Hazel. 2017. Backtracking Limit of PCRE Match. http://www.pcre.org/ current/doc/html/pcre2api.html
    [24]
    John E Hopcroft, Rajeev Motwani, and Jeffrey D Ullman. 2006. Introduction to Automata Theory, Languages, and Computation (3rd ed.).
    [25]
    Mohd Ehmer Khan, Farmeena Khan, et al. 2012. A comparative study of white box, black box and grey box testing techniques. International Journal of Advanced Computer Science and Applications 3, 6 (2012), 12–15. ijacsa.2012.030603
    [26]
    James Kirrage, Asiri Rathnayake, and Hayo Thielecke. 2013. Static analysis for regular expression denial-of-service attacks. In Proceedings of the 7th International Conference on Network and System Security (NSS ’13). 135–148.
    [27]
    Stephen Cole Kleene. 1951. Representation of Events in Nerve Nets and Finite Automata. Technical Report.
    [28]
    Andrew M Kuchling. 2018. Regular Expression How-To. https://docs.python. org/3/howto/regex.html
    [29]
    Cheng-Hung Lin, Chen-Hsiung Liu, and Shih-Chieh Chang. 2011. Accelerating regular expression matching using hierarchical parallel machines on GPU. In Proceedings of the 2011 IEEE Global Telecommunications Conference (GLOBECOM ’11). IEEE, 1–5.
    [30]
    Phil McMinn. 2004. Search-based software test data generation: A survey. Software Testing, Verification and Reliability 14, 2 (June 2004), 105–156.
    [31]
    Carlos Pacheco, Shuvendu K Lahiri, Michael D Ernst, and Thomas Ball. 2007. Feedback-directed random test generation. In Proceedings of the 29th International Conference on Software Engineering (ICSE ’07). 75–84. ICSE.2007.37
    [32]
    Theofilos Petsios, Jason Zhao, Angelos D Keromytis, and Suman Jana. 2017. Slowfuzz: Automated domain-independent detection of algorithmic complexity vulnerabilities. In Proceedings of the International Conference on Computer and Communications Security (CCS ’17). 2155–2168.
    [33]
    [34]
    LLVM Project. 2018. libFuzzer: A library for coverage-guided fuzz testing. https: //llvm.org/docs/LibFuzzer.html
    [35]
    Michael O Rabin and Dana Scott. 1959. Finite automata and their decision problems. IBM Journal of Research and Development 3, 2 (April 1959), 114–125.
    [36]
    Asiri Rathnayake. 2015. Semantics, analysis and security of backtracking regular expression matchers. Ph.D. Dissertation. Advisor(s) Thielecke, Hayo. http: //etheses.bham.ac.uk/6011/
    [37]
    Asiri Rathnayake and Hayo Thielecke. 2014. Static analysis for regular expression exponential runtime via substructural logics. (2014). arXiv: arXiv:1405.7058
    [38]
    Juraj Hajduch (SK). 2007. RegExLib: A regular expression library. http://www. regexlib.com/REDetails.aspx?regexp_id=2598
    [39]
    Randy Smith, Cristian Estan, and Somesh Jha. 2006. Backtracking algorithmic complexity attacks against a NIDS. In Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC ’06). 89–98. ACSAC.2006.17
    [40]
    Mozhan Soltani, Annibale Panichella, and Arie van Deursen. 2017. A guided genetic algorithm for automated crash reproduction. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). 209–220.
    [41]
    Praveen Ranjan Srivastava and Tai-hoon Kim. 2009. Application of genetic algorithm in software testing. International Journal of Software Engineering and Its Applications 3, 4 (Oct. 2009), 87–96.
    [42]
    Cristian-Alexandru Staicu and Michael Pradel. 2018. Freezing the web: A study of redos vulnerabilities in javascript-based web servers. In 27th USENIX Security Symposium (USENIX Security ’18).
    [43]
    Satoshi Sugiyama and Yasuhiko Minamide. 2014. Checking time linearity of regular expression matching based on backtracking. IPSJ Online Transactions 7 (Nov. 2014), 82–92.
    [44]
    Bryan Sullivan. 2010. New Tool: SDL Regex Fuzzer. https://cloudblogs.microsoft. com/microsoftsecure/2010/10/12/new-tool-sdl-regex-fuzzer
    [45]
    Bryan Sullivan. 2010. Regular Expression Denial of Service Attacks and Defenses. https://msdn.microsoft.com/en-us/magazine/ff646973.aspx
    [46]
    Ken Thompson. 1968. Programming techniques: Regular expression search algorithm. Commun. ACM 11, 6 (June 1968), 419–422. 363347.363387
    [47]
    Nicolaas Weideman, Brink van der Merwe, Martin Berglund, and Bruce Watson. 2016. Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In Proceedings of the International Conference on Implementation and Application of Automata (CIAA ’16). 322–334.
    [48]
    Adar Weidman. 2017. Regular Expression Denial of Service - ReDoS. https: //www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
    [49]
    Darrell Whitley. 1994. A genetic algorithm tutorial. Statistics and Computing 4, 2 (June 1994), 65–85.
    [50]
    Valentin Wüstholz, Oswaldo Olivo, Marijn JH Heule, and Isil Dillig. 2017. Static detection of DoS vulnerabilities in programs that use regular expressions. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS ’17). 3–20. 978-3-662-54580-5_1

    Cited By

    View all
    • (2024)Analyzing and Discovering Spatial Algorithm Complexity Vulnerabilities in RecursionApplied Sciences10.3390/app1405185514:5(1855)Online publication date: 23-Feb-2024
    • (2024)Linear Matching of JavaScript Regular ExpressionsProceedings of the ACM on Programming Languages10.1145/36564318:PLDI(1336-1360)Online publication date: 20-Jun-2024
    • (2024)Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer ForumsProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644424(190-201)Online publication date: 15-Apr-2024
    • Show More Cited By

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
    September 2018
    955 pages
    ISBN:9781450359375
    DOI:10.1145/3238147
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 September 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    • Distinguished Paper

    Author Tags

    1. ReDoS
    2. denial of service
    3. egular expression
    4. genetic algorithm

    Qualifiers

    • Research-article

    Conference

    ASE '18
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 82 of 337 submissions, 24%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)108
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 14 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Analyzing and Discovering Spatial Algorithm Complexity Vulnerabilities in RecursionApplied Sciences10.3390/app1405185514:5(1855)Online publication date: 23-Feb-2024
    • (2024)Linear Matching of JavaScript Regular ExpressionsProceedings of the ACM on Programming Languages10.1145/36564318:PLDI(1336-1360)Online publication date: 20-Jun-2024
    • (2024)Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer ForumsProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644424(190-201)Online publication date: 15-Apr-2024
    • (2024)Re(gEx|DoS)Eval: Evaluating Generated Regular Expressions and their Proneness to DoS AttacksProceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results10.1145/3639476.3639757(52-56)Online publication date: 14-Apr-2024
    • (2024)Structure and design of multimodal dataset for automatic regex synthesis methods in Roman UrduInternational Journal of Data Science and Analytics10.1007/s41060-024-00612-yOnline publication date: 23-Jul-2024
    • (2024)Efficient Matching with Memoization for Regexes with Look-around and Atomic GroupingProgramming Languages and Systems10.1007/978-3-031-57267-8_4(90-118)Online publication date: 6-Apr-2024
    • (2023)Algorithmic Complexity Attacks on Dynamic Learned IndexesProceedings of the VLDB Endowment10.14778/3636218.363623217:4(780-793)Online publication date: 1-Dec-2023
    • (2023)Improving Developers’ Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies2023 IEEE Symposium on Security and Privacy (SP)10.1109/SP46215.2023.10179442(1238-1255)Online publication date: May-2023
    • (2023)Effective ReDoS Detection by Principled Vulnerability Modeling and Exploit Generation2023 IEEE Symposium on Security and Privacy (SP)10.1109/SP46215.2023.10179328(2427-2443)Online publication date: May-2023
    • (2023)EndWatch: A Practical Method for Detecting Non-Termination in Real-World Software2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00061(686-697)Online publication date: 11-Sep-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media