Flashregex: Deducing anti-redos regexes from examples

Y Li, Z Xu, J Cao, H Chen, T Ge, SC Cheung… - Proceedings of the 35th …, 2020 - dl.acm.org
Proceedings of the 35th IEEE/ACM International Conference on Automated …, 2020dl.acm.org
Regular expressions (regexes) are widely used in different fields of computer science such
as programming languages, string processing and databases. However, existing tools for
synthesizing or repairing regexes were not designed to be resilient to Regex Denial of
Service (ReDoS) attacks. Specifically, if a regex has super-linear (SL) worst-case
complexity, an attacker could provide carefully-crafted inputs to launch ReDoS attacks.
Therefore, in this paper, we propose a programming-by-example framework, FlashRegex …
Regular expressions (regexes) are widely used in different fields of computer science such as programming languages, string processing and databases. However, existing tools for synthesizing or repairing regexes were not designed to be resilient to Regex Denial of Service (ReDoS) attacks. Specifically, if a regex has super-linear (SL) worst-case complexity, an attacker could provide carefully-crafted inputs to launch ReDoS attacks. Therefore, in this paper, we propose a programming-by-example framework, FlashRegex, for generating anti-ReDoS regexes by either synthesizing or repairing from given examples. It is the first framework that integrates regex synthesis and repair with the awareness of ReDoS-vulnerabilities. We present novel algorithms to deduce anti-ReDoS regexes by reducing the ambiguity of these regexes and by using Boolean Satisfiability (SAT) or Neighborhood Search (NS) techniques. We evaluate FlashRegex with five related state-of-the-art tools. The evaluation results show that our work can effectively and efficiently generate anti-ReDoS regexes from given examples, and also reveal that existing synthesis and repair tools have neglected ReDoS-vulnerabilities of regexes. Specifically, the existing synthesis and repair tools generated up to 394 ReDoS-vulnerable regex within few seconds to more than one hour, while FlashRegex generated no SL regex within around five seconds. Furthermore, the evaluation results on ReDoS-vulnerable regex repair also show that FlashRegex has better capability than existing repair tools and even human experts, achieving 4 more ReDoS-invulnerable regex after repair without trimming and resorting, highlighting the usefulness of FlashRegex in terms of the generality, automation and user-friendliness.
ACM Digital Library