The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale

JC Davis, CA Coghlan, F Servant, D Lee - … of the 2018 26th ACM joint …, 2018 - dl.acm.org
Proceedings of the 2018 26th ACM joint meeting on european software …, 2018dl.acm.org
Regular expressions (regexes) are a popular and powerful means of automatically
manipulating text. Regexes are also an understudied denial of service vector (ReDoS). If a
regex has super-linear worst-case complexity, an attacker may be able to trigger this
complexity, exhausting the victim's CPU resources and causing denial of service. Existing
research has shown how to detect these superlinear regexes, and practitioners have
identified super-linear regex anti-pattern heuristics that may lead to such complexity. In this …
Regular expressions (regexes) are a popular and powerful means of automatically manipulating text. Regexes are also an understudied denial of service vector (ReDoS). If a regex has super-linear worst-case complexity, an attacker may be able to trigger this complexity, exhausting the victim’s CPU resources and causing denial of service. Existing research has shown how to detect these superlinear regexes, and practitioners have identified super-linear regex anti-pattern heuristics that may lead to such complexity.
In this paper, we empirically study three major aspects of ReDoS that have hitherto been unexplored: the incidence of super-linear regexes in practice, how they can be prevented, and how they can be repaired. In the ecosystems of two of the most popular programming languages — JavaScript and Python – we detected thousands of super-linear regexes affecting over 10,000 modules across diverse application domains. We also found that the conventional wisdom for super-linear regex anti-patterns has few false negatives but many false positives; these anti-patterns appear to be necessary, but not sufficient, signals of super-linear behavior. Finally, we found that when faced with a super-linear regex, developers favor revising it over truncating input or developing a custom parser, regardless of whether they had been shown examples of all three fix strategies. These findings motivate further research into ReDoS, since many modules are vulnerable to it and existing mechanisms to avoid it are insufficient. We believe that ReDoS vulnerabilities are a larger threat in practice than might have been guessed.
ACM Digital Library