Exploring regular expression usage and context in Python

C Chapman, KT Stolee
Proceedings of the 25th International Symposium on Software Testing and Analysis, 2016 - dl.acm.org
Due to the popularity and pervasive use of regular expressions, researchers have created tools to support their creation, validation, and use. However, little is known about the context in which regular expressions are used, the features that are most common, and how behaviorally similar regular expressions are to one another.
In this paper, we explore the context in which regular expressions are used through a combination of developer surveys and repository analysis. We survey 18 professional developers about their regular expression usage and pain points. Then, we analyze nearly 4,000 open source Python projects from GitHub and extract nearly 14,000 unique regular expression patterns. We map the most common features used in regular expressions to those features supported by four major regex research efforts from industry and academia: brics, Hampi, RE2, and Rex. Using similarity analysis of regular expressions across projects, we identify six common behavioral clusters that describe how regular expressions are often used in practice. This is the first rigorous examination of regex usage and it provides empirical evidence to support design decisions by regex tool builders. It also points to areas of needed future work, such as refactoring regular expressions to increase regex understandability and context-specific tool support for common regex usages.