Exploring regular expression comprehension

C Chapman, P Wang, KT Stolee - 2017 32nd IEEE/ACM …, 2017 - ieeexplore.ieee.org
2017 32nd IEEE/ACM International Conference on Automated Software …, 2017 - ieeexplore.ieee.org
The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that it can be difficult for developers to compose and understand them. This work seeks to identify code smells that impact comprehension. We conduct an empirical study on 42 pairs of behaviorally equivalent but syntactically different regexes with 180 participants and evaluate the understandability of various regex language features. We further analyze regexes in GitHub to find the community standards, that is, the common usages of various features. We found that some regex representations are more understandable than others. For example, using a range (e.g., [0-9]) is often more understandable than a default character class (e.g., [\d]). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied. The larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.
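As an illustration of what "behaviorally equivalent but syntactically different" can mean, the following minimal Python sketch compares the two representations named in the abstract, [0-9] and [\d]. It is not the study's tooling. The sample strings and the helper names (`matches`, `equivalent_on`) are assumptions made for this example, and agreement on a handful of samples is only a spot check, not a proof of equivalence.

```python
import re

# Two syntactically different representations of "one or more digits".
# re.ASCII is an assumption of this sketch: it restricts \d to 0-9, so the
# two forms can be compared on equal footing.
RANGE_FORM = re.compile(r"[0-9]+", re.ASCII)
CLASS_FORM = re.compile(r"[\d]+", re.ASCII)

# Hypothetical sample inputs chosen for illustration only.
SAMPLES = ["2017", "42 pairs", "abc", "", "a1b22c333"]


def matches(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


def equivalent_on(samples):
    """Spot-check that both representations match the same substrings."""
    return all(matches(RANGE_FORM, s) == matches(CLASS_FORM, s) for s in samples)


print(equivalent_on(SAMPLES))
```

Note that equivalence can depend on the regex dialect and flags. In Python 3, without `re.ASCII`, `\d` on a `str` pattern also matches non-ASCII decimal digits, while `[0-9]` does not, so the two forms are only interchangeable under the flag assumed above.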