Regular expressions with POSIX character class [:ascii:] matches code points > 127 #4544

Closed
kipcole9 opened this issue Feb 23, 2021 · 0 comments · Fixed by #4551

kipcole9 commented Feb 23, 2021

Description

  • According to the docs for the re module, the POSIX class ascii should match Character codes 0-127.

  • The examples below show that it actually matches code points 0..255. This differs from the behaviour of other PCRE-based regex implementations.

To Reproduce

Eshell V10.7  (abort with ^G)
1> re:run(<<"ü"/utf8>>, "[[:ascii:]]", [unicode]).
{match,[{0,2}]}
2> re:run(<<"ü">>, "[[:ascii:]]").
{match,[{0,1}]}
3> re:run(<<"ü"/utf8>>, "[[:ascii:]]").
{match,[{0,1}]}
4> re:run("ü", "[[:ascii:]]").         
{match,[{0,1}]}

Expected behaviour

The ascii POSIX character class should match only code points in the range 0..127, as documented and consistent with other PCRE-based implementations. In those implementations, [[:ascii:]] is equivalent to \p{block=basic_latin}, which covers the range 0..127.
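
Until the class behaves as documented, one possible workaround is to spell out the range explicitly instead of using [:ascii:]. The following is only a minimal sketch: the module name ascii_class and the function is_ascii/1 are hypothetical and not part of OTP, and it assumes the subject is either a list of code points or a valid UTF-8 binary, since the unicode option is used.

```erlang
-module(ascii_class).
-export([is_ascii/1]).

%% Hypothetical helper, not part of OTP: returns true when every
%% code point in Subject lies in the range 0..127.
%% The explicit range [\x00-\x7F] is used in place of [[:ascii:]].
%% Assumes Subject is a list of code points or a valid UTF-8 binary,
%% because the unicode option is passed to re:run/3.
is_ascii(Subject) ->
    %% {capture, none} makes re:run/3 return the atom match or nomatch.
    re:run(Subject, "^[\\x00-\\x7F]*$", [unicode, {capture, none}]) =:= match.
```

With this sketch, ascii_class:is_ascii(<<"abc">>) returns true, while ascii_class:is_ascii(<<"ü"/utf8>>) returns false, because U+00FC falls outside the explicit range.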

Environment

Reproduced on:

Erlang/OTP 22 [erts-10.7] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [hipe]
Erlang/OTP 23 [erts-11.1.6] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [hipe]
@kipcole9 kipcole9 added the bug Issue is reported as a bug label Feb 23, 2021
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Feb 23, 2021
@jhogberg jhogberg self-assigned this Feb 24, 2021
@jhogberg jhogberg added this to the OTP-24.0 milestone Feb 24, 2021
@jhogberg jhogberg linked a pull request Feb 25, 2021 that will close this issue
jhogberg added a commit that referenced this issue Feb 26, 2021
…ss-snafu/GH-4544/OTP-17222

re: Document [:ascii:] character class deficiency