SRE ignores the ASCII flag on character ranges with non-BMP upper bound #126505
I think this could be caused by this line: Line 301 in a1c57bc
When the pattern is being compiled in
cc @serhiy-storchaka as the RE expert
Thank you for your report @jirkamarsik. Actually, this is a more complex issue. Currently, the result in the following examples does not depend on the upper bound if it is large enough.
>>> import re
>>> re.match(r'[N-\uffff]', 'A', re.I|re.A)
<re.Match object; span=(0, 1), match='A'>
>>> re.match(r'[n-\uffff]', 'Z', re.I|re.A)
<re.Match object; span=(0, 1), match='Z'>
>>> re.match(r'[N-\U00010000]', 'A', re.I|re.A)
<re.Match object; span=(0, 1), match='A'>
>>> re.match(r'[n-\U00010000]', 'Z', re.I|re.A)
<re.Match object; span=(0, 1), match='Z'>
But with the proposed fix, the last two matches would return None. I am working on this.
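The invariant described above (for ASCII letters under `re.I|re.A`, the match result should not change when the upper bound moves from U+FFFF to just past the BMP) can be spot-checked with a small script. This is only a sketch: `class_matches`, `CASES` and `UPPER_BOUNDS` are hypothetical names, and the cases are just the two from the examples above.

```python
import re

# (lower bound of the class, subject) pairs from the examples above.
CASES = [("N", "A"), ("n", "Z")]
# One upper bound inside the BMP and one just past it (U+10000).
UPPER_BOUNDS = ["\uffff", "\U00010000"]


def class_matches(low: str, upper: str, subject: str) -> bool:
    """Hypothetical helper: does [low-upper] match `subject` under re.I | re.A?"""
    # The upper bound is inserted as a literal character, not an escape.
    pattern = "[" + re.escape(low) + "-" + upper + "]"
    return re.match(pattern, subject, re.I | re.A) is not None


for low, subject in CASES:
    # Report the result for each upper bound; they are expected to agree.
    results = {hex(ord(u)): class_matches(low, u, subject) for u in UPPER_BOUNDS}
    print(low, subject, results)
```

If the results for `0xffff` and `0x10000` ever differ for the same pair, the compiled range depends on whether its upper bound crosses the BMP, which is the class of problem discussed here.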
I have also found another bug:
…sses
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose upper bound is beyond the BMP region
…H-126557)
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose upper bound is beyond the BMP region
…sses (pythonGH-126557)
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose upper bound is beyond the BMP region
(cherry picked from commit 819830f)
Co-authored-by: Serhiy Storchaka
…asses (GH-126557) (GH-126690)
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose upper bound is beyond the BMP region
(cherry picked from commit 819830f)
Co-authored-by: Serhiy Storchaka
…asses (GH-126557) (GH-126689)
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose upper bound is beyond the BMP region
(cherry picked from commit 819830f)
Co-authored-by: Serhiy Storchaka
Bug report
Bug description:
It seems like SRE ignores the ASCII flag when parsing a character range whose upper bound is beyond the BMP region:
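One way to picture the difference the ASCII flag should make is a small reference model of a case-insensitive range check. This is a sketch under stated assumptions, not SRE's implementation: `range_matches` and its fold helpers are hypothetical, and the logic only roughly mirrors the idea behind SRE's `RANGE_IGNORE` / `RANGE_UNI_IGNORE` checks (lowercase the subject, then accept if the lowercased character or its uppercase counterpart lies in the range). The input `'\u00c0'` (À) is illustrative and not taken from the report.

```python
def _fold_ascii(c: str, upper: bool) -> str:
    # ASCII-only case mapping: non-ASCII characters are left unchanged.
    if not c.isascii():
        return c
    return c.upper() if upper else c.lower()


def _fold_unicode(c: str, upper: bool) -> str:
    # Full Unicode case mapping, keeping only single-character results
    # (multi-character mappings such as 'ß'.upper() == 'SS' are ignored).
    mapped = c.upper() if upper else c.lower()
    return mapped if len(mapped) == 1 else c


def range_matches(ch: str, lo: int, hi: int, ascii_only: bool) -> bool:
    """Hypothetical model: does `ch` match [lo-hi] case-insensitively?"""
    fold = _fold_ascii if ascii_only else _fold_unicode
    lower = fold(ch, upper=False)
    upper = fold(lower, upper=True)
    return lo <= ord(lower) <= hi or lo <= ord(upper) <= hi


# 'À' (U+00C0) against a range from 'à' (U+00E0) to just past the BMP.
print(range_matches("\u00c0", 0xE0, 0x10000, ascii_only=True))
print(range_matches("\u00c0", 0xE0, 0x10000, ascii_only=False))
```

In this model, ASCII-only folding rejects `'À'` because U+00C0 is outside the range and is not case-mapped, while Unicode folding accepts it via `'à'`. A range that silently switched to Unicode folding whenever its upper bound exceeded U+FFFF would produce the second answer even when `re.ASCII` is set.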
CPython versions tested on:
3.12
Operating systems tested on:
Linux