Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failure found by TestOperations.testGetRandomAcceptedString #14227

Merged
merged 2 commits into from
Feb 13, 2025

Conversation

rmuir
Copy link
Member

@rmuir rmuir commented Feb 12, 2025

string: ?+½]+]+Ř*+[\]ᖴﴁ.

expected: before #14193

java.lang.IllegalArgumentException: expected ']' at position 17

actual: after #14193

REGEXP_CONCATENATION
  REGEXP_CONCATENATION
    REGEXP_CONCATENATION
      REGEXP_CONCATENATION
        REGEXP_CONCATENATION
          REGEXP_CONCATENATION
            REGEXP_CONCATENATION
              REGEXP_CONCATENATION
                REGEXP_REPEAT_MIN min=1
                  REGEXP_CHAR char=?
                REGEXP_CHAR char=½
              REGEXP_REPEAT_MIN min=1
                REGEXP_CHAR char=]
            REGEXP_CHAR char=
          REGEXP_REPEAT_MIN min=1
            REGEXP_CHAR char=]
        REGEXP_REPEAT_MIN min=1
          REGEXP_REPEAT
            REGEXP_CHAR char=Ř
      REGEXP_CHAR_CLASS starts=[] ends=[]
    REGEXP_STRING string=ᖴﴁ
  REGEXP_ANYCHAR

Problem is caused by RegExp accepting too much rather than throwing exceptions like it should have. The lenience in the parser comes from expandPreDefined() which invades on escape character parsing for character classes (e.g. \s). This one adds a lot of complexity to parsing.

Don't invoke expandPreDefined(), except for the set of characters that it explicitly handles. This is also consistent with the way expandPreDefined()'s complexity is managed elsewhere in the parser, such as in parseSimpleExp().

Add parsing tests for testEmptyClass(), which is unchanged by this PR, but should be there, and testEscapedInvalidClass(), which fails without the change.

Closes #14224

string: ?+½]+]+Ř*+[\]ᖴﴁ.

expected: before apache#14193
java.lang.IllegalArgumentException: expected ']' at position 17

actual: after apache#14193
REGEXP_CONCATENATION
  REGEXP_CONCATENATION
    REGEXP_CONCATENATION
      REGEXP_CONCATENATION
        REGEXP_CONCATENATION
          REGEXP_CONCATENATION
            REGEXP_CONCATENATION
              REGEXP_CONCATENATION
                REGEXP_REPEAT_MIN min=1
                  REGEXP_CHAR char=?
                REGEXP_CHAR char=½
              REGEXP_REPEAT_MIN min=1
                REGEXP_CHAR char=]
            REGEXP_CHAR char=
          REGEXP_REPEAT_MIN min=1
            REGEXP_CHAR char=]
        REGEXP_REPEAT_MIN min=1
          REGEXP_REPEAT
            REGEXP_CHAR char=Ř
      REGEXP_CHAR_CLASS starts=[] ends=[]
    REGEXP_STRING string=ᖴﴁ
  REGEXP_ANYCHAR

Problem is caused by RegExp accepting too much rather than throwing
exceptions like it should have. The lenience in the parser comes from
"expandPreDefined" which invades on escape character parsing for character
classes (e.g. \s). This one adds a lot of complexity to parsing.

Don't invoke expandPreDefined(), except for the set of characters that it
explicitly handles. This is also consistent with the way
expandPreDefined()'s complexity is managed elsewhere in the parser, such
as in parseSimpleExp().

Add parsing tests for testEmptyClass(), which is unchanged by this PR,
but should be there, and testEscapedInvalidClass(), which fails without
the change.
When we consume an escape, either it matches the predefined special
logic, or we emit range for next(), no other logic: it is an escape.

Add test for escaped '-' in a character class.
@rmuir rmuir merged commit 19be0e5 into apache:main Feb 13, 2025
6 checks passed
asfgit pushed a commit that referenced this pull request Feb 13, 2025
* Fix failure found by TestOperations.testGetRandomAcceptedString

string: ?+½]+]+Ř*+[\]ᖴﴁ.

expected: before #14193
java.lang.IllegalArgumentException: expected ']' at position 17

actual: after #14193
REGEXP_CONCATENATION
  REGEXP_CONCATENATION
    REGEXP_CONCATENATION
      REGEXP_CONCATENATION
        REGEXP_CONCATENATION
          REGEXP_CONCATENATION
            REGEXP_CONCATENATION
              REGEXP_CONCATENATION
                REGEXP_REPEAT_MIN min=1
                  REGEXP_CHAR char=?
                REGEXP_CHAR char=½
              REGEXP_REPEAT_MIN min=1
                REGEXP_CHAR char=]
            REGEXP_CHAR char=
          REGEXP_REPEAT_MIN min=1
            REGEXP_CHAR char=]
        REGEXP_REPEAT_MIN min=1
          REGEXP_REPEAT
            REGEXP_CHAR char=Ř
      REGEXP_CHAR_CLASS starts=[] ends=[]
    REGEXP_STRING string=ᖴﴁ
  REGEXP_ANYCHAR

Problem is caused by RegExp accepting too much rather than throwing
exceptions like it should have. The lenience in the parser comes from
"expandPreDefined" which invades on escape character parsing for character
classes (e.g. \s). This one adds a lot of complexity to parsing.

Don't invoke expandPreDefined(), except for the set of characters that it
explicitly handles. This is also consistent with the way
expandPreDefined()'s complexity is managed elsewhere in the parser, such
as in parseSimpleExp().

Add parsing tests for testEmptyClass(), which is unchanged by this PR,
but should be there, and testEscapedInvalidClass(), which fails without
the change.

When we consume an escape, either it matches the predefined special
logic, or we emit range for next(), no other logic: it is an escape.

Add test for escaped '-' in a character class.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestOperations.testGetRandomAcceptedString failing
2 participants