Fix failure found by TestOperations.testGetRandomAcceptedString #14227

rmuir · 2025-02-12T22:48:39Z

string: ?+½]+]+Ř*+[\]ᖴﴁ.

expected: before #14193

java.lang.IllegalArgumentException: expected ']' at position 17

actual: after #14193

REGEXP_CONCATENATION
  REGEXP_CONCATENATION
    REGEXP_CONCATENATION
      REGEXP_CONCATENATION
        REGEXP_CONCATENATION
          REGEXP_CONCATENATION
            REGEXP_CONCATENATION
              REGEXP_CONCATENATION
                REGEXP_REPEAT_MIN min=1
                  REGEXP_CHAR char=?
                REGEXP_CHAR char=½
              REGEXP_REPEAT_MIN min=1
                REGEXP_CHAR char=]
            REGEXP_CHAR char=
          REGEXP_REPEAT_MIN min=1
            REGEXP_CHAR char=]
        REGEXP_REPEAT_MIN min=1
          REGEXP_REPEAT
            REGEXP_CHAR char=Ř
      REGEXP_CHAR_CLASS starts=[] ends=[]
    REGEXP_STRING string=ᖴﴁ
  REGEXP_ANYCHAR

Problem is caused by RegExp accepting too much rather than throwing exceptions like it should have. The lenience in the parser comes from expandPreDefined() which invades on escape character parsing for character classes (e.g. \s). This one adds a lot of complexity to parsing.

Don't invoke expandPreDefined(), except for the set of characters that it explicitly handles. This is also consistent with the way expandPreDefined()'s complexity is managed elsewhere in the parser, such as in parseSimpleExp().

Add parsing tests for testEmptyClass(), which is unchanged by this PR, but should be there, and testEscapedInvalidClass(), which fails without the change.

Closes #14224

string: ?+½]+]+Ř*+[\]ᖴﴁ. expected: before apache#14193 java.lang.IllegalArgumentException: expected ']' at position 17 actual: after apache#14193 REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_REPEAT_MIN min=1 REGEXP_CHAR char=? REGEXP_CHAR char=½ REGEXP_REPEAT_MIN min=1 REGEXP_CHAR char=] REGEXP_CHAR char= REGEXP_REPEAT_MIN min=1 REGEXP_CHAR char=] REGEXP_REPEAT_MIN min=1 REGEXP_REPEAT REGEXP_CHAR char=Ř REGEXP_CHAR_CLASS starts=[] ends=[] REGEXP_STRING string=ᖴﴁ REGEXP_ANYCHAR Problem is caused by RegExp accepting too much rather than throwing exceptions like it should have. The lenience in the parser comes from "expandPreDefined" which invades on escape character parsing for character classes (e.g. \s). This one adds a lot of complexity to parsing. Don't invoke expandPreDefined(), except for the set of characters that it explicitly handles. This is also consistent with the way expandPreDefined()'s complexity is managed elsewhere in the parser, such as in parseSimpleExp(). Add parsing tests for testEmptyClass(), which is unchanged by this PR, but should be there, and testEscapedInvalidClass(), which fails without the change.

When we consume an escape, either it matches the predefined special logic, or we emit range for next(), no other logic: it is an escape. Add test for escaped '-' in a character class.

* Fix failure found by TestOperations.testGetRandomAcceptedString string: ?+½]+]+Ř*+[\]ᖴﴁ. expected: before #14193 java.lang.IllegalArgumentException: expected ']' at position 17 actual: after #14193 REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_CONCATENATION REGEXP_REPEAT_MIN min=1 REGEXP_CHAR char=? REGEXP_CHAR char=½ REGEXP_REPEAT_MIN min=1 REGEXP_CHAR char=] REGEXP_CHAR char= REGEXP_REPEAT_MIN min=1 REGEXP_CHAR char=] REGEXP_REPEAT_MIN min=1 REGEXP_REPEAT REGEXP_CHAR char=Ř REGEXP_CHAR_CLASS starts=[] ends=[] REGEXP_STRING string=ᖴﴁ REGEXP_ANYCHAR Problem is caused by RegExp accepting too much rather than throwing exceptions like it should have. The lenience in the parser comes from "expandPreDefined" which invades on escape character parsing for character classes (e.g. \s). This one adds a lot of complexity to parsing. Don't invoke expandPreDefined(), except for the set of characters that it explicitly handles. This is also consistent with the way expandPreDefined()'s complexity is managed elsewhere in the parser, such as in parseSimpleExp(). Add parsing tests for testEmptyClass(), which is unchanged by this PR, but should be there, and testEscapedInvalidClass(), which fails without the change. When we consume an escape, either it matches the predefined special logic, or we emit range for next(), no other logic: it is an escape. Add test for escaped '-' in a character class.

github-actions bot added the module:core/other label Feb 12, 2025

rmuir mentioned this pull request Feb 12, 2025

TestOperations.testGetRandomAcceptedString failing #14224

Closed

Improve escaping in character classes

67339be

When we consume an escape, either it matches the predefined special logic, or we emit range for next(), no other logic: it is an escape. Add test for escaped '-' in a character class.

dweiss approved these changes Feb 13, 2025

View reviewed changes

rmuir merged commit 19be0e5 into apache:main Feb 13, 2025
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix failure found by TestOperations.testGetRandomAcceptedString #14227

Fix failure found by TestOperations.testGetRandomAcceptedString #14227

rmuir commented Feb 12, 2025

Fix failure found by TestOperations.testGetRandomAcceptedString #14227

Fix failure found by TestOperations.testGetRandomAcceptedString #14227

Conversation

rmuir commented Feb 12, 2025