raw string literal testing disagrees with documentation #14
Comments
While the example from the specification you are referring to is indeed incorrect, the spirit of the
I think you have addressed this point in the following pull request. However, I do not see any conflicting examples in the tests/literal.json you linked to. May I suggest that you are misinterpreting the example you are referring to, which in fact does not contain any backslash character at all but instead contains a line feed? 🤔
|
That makes sense to me, but the second bullet point is contradicted by the last test case at jmespath.test/tests/literal.json Lines 183 to 197 in aa6fb5f
{
- "comment": "Backslash not followed by single quote is treated as any other character",
+ "comment": "Can escape backslash",
"expression": "'\\\\'",
- "result": "\\\\"
+ "result": "\\"
}
I don't think so, and I'll demonstrate using your table:
|
I realize that control characters are not currently specified as valid for Now I understand the last example as well. The JMESPath expression is a I agree with your assessment. |
Should we really allow control characters in Like:
raw-string = "'" *raw-string-char "'"
raw-string-char = (%x00-26 / %x28-5B / %x5D-10FFFF) / preserved-escape / raw-string-escape
preserved-escape = escape (%x00-26 / %x28-5B / %x5D-10FFFF)
raw-string-escape = escape ("'" / escape)
That feels somehow... Uncanny 😲 |
Or maybe just a subset of those supported as valid escape sequences in JSON:
raw-string = "'" *raw-string-char "'"
raw-string-char = (%x08-09 / %x0A / %x0C-0D / %x20-26 / %x28-5B / %x5D-10FFFF) / preserved-escape / raw-string-escape
preserved-escape = escape (%x08-09 / %x0A / %x0C-0D / %x20-26 / %x28-5B / %x5D-10FFFF)
raw-string-escape = escape ("'" / escape) |
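To make the difference between the two proposals concrete, here is a minimal Python sketch that checks which unescaped characters each `raw-string-char` range set admits. The range tables are copied from the two grammars in this thread; the names `ALL_C0`, `JSON_SUBSET`, and `allowed` are made up for illustration and are not part of any JMESPath implementation.

```python
# First grouping of raw-string-char in each proposal (unescaped characters only).
ALL_C0 = [(0x00, 0x26), (0x28, 0x5B), (0x5D, 0x10FFFF)]
JSON_SUBSET = [
    (0x08, 0x09), (0x0A, 0x0A), (0x0C, 0x0D),
    (0x20, 0x26), (0x28, 0x5B), (0x5D, 0x10FFFF),
]


def allowed(ch, ranges):
    """Return True if the single character ch falls inside one of the ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ranges)


# Tab and line feed pass under both proposals; ESC (0x1B) only under the first.
for name, ch in [("TAB", "\t"), ("LF", "\n"), ("ESC", "\x1b")]:
    print(name, allowed(ch, ALL_C0), allowed(ch, JSON_SUBSET))
```

Under both range sets an unescaped `'` or `\` is rejected, so the two proposals differ only in which C0 control characters they admit.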
I would prefer to prohibit C0 control characters in raw strings, but if any are allowed then they all should be (and I don't know your comfort level with backwards-incompatible changes to the test suite, even if they are already supported by the formal grammar). Alternatively, if it is a goal for all possible strings to be representable as raw literals (which also implies interpretation of |
I agree that control characters should be forbidden in JMESPath expressions. |
Oh, thanks for digging that up! Its examples actually contain an explicit rebuttal:
And even more usefully, the "Disallow single quotes in a raw string" alternative documents its intent to fully subsume JSON string literals:
The JEP-12 proposed grammar was missing an escape for All things considered, I think we have a clear picture that raw string literals were intended to fully subsume the expressiveness of JSON string backtick literals and treat backslashes literally except for the two specific cases of
|
This makes me realize that JEP-12, while having been accepted, is actually incomplete. Thanks for pointing out all the subsequent commits that collectively allude to the thought process at the time. I think an addendum is required for JEP-12 |
However, allowing So I’m starting to be convinced that the
raw-string = "'" *raw-string-char "'"
; The first grouping matches any character other than "'" and "\"
raw-string-char = (%x00-26 / %x28-5B / %x5D-10FFFF) / preserved-escape / raw-string-escape
; The second grouping matches any character other than "'"
preserved-escape = escape (%x20-26 / %x28-10FFFF)
raw-string-escape = escape "'"
Making those expressions valid
That said, I’m not comfortable allowing C0 control characters in a |
What would be the point of a
(which in the absence of special treatment for
Maybe not all of them, but remember that tab and line feed are included in that set. And it's also not too hard to get text containing ANSI escape sequences (which start with 0x1B). |
OK, I agree that the current grammar is consistent with the last example from that page:
That example clearly illustrates usage of two consecutive backslash characters to escape a single resulting backslash character in the output string. |
To be clear, I’m explicitly referring to expanding the set of characters from I can’t see a good use case for making an exception here in raw string and not in, say |
I can make that case: |
I rest my case your honor 😏 |
For the record, to fix this issue, the specification needs to be updated to:
raw-string = "'" *raw-string-char "'"
raw-string-char = (%x00-26 / %x28-5B / %x5D-10FFFF) / preserved-escape / raw-string-escape
preserved-escape = escape (%x00-26 / %x28-5B / %x5D-10FFFF)
raw-string-escape = escape ("'" / escape)
I must admit that I do not understand the purpose of the |
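For anyone who wants to check their reading of that grammar, here is a minimal Python sketch of a decoder that follows it. This is only an illustration of my understanding, not the reference implementation: `decode_raw_string` is a hypothetical helper, and it assumes that `raw-string-escape` yields the escaped character alone while `preserved-escape` keeps the backslash.

```python
def decode_raw_string(literal):
    """Decode a raw string literal (surrounding single quotes included)."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("raw string literal must be wrapped in single quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            raise ValueError("unescaped single quote inside raw string")
        if ch == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash at end of raw string")
            following = body[i + 1]
            if following in ("'", "\\"):
                out.append(following)         # raw-string-escape: \' or \\
            else:
                out.append("\\" + following)  # preserved-escape: keep both characters
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


# The Python raw strings below hold the JMESPath source text exactly as written.
print(decode_raw_string(r"'\\'"))     # one backslash
print(decode_raw_string(r"'\n'"))     # backslash followed by the letter n
print(decode_raw_string(r"'it\'s'"))  # it's
```

With this reading, every sequence of code points is representable in a raw string, because a literal backslash can always be written as `\\`.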
I think I can explain it... the
|
I think this issue has been solved. |
jmespath.site Grammar defines the grammar for raw string literals like
and Raw String Literals includes a
search('\\', "") -> "\\"
example implying that \\ is an escape sequence representing a single U+005C REVERSE SOLIDUS, just like \' is an escape sequence representing a single U+0027 APOSTROPHE.
However, tests/literal.json in this repository includes test cases contradicting the above, such as
{ "expression": "'\n'", "result": "\n" }
(a raw string literal including a U+000A LINE FEED, which is not covered by raw-string-char) and
{ "comment": "Backslash not followed by single quote is treated as any other character", "expression": "'\\\\'", "result": "\\\\" }
(added in 2016 by c0f7923).
Both disagreements ultimately affect whether or not raw string literals are capable of representing all possible sequences of code points (respectively those including C0 control characters and those including U+005C REVERSE SOLIDUS), and the latter is especially odd: I've never heard of any other escaping approach in which the escape prefix itself is unrepresentable.
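Because the test file is JSON, each expression is escaped once more on top of the JMESPath syntax, which makes these cases easy to misread. The Python sketch below decodes the two quoted test cases and lists the code points that actually reach the JMESPath parser. The JSON fragments are the ones quoted above (the comment field is omitted); the variable names and the `describe` helper are illustrative only.

```python
import json

# Test cases as written in the JSON file; Python raw strings keep the JSON escapes intact.
line_feed_case = r"""{"expression": "'\n'", "result": "\n"}"""
old_backslash_case = r"""{"expression": "'\\\\'", "result": "\\\\"}"""


def describe(case_json):
    """Decode the JSON layer and list each field's code points."""
    case = json.loads(case_json)
    return {
        key: [f"U+{ord(ch):04X}" for ch in value]
        for key, value in case.items()
    }


# The first expression holds a real line feed between the quotes, not a backslash.
print(describe(line_feed_case))
# The second expression holds two backslashes, and the 2016 result also expects two.
print(describe(old_backslash_case))
```

So after JSON decoding, the second expression is the four characters `'\\'`, and the 2016 test expects it to evaluate to two backslashes rather than the single one that the documentation example implies.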