Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser): fix lexing escape sequences in block strings #638

Merged
merged 2 commits into from
Aug 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions crates/apollo-parser/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,17 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## Documentation -->

# [0.6.1](https://crates.io/crates/apollo-parser/0.6.1) (unreleased)
## Fixes
- **fix lexing escape-sequence-like text in block strings - [goto-bus-stop], [pull/638], [issue/632]**
Fixes a regression in 0.6.0 that could cause apollo-parser to reject valid input if a
block string contained backslashes. Block strings do not support escape sequences so
backslashes are normally literal, but 0.6.0 tried to lex them as escape sequences,
which could be invalid (eg. `\W` is not a supported escape sequence).

Now block strings are lexed like in 0.5.3. Only the `\"""` sequence is treated as an
escape sequence.

# [0.6.0](https://crates.io/crates/apollo-parser/0.6.0) - 2023-08-18
## Features
- **zero-alloc lexer - [allancalix], [pull/322]**
Expand Down
103 changes: 65 additions & 38 deletions crates/apollo-parser/src/lexer/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,6 @@ enum State {
StringLiteralEscapedUnicode(usize),
StringLiteral,
StringLiteralStart,
BlockStringLiteralEscapedUnicode(usize),
BlockStringLiteral,
BlockStringLiteralBackslash,
StringLiteralBackslash,
Expand Down Expand Up @@ -308,33 +307,6 @@ impl<'a> Cursor<'a> {
continue;
}
},
State::BlockStringLiteralEscapedUnicode(remaining) => match c {
'"' => {
self.add_err(Error::new(
"incomplete unicode escape sequence",
c.to_string(),
));
token.data = self.current_str();
state = State::Done;

break;
}
c if !c.is_ascii_hexdigit() => {
self.add_err(Error::new("invalid unicode escape sequence", c.to_string()));
state = State::BlockStringLiteral;

continue;
}
_ => {
if remaining <= 1 {
state = State::BlockStringLiteral;

continue;
}

state = State::BlockStringLiteralEscapedUnicode(remaining - 1)
}
},
State::StringLiteralEscapedUnicode(remaining) => match c {
'"' => {
self.add_err(Error::new(
Expand Down Expand Up @@ -379,19 +351,17 @@ impl<'a> Cursor<'a> {
},
State::BlockStringLiteralBackslash => match c {
'"' => {
while self.eatc('"') {}
// If this is \""", we need to eat 3 in total, and then continue parsing.
// The lexer does not un-escape escape sequences so it's OK
// if we take this path for \"", even if that is technically not an escape
// sequence.
if self.eatc('"') {
self.eatc('"');
}
Copy link
Member Author

@goto-bus-stop goto-bus-stop Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test below to make sure """\"""""" is correctly lexed as a block string containing only \""". Previously it would eat too many ".


state = State::BlockStringLiteral;
}
curr if is_escaped_char(curr) => {
state = State::BlockStringLiteral;
}
'u' => {
state = State::BlockStringLiteralEscapedUnicode(4);
}
_ => {
self.add_err(Error::new("unexpected escaped character", c.to_string()));

state = State::BlockStringLiteral;
}
},
Expand Down Expand Up @@ -520,7 +490,7 @@ impl<'a> Cursor<'a> {
curr.to_string(),
))
}
State::StringLiteral => {
State::StringLiteral | State::BlockStringLiteral => {
let curr = self.drain();

Err(Error::with_loc(
Expand Down Expand Up @@ -673,4 +643,61 @@ type Query {

assert_eq!(schema, processed_schema);
}

#[test]
fn quoted_block_comment() {
let input = r#"
"""
Not an escape character:
'/\W/'
Escape character:
\"""
\"""\"""
Not escape characters:
\" \""
Escape character followed by a quote:
\""""
"""
"#;

let (tokens, errors) = Lexer::new(input).lex();
assert!(errors.is_empty());
// The token data should be literally the source text.
assert_eq!(
tokens[1].data,
r#"
"""
Not an escape character:
'/\W/'
Escape character:
\"""
\"""\"""
Not escape characters:
\" \""
Escape character followed by a quote:
\""""
"""
"#
.trim(),
);

let input = r#"
# String contents: """
"""\""""""
# Unclosed block string
"""\"""
"#;
let (tokens, errors) = Lexer::new(input).lex();
assert_eq!(tokens[3].data, r#""""\"""""""#);
assert_eq!(
errors,
&[Error::with_loc(
"unterminated string value",
r#""""\"""
"#
.to_string(),
59,
)]
);
}
}