Peggy can't parse unmatched braces in embedded JavaScript #345

Closed
gamesaucer opened this issue Feb 22, 2023 · 3 comments

@gamesaucer

Steps to reproduce: Parse the following grammar:

Start = .* { return /}/, "}" /* } */ }

Expected result: It is a valid Peggy grammar, since it consists of a valid rule and the action is valid JavaScript.
Actual result: It is not a valid Peggy grammar.

The issue here is that Peggy assumes code contains matching braces. That sounds reasonable, but it does not always hold. In particular, unmatched braces can appear in strings, regexes, or comments. Unless Peggy can handle these edge cases, there will be valid JavaScript that it will not accept in initialisers, actions and predicates.
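
A minimal sketch of the failure, assuming a matcher that simply counts braces. The function name `naiveBlockEnd` is hypothetical, and the code only approximates the matching-braces assumption described above; it is not Peggy's actual implementation.

```javascript
// Naive brace matcher: counts every "{" and "}" regardless of context.
// Hypothetical helper for illustration, not part of Peggy.
function naiveBlockEnd(source, start) {
  if (source[start] !== "{") return -1;
  let depth = 0;
  for (let i = start; i < source.length; i++) {
    if (source[i] === "{") depth++;
    else if (source[i] === "}") depth--;
    if (depth === 0) return i; // index of the "}" that closes the block
  }
  return -1; // ran out of input before the block closed
}

// The action from the grammar in the report.
const reportedAction = '{ return /}/, "}" /* } */ }';
const naiveEnd = naiveBlockEnd(reportedAction, 0);
const naiveMatched = reportedAction.slice(0, naiveEnd + 1);
const naiveLeftover = reportedAction.slice(naiveEnd + 1);

console.log(naiveMatched);  // { return /}
console.log(naiveLeftover); // /, "}" /* } */ }
```

The block appears to end at the brace inside the regex literal, so the rest of the action is left over as text that no longer fits the grammar.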

I'm happy to contribute code to fix this. If it's decided that it won't be fixed, I want to at least suggest that this is added as a limitation in the header comment of the Peggy grammar.

@hildjj
Contributor

hildjj commented Feb 22, 2023

This is documented behavior. See https://peggyjs.org/documentation.html#grammar-syntax-and-semantics-parsing-expression-types and look for "action" and "predicate".

To fix this, I think we'd need to embed a full JS grammar in the core Peggy grammar. That would definitely have a negative impact on size and performance, and would be an ongoing maintenance hassle as ECMA-262 keeps adding features.

There is a workaround: add /*{{{*/ before the return statement in your example, so that each of the three unmatched closing braces gets a matching opening brace.
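
A quick way to check why the extra comment helps, assuming braces are counted without regard to context. The helper `bracesBalance` is a hypothetical name used only for this illustration.

```javascript
// Returns true when every "}" has an earlier matching "{" and none are left
// open, counting braces blindly (strings, regexes and comments included).
function bracesBalance(source) {
  let depth = 0;
  for (const ch of source) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
    if (depth < 0) return false; // a "}" appeared with nothing to close
  }
  return depth === 0;
}

const originalAction = '{ return /}/, "}" /* } */ }';
const patchedAction = '{ /*{{{*/ return /}/, "}" /* } */ }';

console.log(bracesBalance(originalAction)); // false
console.log(bracesBalance(patchedAction));  // true
```

The three opening braces in the comment pair up with the three stray closing braces in the regex, the string and the trailing comment, while the JavaScript itself is unchanged because the addition is only a comment.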

@gamesaucer
Author

There doesn't need to be a full JS grammar. Peggy shouldn't be concerned with whether the JS is valid, just with whether the braces appear inside or outside constructs where they're allowed to be unmatched. For strings and comments, something like this should be sufficient, I think:

Code
  = $CodePart*

CodePart
  = "{" Code "}"
  / CodeStringLiteral
  / Comment
  / (!([{}'"`] / "//" / "/*" ) SourceCharacter)+

CodeStringLiteral
  = StringLiteral 
  / "`" TickStringCharacter* "`"

TickStringCharacter
  = "${" Code "}"
  / "\\" SourceCharacter
  / !"`" SourceCharacter

I haven't performance-tested this, though.
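
The same idea can be sketched as a hand-written scanner in JavaScript. This is only an illustration under assumptions: the function names are hypothetical, the logic loosely mirrors the rules above rather than reproducing them, and, like the grammar, it covers strings, template literals and comments but not regex literals.

```javascript
// Returns the index of the "}" closing the block that opens at `start`,
// or -1 if it never closes. Regex literals are NOT handled.
function findBlockEnd(src, start) {
  if (src[start] !== "{") return -1;
  let i = start + 1;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "}") return i;
    if (ch === "{") {
      const close = findBlockEnd(src, i); // nested block
      i = close < 0 ? -1 : close + 1;
    } else if (ch === '"' || ch === "'") {
      i = skipQuoted(src, i);
    } else if (ch === "`") {
      i = skipTemplate(src, i);
    } else if (src.startsWith("//", i)) {
      const newline = src.indexOf("\n", i);
      i = newline < 0 ? src.length : newline + 1;
    } else if (src.startsWith("/*", i)) {
      const close = src.indexOf("*/", i + 2);
      i = close < 0 ? -1 : close + 2;
    } else {
      i++;
    }
    if (i < 0) return -1; // unterminated string, comment or nested block
  }
  return -1;
}

// Skips a '...' or "..." literal; returns the index just past the closing
// quote, or -1 if the literal is unterminated.
function skipQuoted(src, start) {
  const quote = src[start];
  for (let i = start + 1; i < src.length; i++) {
    if (src[i] === "\\") i++; // skip the escaped character
    else if (src[i] === quote) return i + 1;
  }
  return -1;
}

// Skips a template literal, recursing into ${ ... } substitutions.
function skipTemplate(src, start) {
  let i = start + 1;
  while (i < src.length) {
    if (src[i] === "\\") {
      i += 2;
    } else if (src[i] === "`") {
      return i + 1;
    } else if (src.startsWith("${", i)) {
      const close = findBlockEnd(src, i + 1);
      if (close < 0) return -1;
      i = close + 1;
    } else {
      i++;
    }
  }
  return -1;
}

const stringsAndComments = '{ return "}" /* } */ }';
console.log(findBlockEnd(stringsAndComments, 0) === stringsAndComments.length - 1); // true

const withRegex = '{ return /}/, "}" /* } */ }';
console.log(findBlockEnd(withRegex, 0)); // 10, the brace inside the regex literal
```

The last call shows that the action from the original report would still be cut short at the regex, so regex literals would need separate handling.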

@hildjj
Contributor

hildjj commented Feb 22, 2023

Regardless, this is a dup of #238. Let's continue discussion there.

@hildjj closed this as completed Feb 22, 2023