Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammar contains several invalid regex with lookbehind assertions that are not a fixed length #415

Closed
lildude opened this issue Sep 2, 2022 · 8 comments
Assignees
Labels

Comments

@lildude
Copy link

lildude commented Sep 2, 2022

Godot version

N/A

VS Code version

N/A

Godot Tools VS Code extension version

master

System information

N/A

Issue description

👋 I'm the lead maintainer of the https://github.com/github/linguist library which is used for language detection and providing the syntax highlighting for languages on GitHub.com, and we use this grammar.

Our grammar compiler has found several problems with the grammars which I thought I'd let you know about:

  • Invalid regex in grammar: source.gdscript (in syntaxes/GDScript.tmLanguage.json) contains a malformed regex (regex "\b(?<=for\s[\w]*\s)(in)\b": lookbehind assertion is not fixed length (at offset 18))
  • Invalid regex in grammar: source.gdshader (in syntaxes/GDShader.tmLanguage.json) contains a malformed regex (regex "(?<=[.]\s*)(?:[xyzw]{2,4}|[rgba]...": lookbehind assertion is not fixed length (at offset 10))
  • Invalid regex in grammar: source.gdshader (in syntaxes/GDShader.tmLanguage.json) contains a malformed regex (regex "(?<=[.]\s*)[a-zA-Z_]\w": lookbehind assertion is not fixed length (at offset 10))

Steps to reproduce

View each of the regexes in turn in regex101 and note the errors:

@lildude lildude added the bug label Sep 2, 2022
@Calinou
Copy link
Member

Calinou commented Sep 2, 2022

cc @DaelonSuzuka

PS: I didn't know this repository was used as a basis for Linguist. This is good to know, as we'll need to mention this in the contributing guidelines since it has a few implications.

@lildude
Copy link
Author

lildude commented Sep 2, 2022

We've been using it for GDScript since 2020 thanks to github-linguist/linguist#5000 😁

@DaelonSuzuka
Copy link
Collaborator

Thanks for reporting this. I'll try to look into these issues this weekend.

@lildude Is there documentation somewhere for how I can run your grammar compiler myself, so I can try and catch issues sooner?

@lildude
Copy link
Author

lildude commented Sep 5, 2022

@lildude Is there documentation somewhere for how I can run your grammar compiler myself, so I can try and catch issues sooner?

We don't have any docs other than the CONTRIBUTING.md for adding/replacing grammars and the steps I take when putting together a release.

The compiler doesn't have functionality to check just a specific grammar, but you can effectively do this by attempting to add it. This will attempt to compile it and report any problems. To do this, clone the repo and run script/bootstrap to download and all the grammars. You can then check an individual grammar using: script/grammar-compiler add vendor/grammars/godot-vscode-plugin/:

$ script/grammar-compiler add vendor/grammars/godot-vscode-plugin/
latest: Pulling from linguist/grammar-compiler
Digest: sha256:203514bbf04b83d3594245338fc08097301e54139ccec20ab05861302c23ab64
Status: Image is up to date for linguist/grammar-compiler:latest
docker.io/linguist/grammar-compiler:latest
3 errors found in new grammar 'repository `vendor/grammars/godot-vscode-plugin/` (from https://github.com/godotengine/godot-vscode-plugin)':
- Invalid regex in grammar: `source.gdscript` (in `syntaxes/GDScript.tmLanguage.json`) contains a malformed regex (regex "`\b(?<=for\s[\w]*\s)(in)\b`": lookbehind assertion is not fixed length (at offset 18))
- Invalid regex in grammar: `source.gdshader` (in `syntaxes/GDShader.tmLanguage.json`) contains a malformed regex (regex "`(?<=[.]\s*)[a-zA-Z_]\w`": lookbehind assertion is not fixed length (at offset 10))
- Invalid regex in grammar: `source.gdshader` (in `syntaxes/GDShader.tmLanguage.json`) contains a malformed regex (regex "`(?<=[.]\s*)(?:[xyzw]{2,4}|[rgba]`...": lookbehind assertion is not fixed length (at offset 10))

Compilation failed. Aborting

$

The compiler is really simple and can be found in the github/linguist repo here. It basically checks the grammar is using valid PCRE regexes (GitHub uses PCRE instead of oniguruma for performance reasons) and then "compiles" them into JSON or protobuf files.

@DaelonSuzuka
Copy link
Collaborator

DaelonSuzuka commented Sep 5, 2022

That's fantastic, thanks for the write-up. When I get these fixes merged, I'll comment in github-linguist/linguist#3924 to let you know.

@DaelonSuzuka
Copy link
Collaborator

To do this, clone the repo and run script/bootstrap to download and all the grammars. You can then check an individual grammar using: script/grammar-compiler add vendor/grammars/godot-vscode-plugin/:

It might be worth mentioning to install all of linguist's dependencies, including bundler, before trying to run the bootstrap script. I'm dumb enough that it took me about ten minutes to figure that out.

@DaelonSuzuka
Copy link
Collaborator

DaelonSuzuka commented Dec 21, 2022

This issue is resolved by PR #416, but I don't believe everything has been propagated through Linguist to Github itself, yet.

@lildude should I create an issue over at linguist to try and track this?

@lildude
Copy link
Author

lildude commented Dec 21, 2022

No. Please don't. It'll happen when it happens as per the docs here and here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants