Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create the Json Schema lexer using pygments #1

Merged
merged 13 commits into from
Feb 28, 2024

Conversation

sudo-jarvis
Copy link
Contributor

This pull request implements the JSON Schema lexer functionality discussed in bowtie-json-schema/bowtie#863.

@sudo-jarvis
Copy link
Contributor Author

Hi @Julian , please have a look.

@Julian
Copy link
Member

Julian commented Feb 27, 2024

I'll take a closer look at this in a bit but I'm surprised you went with the regex lexer rather than building on top of (I assume an existing) JSON one.

That probably makes it quite a bit harder to know when something isn't a keyword at all, which is why in your screenshot you had keywords being highlighted even though they were inside string descriptions.

Did you not find anything in pygments for building on the existing JSON stuff?

@sudo-jarvis
Copy link
Contributor Author

I was not able to find anything on directly building on top of some preexisting lexer. So I used the code from the JSON lexer and added the extra stuff to it myself. Anyways the JSON lexer was a pretty simple one and didnt had any concept of keyword, it just highlighted the key and value in the key value pair separately.

Ig I can fix the keywords being highlighted in the middle of the string, its probably due to double quotes inside double quotes. Removing the double quotes fixes it. So I think need to make a minor change in the regex for handling this edge case

@sudo-jarvis
Copy link
Contributor Author

Also, currently the list of keywords being used are static and are from the latest dialect. I am not sure how to support multiple dialects

@sudo-jarvis
Copy link
Contributor Author

This was the JSON lexer that I used for reference: https://github.com/orb/pygments-json

@Julian
Copy link
Member

Julian commented Feb 27, 2024

That project has no license! We definitely cannot use it or build on top of it without one!

@Julian
Copy link
Member

Julian commented Feb 27, 2024

Now that I'm back at a computer, yeah, what's wrong with the Pygments JSON lexer that made you look for something else? I would assume what we'd use should be quite similar to e.g. the JSON-LD lexer which builds on top of the JSON one.

@sudo-jarvis
Copy link
Contributor Author

That project has no license! We definitely cannot use it or build on top of it without one!

I am sorry, I wasnt aware of this

@sudo-jarvis
Copy link
Contributor Author

sudo-jarvis commented Feb 28, 2024

Now that I'm back at a computer, yeah, what's wrong with the Pygments JSON lexer that made you look for something else? I would assume what we'd use should be quite similar to e.g. the JSON-LD lexer which builds on top of the JSON one.

When I searched for the pygments JSON lexer, actually this result popped up at the top and I misunderstood it to be similar to the actual lexer that pygments uses and therefore I used that as a reference.

Should have looked in the pygments code itself, my bad.
I am sorry for that. Would ensure this doesnt happen again.

@sudo-jarvis
Copy link
Contributor Author

@Julian , you were right, working on top of the existing JSON lexer is much easier, just need to perform the list of keywords and stuff to it and it handles all the hardlifting.

Thanks a lot and sorry again for not picking this very approach in the beginning itself

@sudo-jarvis
Copy link
Contributor Author

With this too, the rest of the cases work fine however it misbehaves when encountering \" within a string.

Tried it with the metaschema for 2020-12 dialect:

1. JSON Schema lexer on top of JSON lexer:
Screenshot 2024-02-28 at 10 33 32 AM

2. Simple JSON lexer:
Screenshot 2024-02-28 at 10 33 44 AM

@Julian
Copy link
Member

Julian commented Feb 28, 2024

I am sorry, I wasnt aware of this

All fine, but definitely a good lesson to learn -- it's nearly the first thing you should do when you find a project. Otherwise you even can make your life complicated if you look at the code even if you don't directly use it as then someone could claim you were remembering it and regurgitating it when you wrote your own!

@Julian , you were right, working on top of the existing JSON lexer is much easier, just need to perform the list of keywords and stuff to it and it handles all the hardlifting.

Cool! OK glad to hear.

With this too, the rest of the cases work fine however it misbehaves when encountering " within a string.

Aha, OK -- that's not the worst I guess -- but regardless it's a pygments bug! You could perhaps check whether it's a known one (and even better you could try to send a PR to them, but just finding or filing an issue is certainly enough as far as I'm concerned), but no matter what I think we should move forward in this way, because whenever that bug is fixed it will be fixed for us too.

Anyways -- I think this looks really good! I'll merge it immediately so we can all play with it, and I have 50 ideas for more things we could do, (and I see @jdesrosiers left us some notes in #2, yay!) -- but can you also please have a look at adding a test for this? You probably will need to look at either the pygments docs or the pygments codebase to see how that's done.

But overall, very good!

@Julian Julian merged commit 6d7330b into python-jsonschema:main Feb 28, 2024
28 checks passed
Julian pushed a commit that referenced this pull request Mar 3, 2024
Merge main into create-lexer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants