Edge cases and Gotchas #2
@jdesrosiers, the current implementation in #1 supports only the latest dialect, not multiple or previous dialects. Any idea how to dynamically fetch the keywords for each dialect?
There isn't a convenient list anywhere you can just fetch. You'll need to build the lists yourself from the spec, the meta-schemas, or whatever other source you can find.
My plan is for a simple list of keywords to eventually live in the jsonschema-specifications project, which essentially represents "give me the JSON Schema specifications in Python at runtime". But that plan also includes writing type annotations for them, so it's a medium-term goal. For now, simply copying or writing the lists down is the right thing to do.
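As a starting point for writing the lists down, here's a minimal sketch of hand-written per-dialect keyword tables. The constant names, the `is_keyword` helper, and the deliberately partial keyword sets are illustrative assumptions, not the jsonschema-specifications API; a real table would list every keyword from each dialect's spec or meta-schemas.

```python
from typing import Optional

DRAFT_04 = "http://json-schema.org/draft-04/schema#"
DRAFT_07 = "http://json-schema.org/draft-07/schema#"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"

# Hypothetical, hand-written and deliberately incomplete: a real table would
# list every keyword each dialect defines.
KEYWORDS: dict[str, frozenset[str]] = {
    DRAFT_04: frozenset({
        "$schema", "id", "$ref", "definitions", "type", "properties",
        "items", "additionalItems",
    }),
    DRAFT_07: frozenset({
        "$schema", "$id", "$ref", "$comment", "definitions", "type",
        "properties", "items", "additionalItems", "if", "then", "else",
    }),
    DRAFT_2020_12: frozenset({
        "$schema", "$id", "$ref", "$anchor", "$dynamicRef", "$dynamicAnchor",
        "$comment", "$defs", "type", "properties", "prefixItems", "items",
    }),
}


def is_keyword(dialect: str, name: str) -> Optional[bool]:
    """True/False for a known dialect; None when the dialect is unknown."""
    keywords = KEYWORDS.get(dialect)
    if keywords is None:
        return None
    return name in keywords


print(is_keyword(DRAFT_2020_12, "additionalItems"))  # False: removed in 2020-12
```

Returning `None` rather than `False` for an unknown dialect keeps the ambiguous case discussed further down distinguishable from "definitely not a keyword". Dialect URIs are compared verbatim here, so variants such as the draft-07 URI without its trailing `#` would need normalizing first.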
(Oh, and this is definitely awesome! Thanks again, Jason, for sharing your learnings!)
@Julian To add support for multiple schemas, once the lexer gives us a list of tokens, we could iterate over them from left to right and maintain a stack, which lets us find, for each keyword, its nearest We'll fill the stack with each token, and once we encounter a Then, once we know it, we'll check the dict of that particular schema to decide whether the token should be treated as a keyword.
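A minimal sketch of the stack idea above, assuming a hypothetical simplified token stream of `(kind, text)` pairs rather than Pygments' real token types (mapping actual lexer output onto these kinds is left out). Because `$schema` can appear after nested objects within the same object, it records each object's parent and resolves the nearest `$schema` only after the whole stream has been read.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical simplified token stream: (kind, text) pairs where kind is one of
# "punct", "key", "string" or "literal", and string text carries no quotes.
Token = tuple[str, str]


@dataclass
class Frame:
    parent: Optional[int]         # index of the enclosing object's frame
    schema: Optional[str] = None  # this object's own "$schema" value, if any


def nearest_schema_per_key(tokens: list[Token]) -> dict[int, Optional[str]]:
    """Map each key token's index to its nearest enclosing "$schema" value."""
    frames: list[Frame] = []
    stack: list[int] = []                 # open objects, innermost last
    key_frames: dict[int, int] = {}       # key token index -> frame index
    pending_schema: Optional[int] = None  # frame whose "$schema" value is next

    for i, (kind, text) in enumerate(tokens):
        if kind == "punct" and text == ":":
            continue  # separator between a key and its value
        if pending_schema is not None:
            if kind == "string":  # only a string value counts as a dialect URI
                frames[pending_schema].schema = text
            pending_schema = None
        if kind == "punct" and text == "{":
            frames.append(Frame(parent=stack[-1] if stack else None))
            stack.append(len(frames) - 1)
        elif kind == "punct" and text == "}":
            stack.pop()
        elif kind == "key":
            key_frames[i] = stack[-1]
            if text == "$schema":
                pending_schema = stack[-1]

    def resolve(frame: Optional[int]) -> Optional[str]:
        # Walk outwards until some enclosing object declares "$schema".
        while frame is not None:
            if frames[frame].schema is not None:
                return frames[frame].schema
            frame = frames[frame].parent
        return None

    # Resolve only at the end: no ancestor's "$schema" is final until then.
    return {i: resolve(f) for i, f in key_frames.items()}


tokens = [
    ("punct", "{"), ("key", "$defs"), ("punct", ":"),
    ("punct", "{"), ("key", "foo"), ("punct", ":"),
    ("punct", "{"), ("key", "items"), ("punct", ":"), ("literal", "false"),
    ("punct", "}"), ("punct", "}"), ("punct", ","),
    ("key", "$schema"), ("punct", ":"),
    ("string", "https://json-schema.org/draft/2020-12/schema"),
    ("punct", "}"),
]
print(nearest_schema_per_key(tokens))  # keys 1, 4, 7, 13 all map to the 2020-12 URI
```

This only finds the nearest `$schema`. As the following comments point out, an embedded `$schema` takes effect only alongside an identifier, and properties are keywords only inside schemas, so this is a first approximation.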
Doesn't Pygments's JSON lexer already handle the recursion? It presumably must, since it notices when an object literal is being encountered, so the stack you're talking about must already be there. "All" we should have to do is intercept that object literal parsing once it's done, look at the
@Julian, yes, you are right: we can do that when the whole document has the same schema. However, I was talking about the case where, say, in the outer object we have As mentioned by @jdesrosiers here:
Yes, I know that bit, of course, but I forgot that Pygments doesn't do any AST parsing; it just produces a flat list of tokens, so it doesn't tell us where objects start and end... OK, that's unfortunate, but what you say sounds fine then. And you can get the list of keywords for each dialect by adding a dependency on
It's a little more complicated than that.

```jsonc
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "http://json-schema.org/draft-07/schema#", // <- no $id, so this keyword has no effect
      "prefixItems": [true],
      "additionalItems": false, // <- not a keyword
      "items": false,
      "definitions": {} // <- not a keyword
    }
  }
}
```

Keep in mind that you can't just look for

```jsonc
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "http://json-schema.org/draft-04/schema#", // <- no id, so this keyword has no effect
      "$id": "https://example.com/schema/embedded", // <- $id doesn't apply for draft-04
      "prefixItems": [true],
      "additionalItems": false, // <- not a keyword
      "items": false,
      "definitions": {} // <- not a keyword
    }
  }
}
```

Unfortunately, there's an ambiguous situation that you're going to have to figure out how to deal with. Imagine that

```jsonc
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "https://example.com/unknown-dialect",
      "$id": "https://example.com/schema/embedded", // <- is this an identifier or not?
      "prefixItems": [true], // <- is this a keyword or not?
      "additionalItems": false, // <- is this a keyword or not?
      "items": false, // <- is this a keyword or not?
      "definitions": {} // <- is this a keyword or not?
    }
  }
}
```

If you don't understand the dialect, you don't know what keyword is used for identifying a schema resource. Therefore, it's ambiguous whether

```jsonc
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "https://example.com/unknown-dialect",
      "$id": "https://example.com/schema/embedded", // <- not a keyword
      "prefixItems": [true], // <- not a keyword
      "additionalItems": false, // <- not a keyword
      "items": false, // <- not a keyword
      "definitions": {} // <- not a keyword
    }
  }
}
```
@jdesrosiers, so basically we first need to look at the dialect, and then that dialect would specify if
Correct, but don't forget to also handle the case where you don't know the dialect that's specified (the ambiguous situation described in my last comment).
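The rule from these comments can be sketched as a small function that decides which dialect governs an embedded (non-root) schema. It assumes, as in the examples above, that the identifier keyword is looked up in the *declared* dialect (`id` in draft-04, `$id` from draft-06 on). The `ID_KEYWORD` table, the `embedded_dialect` function, and the `UNKNOWN` sentinel are hypothetical names; returning `UNKNOWN` leaves the ambiguous case for the caller to decide.

```python
DRAFT_04 = "http://json-schema.org/draft-04/schema#"
DRAFT_07 = "http://json-schema.org/draft-07/schema#"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"

# Hypothetical table: the property that identifies a schema resource in each
# known dialect.
ID_KEYWORD = {DRAFT_04: "id", DRAFT_07: "$id", DRAFT_2020_12: "$id"}

UNKNOWN = "unknown-dialect"  # hypothetical sentinel for the ambiguous case


def embedded_dialect(subschema: dict, parent_dialect: str) -> str:
    """Decide which dialect governs an embedded (non-root) schema object."""
    declared = subschema.get("$schema")
    if declared is None:
        return parent_dialect  # nothing declared: inherit the parent's dialect
    id_keyword = ID_KEYWORD.get(declared)
    if id_keyword is None:
        # We don't know how this dialect identifies resources, so we can't
        # tell whether its "$schema" applies.
        return UNKNOWN
    if id_keyword in subschema:
        return declared  # it's an embedded resource, so "$schema" applies
    return parent_dialect  # no identifier: "$schema" has no effect


# draft-04 uses "id", so a "$id" doesn't make this a resource:
print(embedded_dialect({"$schema": DRAFT_04, "$id": "https://example.com/schema/embedded"}, DRAFT_2020_12))
```

What to do with `UNKNOWN` is a policy choice; jdesrosiers's last example treats every property in such a schema as not a keyword.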
Here are a few things to look out for when implementing something like this.
The set of properties that are considered keywords depends on the dialect
In the following example, `additionalItems` should not be highlighted as a keyword because it was removed in 2020-12. When we change the dialect, the set of properties that are considered keywords changes.
Properties are only keywords inside schemas
Not every object in a JSON Schema document is a schema, so you need to know when you're in a schema and when you're not. Here are a couple of examples.
In the next example, `$id` isn't considered a keyword because `definitions` isn't a keyword in 2020-12. Therefore, the values of `definitions` aren't schemas, and the properties of those values shouldn't be considered keywords.
Embedded schemas can have a different dialect
It's possible for embedded schemas to have a different dialect than their parent schema. In the following example, the same keywords are highlighted differently depending on which schema resource the keyword appears in.
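Putting the three gotchas together, here's a minimal sketch that walks a parsed schema (e.g. from `json.loads`, so comments must be stripped first) and returns JSON Pointers to every property that should be highlighted as a keyword. The `DIALECTS` table and all function names are hypothetical, and the table is deliberately partial (only draft-07 and 2020-12, and only a handful of keywords). It records which keywords hold schemas, so the walk only descends into values that really are schemas. Following jdesrosiers's last example in the comments, an embedded schema in an unknown dialect is treated as having no keywords at all; that's one possible policy, not the only one. Mapping the pointers back to lexer tokens is left out.

```python
import json
from typing import Any, Optional

DRAFT_07 = "http://json-schema.org/draft-07/schema#"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"

# Hypothetical, partial tables. Each keyword maps to the shape of its value:
# "one" = a schema, "map" = object of schemas, "list" = array of schemas,
# "one_or_list" = either (draft-07 "items"), None = not a schema.
DIALECTS = {
    DRAFT_07: {"id": "$id", "keywords": {
        "$schema": None, "$id": None, "$ref": None, "type": None,
        "definitions": "map", "properties": "map", "not": "one",
        "items": "one_or_list", "additionalItems": "one",
        "allOf": "list", "anyOf": "list",
    }},
    DRAFT_2020_12: {"id": "$id", "keywords": {
        "$schema": None, "$id": None, "$ref": None, "type": None,
        "$defs": "map", "properties": "map", "not": "one",
        "prefixItems": "list", "items": "one",
        "allOf": "list", "anyOf": "list",
    }},
}


def _escape(token: str) -> str:
    # JSON Pointer escaping (RFC 6901): "~" -> "~0", "/" -> "~1".
    return token.replace("~", "~0").replace("/", "~1")


def _effective_dialect(schema: dict, parent: Optional[str], is_root: bool) -> Optional[str]:
    declared = schema.get("$schema")
    if not isinstance(declared, str):
        return parent  # no "$schema" here: inherit the parent's dialect
    if is_root:
        return declared  # at the document root, "$schema" always applies
    id_keyword = DIALECTS.get(declared, {}).get("id")
    if id_keyword is None:
        return None  # unknown dialect: ambiguous, so treat nothing as a keyword
    # An embedded "$schema" only applies if the schema is a resource.
    return declared if id_keyword in schema else parent


def keyword_pointers(schema: Any, dialect: Optional[str] = None,
                     pointer: str = "", is_root: bool = True) -> list[str]:
    """Return JSON Pointers to every property that is a keyword."""
    if not isinstance(schema, dict):
        return []  # e.g. boolean schemas have no properties
    dialect = _effective_dialect(schema, dialect, is_root)
    if dialect not in DIALECTS:
        return []
    table = DIALECTS[dialect]["keywords"]
    found: list[str] = []
    for name, value in schema.items():
        if name not in table:
            continue  # not a keyword, so its value isn't a schema either
        here = f"{pointer}/{_escape(name)}"
        found.append(here)
        shape = table[name]
        if shape == "one_or_list":
            shape = "list" if isinstance(value, list) else "one"
        if shape == "one":
            found += keyword_pointers(value, dialect, here, False)
        elif shape == "map" and isinstance(value, dict):
            for key, sub in value.items():
                found += keyword_pointers(sub, dialect, f"{here}/{_escape(key)}", False)
        elif shape == "list" and isinstance(value, list):
            for index, sub in enumerate(value):
                found += keyword_pointers(sub, dialect, f"{here}/{index}", False)
    return found


doc = json.loads("""{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "additionalItems": false,
  "$defs": {"foo": {"$schema": "http://json-schema.org/draft-07/schema#",
                    "additionalItems": false}}
}""")
print(keyword_pointers(doc))  # ['/$schema', '/$defs', '/$defs/foo/$schema']
```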