-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPTIMADE JSON Lines specification appendix #531
base: develop
Are you sure you want to change the base?
Conversation
Fix code block formatting Update optimade.rst Apply suggestions from code review Apply suggestions from code review Try to fix formatting again Fix formatting and add note about provider fields
6929f10
to
c5f1f53
Compare
96b0231
to
78e158c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ml-evs, and sorry for the delays. Now had some time to take a look. I like the proposal and it can be used for optimade-maker as well. But I wrote some comments for consideration.
|
||
The `JSON Lines <https://jsonlines.org/>`__ format enforces the following rules: | ||
|
||
- each line is a valid JSON object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- each line is a valid JSON object, | |
- each line is a valid JSON value, |
Is this more precise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure, to me value implies something stored under a key, whereas object is the smallest chunk of data that can be parsed entirely as JSON?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on https://jsonlines.org/ they mention "values" (as even e.g. just strings or integers are supported). But I agree that in the vast majority of cases, people will use objects.
|
||
- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule. | ||
- The next block of lines provides the ``info`` endpoint responses. | ||
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MUST
This breaks current optimade-maker jsonl files (well, just the li-ion-conductors one). Would a MAY, in principle, work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite important IMO, we don't provide any other way of identifying the file (and in fact we might want to even expand this with a version of the JSONL specification like {'x-optimade': {'api_version': 1.1, 'jsonl_version': 1.2}
if we want the JSONL file to be able to store OPTIMADE data of any version. Would you be against simply adding it to the files where it is missing atm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not sure if my comment was precise enough. I was commenting on the base info endpoint, which doesn't contain any crucial info as far as i understand, but I see how it can be useful to know what follows in the next lines.
- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule. | ||
- The next block of lines provides the ``info`` endpoint responses. | ||
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_. | ||
- The next lines MUST contain the entry info endpoint responses for the all entry types present later in the file, as described in `Entry Listing Info Endpoints`_. These MUST appear in alphabetical order by entry type name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These MUST appear in alphabetical order by entry type name.
Fine for me, but is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to relax this constraint, probably we'll never need to worry about an OPTIMADE JSONL file containing 100s of entry types (in which case you could use alphabetical order to short-circuit the search for a given entry type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i see, but anyway, fine for me. good to keep consistent if we require the entry blocks to be sorted, where this is more valuable, i think.
{"type": "structures", "id": "1", "attributes": {...}} | ||
{"type": "references", "id": "2", "attributes": {...}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
references
block should be ahead of structures
, as required above. But again, is the alphabetical ordering needed?
Co-authored-by: Kristjan Eimre <[email protected]>
|
||
- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule. | ||
- The next block of lines provides the ``info`` endpoint responses. | ||
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not sure if my comment was precise enough. I was commenting on the base info endpoint, which doesn't contain any crucial info as far as i understand, but I see how it can be useful to know what follows in the next lines.
- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule. | ||
- The next block of lines provides the ``info`` endpoint responses. | ||
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_. | ||
- The next lines MUST contain the entry info endpoint responses for the all entry types present later in the file, as described in `Entry Listing Info Endpoints`_. These MUST appear in alphabetical order by entry type name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i see, but anyway, fine for me. good to keep consistent if we require the entry blocks to be sorted, where this is more valuable, i think.
{"type": "structures", "id": "1", "attributes": {...}} | ||
{"type": "references", "id": "2", "attributes": {...}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"type": "structures", "id": "1", "attributes": {...}} | |
{"type": "references", "id": "2", "attributes": {...}} | |
{"type": "references", "id": "1", "attributes": {...}} | |
{"type": "structures", "id": "2", "attributes": {...}} |
switched em
This PR begins the work on #471, by defining the conventions on top of JSON Lines for deserializing an entire OPTIMADE API into a file.