Define what the service info will contain #39
Conclusion from today's discussion: at this point, the schema is probably the only thing the service info needs to contain. In the future, if we specify some ways to do some of the optional things, like:
The |
I think this is actually the same as issue #3. Or, maybe put another way: what is the relationship between |
Sequence collection's service info needs to comply with https://github.com/ga4gh-discovery/ga4gh-service-info/blob/master/service-info.yaml and therefore needs some GA4GH content as well as some content specific to sequence collections:

```json
{
  "id": "uk.ebi.eva.eva-seqcol",
  "name": "Implementation of the GA4GH Sequence collection retrieval API",
  "type": {
    "group": "org.ga4gh",
    "artifact": "seqcol",
    "version": "0.0.0"
  },
  "organization": {
    "name": "European Variation Archive",
    "url": "https://www.ebi.ac.uk/eva"
  },
  "contactUrl": "mailto:[email protected]",
  "documentationUrl": "https://www.ebi.ac.uk/eva/....",
  "updatedAt": "date",
  "environment": "dev",
  "version": "tagged version"
}
```

If there is nothing else but the schema to specify, the sequence-collection-specific content could look like:

```json
"seqcol": {
  "description": "A collection of sequences, representing biological sequences including nucleotide or amino acid sequences. For example, a sequence collection may represent the set of chromosomes in a reference genome, the set of transcripts in a transcriptome, a collection of reference microbial sequences of a specific gene, or any other set of sequences.",
  "type": "object",
  "properties": {
    "lengths": {
      ...
    }
  }
}
```

But we might want to allow other properties to define implementation-specific features in the future:

```json
"seqcol": {
  "schema": {
    "description": "A collection of sequences, representing biological sequences including nucleotide or amino acid sequences. For example, a sequence collection may represent the set of chromosomes in a reference genome, the set of transcripts in a transcriptome, a collection of reference microbial sequences of a specific gene, or any other set of sequences.",
    "type": "object",
    "properties": {
      "lengths": {
      }
    }
  },
  "other properties": { }
}
```

Also, do we expect implementations to declare all the properties they use in the schema, or to refer to our published schema? |
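To make the proposed layout concrete, here is a minimal Python sketch that assembles a service-info document with the GA4GH fields plus a nested `seqcol.schema` object, and checks the fields that, as we read the GA4GH service-info schema, are required (`id`, `name`, `type`, `organization`, `version`, and their nested `group`/`artifact`/`version` and `name`/`url`). The helper names `build_service_info` and `missing_service_info_fields` are hypothetical, and the required-field lists are an assumption to re-check against `service-info.yaml`.

```python
import json

# Fields we understand to be required by the GA4GH service-info schema
# (assumption: verify against service-info.yaml before relying on this).
SERVICE_REQUIRED = ("id", "name", "type", "organization", "version")
TYPE_REQUIRED = ("group", "artifact", "version")
ORGANIZATION_REQUIRED = ("name", "url")


def missing_service_info_fields(info: dict) -> list[str]:
    """Return dotted paths of required GA4GH fields that are absent."""
    missing = [key for key in SERVICE_REQUIRED if key not in info]
    missing += [f"type.{key}" for key in TYPE_REQUIRED
                if key not in info.get("type", {})]
    missing += [f"organization.{key}" for key in ORGANIZATION_REQUIRED
                if key not in info.get("organization", {})]
    return missing


def build_service_info(seqcol_schema: dict) -> dict:
    """Combine GA4GH fields with a seqcol-specific section (example values)."""
    return {
        "id": "uk.ebi.eva.eva-seqcol",
        "name": "Implementation of the GA4GH Sequence collection retrieval API",
        "type": {"group": "org.ga4gh", "artifact": "seqcol", "version": "0.0.0"},
        "organization": {
            "name": "European Variation Archive",
            "url": "https://www.ebi.ac.uk/eva",
        },
        "version": "tagged version",
        # Seqcol content lives in its own object; implementation-specific
        # properties could later sit next to "schema" (hypothetical layout).
        "seqcol": {"schema": seqcol_schema},
    }


if __name__ == "__main__":
    schema = {"type": "object", "properties": {"lengths": {"type": "array"}}}
    info = build_service_info(schema)
    print(missing_service_info_fields(info))  # prints []
    print(json.dumps(info["seqcol"], indent=2))
```

Keeping the seqcol content under a single `seqcol` key means the GA4GH-level check never needs to know what the seqcol section contains.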
How do those two things go together? Is it like this, in parallel?

```json
{
  "id": "uk.ebi.eva.eva-seqcol",
  ...
  "seqcol": { ... }
}
```

I think the seqcol section put

```json
"seqcol": {
  "schema": {
    "description": "A collection of sequences, representing biological sequences including nucleotide or amino acid sequences. For example, a sequence collection may represent the set of chromosomes in a reference genome, the set of transcripts in a transcriptome, a collection of reference microbial sequences of a specific gene, or any other set of sequences.",
    "type": "object",
    "properties": {
      "lengths": {
      }
    }
  },
  "sorted_name_length_pairs": true
}
```
|
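With that parallel layout, a client can read the seqcol-specific part without touching the GA4GH fields. A minimal Python sketch, assuming the `seqcol` object holds a `schema` plus boolean feature flags such as `sorted_name_length_pairs`. Treating every boolean beside `schema` as a feature flag is our assumption, and `seqcol_capabilities` is a hypothetical helper; the spec would need to name such flags explicitly.

```python
def seqcol_capabilities(service_info: dict) -> dict:
    """Summarize the seqcol section of a service-info document.

    Assumes GA4GH fields at the top level and a parallel "seqcol" object
    holding "schema" plus optional feature flags (hypothetical layout).
    """
    section = service_info.get("seqcol", {})
    schema = section.get("schema", {})
    return {
        # Attribute names declared by the server's schema.
        "attributes": sorted(schema.get("properties", {})),
        # Boolean entries next to "schema" are read as optional-feature flags.
        "features": {k: v for k, v in section.items() if isinstance(v, bool)},
    }


info = {
    "id": "uk.ebi.eva.eva-seqcol",
    "seqcol": {
        "schema": {"type": "object", "properties": {"lengths": {}, "names": {}}},
        "sorted_name_length_pairs": True,
    },
}
print(seqcol_capabilities(info))
```

A server that omits the `seqcol` section simply reports no attributes and no features, so clients can fall back gracefully.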
Yes, that's right. See how it is done for refget in the CRAM registry or the reference implementation. |
After the last call, I realized @nsheff was right in that I misremembered a bit how See e.g.: https://json-schema.org/understanding-json-schema/structuring.html#ref However, as was also mentioned, you can make use of So you could do, for instance (in YAML):

```yaml
$schema: https://json-schema.org/draft/2020-12/schema
allOf:
  - $ref: https://raw.github.com/ga4gh/seqcol-spec/.../core_input_schema.json
  - $ref: https://raw.github.com/ga4gh/seqcol-spec/.../topology_schema.json
  - type: object
    properties:
      custom_array:
        type: array
        collated: true
        description: Custom array
        items:
          type: string
required:
  - sequences
  - topologies
inherent:
  - custom_array
```

Where the

```yaml
$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  topologies:
    type: array
    collated: true
    description: Topology of the sequence
    items:
      type: string
      enum:
        - linear
        - circular
inherent:
  - topologies
```

This adds the pre-defined and standard
While very useful, this pattern leads to a problem regarding how we specify the |
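One way to see the problem: once the `$ref`s are resolved, the `properties`, `required` and `inherent` entries are spread across several `allOf` branches, so anything that wants "the list of inherent attributes" has to walk and merge them. A minimal Python sketch, assuming the referenced schemas have already been inlined and that `properties` is a mapping, as JSON Schema requires. `merged_seqcol_view` is a hypothetical helper, and `collated`/`inherent` are the custom keywords from the example above, not standard JSON Schema.

```python
def merged_seqcol_view(schema: dict) -> dict:
    """Collect properties and the required/inherent lists from a schema and
    all of its (already resolved) allOf branches, recursively."""
    properties: dict = {}
    required: list = []
    inherent: list = []

    def add_unique(target: list, names: list) -> None:
        # Preserve first-seen order while skipping duplicates.
        for name in names:
            if name not in target:
                target.append(name)

    def visit(node: dict) -> None:
        # Simplification: a later branch redefining a property overwrites it.
        properties.update(node.get("properties", {}))
        add_unique(required, node.get("required", []))
        add_unique(inherent, node.get("inherent", []))
        for branch in node.get("allOf", []):
            visit(branch)

    visit(schema)
    collated = sorted(name for name, prop in properties.items() if prop.get("collated"))
    return {
        "properties": sorted(properties),
        "required": required,
        "inherent": inherent,
        "collated": collated,
    }
```

Note that this merge is only an approximation of `allOf` semantics: JSON Schema validates each branch independently, while the custom `inherent` keyword has no defined combination rule, which is exactly the gap the comment points at.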
So this is finally an attempt to write down some thoughts I have regarding a restructuring of the schema. First, some of the problems I see with the current suggestion:
|
So my suggestion is quite simple.
|
Here's an example ref resolver we wrote in my group: |
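For readers without the linked resolver, here is an independent minimal sketch (not the code referenced above) of the basic idea: recursively replace `{"$ref": uri}` objects with the referenced schema document. It assumes a local `store` dict mapping URIs to already-loaded documents instead of fetching over HTTP, handles whole-document refs only (no `#/...` fragments), and `resolve_refs` is a hypothetical name.

```python
def resolve_refs(node, store: dict, _seen: tuple = ()):
    """Return a copy of `node` with every {"$ref": uri} replaced by store[uri].

    Simplification: in draft 2020-12, $ref is applied alongside sibling
    keywords rather than replacing the object, so this sketch only accepts
    objects whose sole key is "$ref". Circular references raise ValueError.
    """
    if isinstance(node, list):
        return [resolve_refs(item, store, _seen) for item in node]
    if not isinstance(node, dict):
        return node  # strings, numbers, booleans, None
    if "$ref" in node:
        if len(node) != 1:
            raise ValueError("sketch only supports objects whose sole key is $ref")
        uri = node["$ref"]
        if uri in _seen:
            raise ValueError(f"circular $ref: {uri}")
        if uri not in store:
            raise KeyError(f"unknown $ref: {uri}")
        return resolve_refs(store[uri], store, _seen + (uri,))
    return {key: resolve_refs(value, store, _seen) for key, value in node.items()}
```

Tracking the chain of URIs being expanded (`_seen`) lets the same schema be referenced several times from different places while still catching true cycles.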
Just a very brief recap of the essence of the discussion today. I think there are at least three use cases of service info (
Basically, I argued for splitting 1. from 2. and 3. by using a plain JSON Schema for validation, and a custom JSON structure for 2. and 3. (with an extra JSON Schema for validating that a server complies with our custom JSON structure). The counterargument is that our custom JSON structure would then end up looking very much like the JSON Schema anyway, especially when allowing for future hierarchical extensions, such as for the pangenome. In practice, mostly the same information would be duplicated in two different structures. This is a good counterargument that I don't really have a good response to.
Say, then, that we do want to use the same JSON Schema structure for all of 1., 2., and 3. Then there is the issue of whether we should allow
A related issue that was discussed was how to manage the relationship between a JSON Schema provided with the specification (defining the standard attributes, which of these are required, and so on) and a JSON Schema that describes the attributes actually supported by a particular server. For validation (use case 1), one way to manage this is through the use of a However, if I think that is the gist of it!... |
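The relationship between the specification's schema and a server's schema can be made concrete with a small comparison. A minimal Python sketch, assuming both schemas have already been resolved and flattened into a single `properties` mapping; `compare_with_spec` is a hypothetical helper, and the attribute names in the usage example are illustrative, not the spec's actual required set.

```python
def compare_with_spec(spec_schema: dict, server_schema: dict) -> dict:
    """Compare a server's declared attributes against the spec's schema."""
    spec_props = set(spec_schema.get("properties", {}))
    server_props = set(server_schema.get("properties", {}))
    return {
        # Attributes the spec requires but the server does not declare.
        "missing_required": sorted(set(spec_schema.get("required", [])) - server_props),
        # Standard attributes the server supports.
        "standard_supported": sorted(spec_props & server_props),
        # Server-specific extensions not defined by the spec.
        "custom": sorted(server_props - spec_props),
    }


spec = {
    "properties": {"names": {}, "lengths": {}, "sequences": {}, "topologies": {}},
    "required": ["names", "lengths", "sequences"],
}
server = {"properties": {"names": {}, "lengths": {}, "sequences": {}, "custom_array": {}}}
print(compare_with_spec(spec, server))
```

The same structure serves validation (an empty `missing_required`) and discovery (the `standard_supported` and `custom` lists), which is the overlap between use cases 1 to 3 discussed above.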
Regarding extending JSON Schema: https://json-schema.org/draft/2020-12/json-schema-core#section-6.5
I interpret this to mean that what we're doing to add |
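As we read that section, implementations should treat keywords they do not support as annotations whose value is the keyword's value, so custom keywords like `collated` and `inherent` should pass through a standard validator untouched. A minimal Python sketch of what such a validator would see: it walks a schema and reports every keyword outside a small, deliberately incomplete subset of 2020-12 keywords. `KNOWN_KEYWORDS` and `custom_keyword_annotations` are hypothetical, and the paths are JSON-pointer-style without escaping.

```python
# A small, deliberately incomplete subset of JSON Schema 2020-12 keywords
# (assumption: a real validator knows many more).
KNOWN_KEYWORDS = {"$schema", "$id", "$ref", "type", "properties", "items",
                  "required", "enum", "allOf", "description"}


def custom_keyword_annotations(schema: dict, path: str = "#") -> dict:
    """Map schema locations to the keywords a plain validator would not know."""
    found = {}
    extras = {k: v for k, v in schema.items() if k not in KNOWN_KEYWORDS}
    if extras:
        found[path] = extras
    # Recurse into the subschema-bearing keywords we do know about.
    for name, sub in schema.get("properties", {}).items():
        found.update(custom_keyword_annotations(sub, f"{path}/properties/{name}"))
    if isinstance(schema.get("items"), dict):
        found.update(custom_keyword_annotations(schema["items"], f"{path}/items"))
    for index, sub in enumerate(schema.get("allOf", [])):
        found.update(custom_keyword_annotations(sub, f"{path}/allOf/{index}"))
    return found
```

For the topology schema above, this reports `inherent` at the root and `collated` on `topologies`, which is the information a seqcol-aware client would read as annotations.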
Getting back to the original point here: @tcezard, would you be willing to revisit your original concern and confirm that everything we need is defined as such in the spec? For example, do we have everything covered from this comment of yours: #39 (comment) |
The service-info for sequence collections will need to inherit from this specification, but additional fields can be added.
This issue is to discuss which fields should be declared in the seqcol service info.
For example, we could add the seqCol schema to the service-info.