diff --git a/.github/workflows/cqa.yaml b/.github/workflows/cqa.yaml new file mode 100644 index 00000000..9c6902cc --- /dev/null +++ b/.github/workflows/cqa.yaml @@ -0,0 +1,24 @@ +name: checks +on: [push, pull_request] +jobs: + precommit_hooks: + runs-on: ubuntu-latest + strategy: + matrix: + cmd: + - "check-added-large-files" + - "trailing-whitespace" + - "end-of-file-fixer" + - "mixed-line-ending" + - "update-json-def-files" + steps: + - uses: actions/checkout@v4 + + - name: Set up Python 3.12 + uses: actions/setup-python@v5 + with: + python-version: 3.12 + + - uses: pre-commit/action@v3.0.1 + with: + extra_args: ${{ matrix.cmd }} --all-files diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index adc166be..bc77fdec 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -6,6 +6,8 @@ repos: - id: detect-private-key - id: trailing-whitespace - id: end-of-file-fixer + - id: mixed-line-ending + args: [ --fix=lf ] - repo: local hooks: - id: update-json-def-files diff --git a/.readthedocs.yaml b/.readthedocs.yaml index ec005df2..2aca6942 100644 --- a/.readthedocs.yaml +++ b/.readthedocs.yaml @@ -13,4 +13,4 @@ sphinx: python: install: - - requirements: docs/source/requirements.txt \ No newline at end of file + - requirements: docs/source/requirements.txt diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8a64d80d..d43d95fe 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,17 +1,17 @@ # Contributing Contributions to this repository are intended to follow the VRS [development process](https://vrs.ga4gh.org/en/stable/appendices/development_process.html). -The additional information presented here are guidelines for issues, -branches, commits, and pull requests. Before adding documentation, +The additional information presented here consists of guidelines for issues, +branches, commits, and pull requests. Before adding documentation, please also review the [docs style guide](docs/source/style.rst).
## Discussions -[Discussions](https://github.com/ga4gh/vrs/discussions) are for feature +[Discussions](https://github.com/ga4gh/vrs/discussions) are for feature requests, release candidate discussions, and questions. ## Issues [Issues](https://github.com/ga4gh/vrs/issues) are for bug -reports, and planned feature descriptions. When creating an issue, use +reports and planned feature descriptions. When creating an issue, use sentence case for the issue title and avoid the use of periods at the end of titles. @@ -25,12 +25,12 @@ branch for [issue 250](https://github.com/ga4gh/vrs/issues/250) could be `250-contributing`. ## Pull Requests -[Pull Requests](https://github.com/ga4gh/vrs/pulls) (PRs) for new -features should target the `main` branch. For version +[Pull Requests](https://github.com/ga4gh/vrs/pulls) (PRs) for new +features should target the `main` branch. For version patches, the PR should target the appropriate minor version branch. PRs must be approved by at least one project maintainer before they may be merged. PR titles must reflect the issue associated with the PR. For -example, the associated PR title for +example, the associated PR title for [issue 250](https://github.com/ga4gh/vrs/issues/250) would be -`#250: Add CONTRIBUTING.md`, as seen in +`#250: Add CONTRIBUTING.md`, as seen in [PR #253](https://github.com/ga4gh/vrs/pull/253). diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md index 8a056944..363dc00f 100644 --- a/CONTRIBUTORS.md +++ b/CONTRIBUTORS.md @@ -42,7 +42,7 @@ |Brian Walsh | [[10](#10)] | |Andrew D Yates | [[8](#8)] | -See also +See also [VRS contributors](https://github.com/ga4gh/vrs/graphs/contributors) and [VRS Python contributors](https://github.com/ga4gh/vrs-python/graphs/contributors).
diff --git a/README.md b/README.md index 53a17394..fa606841 100644 --- a/README.md +++ b/README.md @@ -32,7 +32,7 @@ The VRS model is the product of the [GA4GH Variation Representation group](https ## Using the schema -The schema is available in the [schema/](./schema/) directory, in both yaml and json versions. +The schema is available in the [schema/](./schema/) directory, in both yaml and json versions. It conforms to JSON Schema Draft 2020-12. For a list of libraries that support JSON schema, see [JSONSchema>Tools](https://json-schema.org/tools). diff --git a/TODO b/TODO index 43d60131..6eac6d15 100644 --- a/TODO +++ b/TODO @@ -1,5 +1,5 @@ # Docs see doc-updates branch * Standardize quoting: '**blah**' → ``blah`` -* Investigate -https://pypi.org/project/sphinx-jsonschema/ \ No newline at end of file +* Investigate +https://pypi.org/project/sphinx-jsonschema/ diff --git a/docs/source/appendices/design_decisions.rst b/docs/source/appendices/design_decisions.rst index 1a53bbb8..d7d386a0 100644 --- a/docs/source/appendices/design_decisions.rst +++ b/docs/source/appendices/design_decisions.rst @@ -9,14 +9,14 @@ GA4GH Inherent Properties over Value Objects -------------------------------------------- In VRS 1.0 we operated under the principle that all identifiable objects in VRS (e.g. Allele, SequenceLocation, etc.) -would be *value objects*. This meant that they should be immutable and contain only required fields that are +would be *value objects*. This meant that they should be immutable and contain only required fields that are necessary to uniquely identify the object. This approach somewhat simplified the ability to genertate the digests by allowing the computation of the digest to be based on the entire object. An exception was made for properties with a leading underscore (namely, the *_id* property), which was removed from the object before a digest was calculated. 
In VRS 2.0 we extended the principle of excepting designated attributes by explicitly defining *inherent properties* -that constitute the properties used to compute an object digest. This was done to enable expressivity of VRS, -enabling implementations to pass common, descriptive metadata as part of the identifiable objects without sacrificing +that constitute the properties used to compute an object digest. This was done to enable expressivity of VRS, +enabling implementations to pass common, descriptive metadata as part of the identifiable objects without sacrificing the ability to create globally unique, federated identifiers from VRS 1.3. As a result, we had to introduce a new field in the digest model called *ga4gh.inherent* which is described in detail @@ -25,13 +25,13 @@ in the section on :ref:`ga4gh-inherent-properties`. IRIs over CURIEs ---------------- -In VRS 2.0 we moved away from the use of CURIEs in favor of :ref:`iriReference`. Several factors played a role in +In VRS 2.0 we moved away from the use of CURIEs in favor of :ref:`iriReference`. Several factors played a role in this decision. -JSON Schema, the default data model for GKS specifications, does not allow for encoding of CURIE namespaces as is done -in other frameworks such as JSON-LD or XML. As a result, namespaces must be captured from custom data structures, API +JSON Schema, the default data model for GKS specifications, does not allow for encoding of CURIE namespaces as is done +in other frameworks such as JSON-LD or XML. As a result, namespaces must be captured from custom data structures, API endpoints, or documentation that may not persist as messages are exchanged between systems. To address this, references -in GKS specs now use IRIs to reference objects explicitly. +in GKS specs now use IRIs to reference objects explicitly. 
IRI-References over IRIs ------------------------ @@ -44,7 +44,7 @@ VRS identifier syntax and versioning The :ref:`versioning` section describes the versioning and release naming conventions for the VRS product. Approved releases will be assigned to the version number alone, but connect, ballot and snapshot releases will -include the context term and date in addition to the target version number. +include the context term and date in addition to the target version number. During the GA4GH Connect April 2023 meeting the maturity model was discussed at length and the following proposal was presented for instance and class GKS identifiers. @@ -64,13 +64,13 @@ As an example, the Github JSON Schema URL ($id) for the VRS 2.0.0 Allele is: } During the **release and versioning** discussion at the GA4GH Connect April 2023 meeting the proposal -delved into the idea of including the major version number in the VRS identifier itself. Proponents of -this approach cited concern for the change in digests (and their derived identifiers) between major -versions of the same VRS object, which would become clearly visible in the identifier itself if the +delved into the idea of including the major version number in the VRS identifier itself. Proponents of +this approach cited concern for the change in digests (and their derived identifiers) between major +versions of the same VRS object, which would become clearly visible in the identifier itself if the major version was included. -Opponents of this approach argued that new identifiers would be required for every type of VRS object -for every major version release. Meaning that even if a given type of object has no change that would +Opponents of this approach argued that new identifiers would be required for every type of VRS object +for every major version release, meaning that even if a given type of object has no change that would result in a new digest, a new identifier would still be required for the new major version.
After much discussion, the decision was made to NOT include the major version number in the VRS identifier @@ -88,4 +88,3 @@ the following syntax: .. code-block:: https://w3id.org/ga4gh/vrs/VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT - diff --git a/docs/source/appendices/ga4gh_identifiers.rst b/docs/source/appendices/ga4gh_identifiers.rst index ca4e9445..b15401ce 100644 --- a/docs/source/appendices/ga4gh_identifiers.rst +++ b/docs/source/appendices/ga4gh_identifiers.rst @@ -79,8 +79,8 @@ GA4GH Inherent Properties implementations. VRS 2.0 addresses this limitation with the designation of inherent properties for use with the computed identifier algorithm. -When creating computed identifiers from objects, VRS uses a custom schema attribute, -*ga4gh.inherent*, that contains the property names used for computing digests. For example, +When creating computed identifiers from objects, VRS uses a custom schema attribute, +*ga4gh.inherent*, that contains the property names used for computing digests. For example, the Allele JSON Schema: .. parsed-literal:: @@ -105,7 +105,7 @@ the Allele JSON Schema: .. note:: - The `ga4gh` JSON Schema namespace is aligned with the Sequence Collections effort + The `ga4gh` JSON Schema namespace is aligned with the Sequence Collections effort (see `SeqCol#84 `_). GA4GH Type Prefixes diff --git a/docs/source/appendices/maturity_model.rst b/docs/source/appendices/maturity_model.rst index d677fe9d..cdaedf6e 100644 --- a/docs/source/appendices/maturity_model.rst +++ b/docs/source/appendices/maturity_model.rst @@ -3,44 +3,44 @@ GKS Maturity Model !!!!!!!!!!!!!!!!!! -The Genomic Knowledge Standards work stream is developing semantic data exchange -standards for federated genomic knowledge sharing. To address this, new technical -specifications are required, such as the VRS standard, which must be developed -and iterated upon through application across community implementations. 
This -creates a tension between the need to create products with enough stability for -initial community adoption, while ensuring that they can evolve with minimal -disruption to interoperate smoothly across a diverse set of genomic knowledge -resources. Mechanisms for communicating the stability, uptake, and development -of technical specifications are therefore of paramount importance to addressing +The Genomic Knowledge Standards work stream is developing semantic data exchange +standards for federated genomic knowledge sharing. To address this, new technical +specifications are required, such as the VRS standard, which must be developed +and iterated upon through application across community implementations. This +creates a tension between the need to create products with enough stability for +initial community adoption, while ensuring that they can evolve with minimal +disruption to interoperate smoothly across a diverse set of genomic knowledge +resources. Mechanisms for communicating the stability, uptake, and development +of technical specifications are therefore of paramount importance to addressing this balance. -A maturity model is a useful mechanism for communicating varying stability across -product features (e.g. data classes or protocols) of a GKS standard. This is -needed to help data producers at each stage of the adoption lifecycle -decide on the appropriate time to engage and implement the standard. Product -features that have progressed through the maturity model should have an associated -progression of support from the GKS specification maintainers for message +A maturity model is a useful mechanism for communicating varying stability across +product features (e.g. data classes or protocols) of a GKS standard. This is +needed to help data producers at each stage of the adoption lifecycle +decide on the appropriate time to engage and implement the standard. 
Product +features that have progressed through the maturity model should have an associated +progression of support from the GKS specification maintainers for message generation, translation, and validation tooling. -Here we define the maturity model and release process for developing and -maintaining GKS standards, with the goal of enabling timely specification +Here we define the maturity model and release process for developing and +maintaining GKS standards, with the goal of enabling timely specification adoption by the community. .. figure:: ../images/adoption_lifecycle.png :width: 800 - The Innovation Adoption Lifecycle. - - *The Innovation Adoption Lifecycle illustrates adoption rates (y-axis) for - new technologies over time (x-axis). Innovators (leftmost on the time axis) - are among the first to adopt a new technology, and laggards (rightmost) are - among the last, reflecting the differing needs for innovation and stability - by these community groups. Adopters in every category along the innovation - adoption lifecycle benefit from communication about the maturity of technical - specification components generated by the Genomic Knowledge Standards work - stream. Communicating when a component is ready for implementation by groups - along the innovation / stability spectrum is a primary goal of the maturity - model, enabling adopters to engage at a time that is appropriate for their + The Innovation Adoption Lifecycle. + + *The Innovation Adoption Lifecycle illustrates adoption rates (y-axis) for + new technologies over time (x-axis). Innovators (leftmost on the time axis) + are among the first to adopt a new technology, and laggards (rightmost) are + among the last, reflecting the differing needs for innovation and stability + by these community groups. 
Adopters in every category along the innovation + adoption lifecycle benefit from communication about the maturity of technical + specification components generated by the Genomic Knowledge Standards work + stream. Communicating when a component is ready for implementation by groups + along the innovation / stability spectrum is a primary goal of the maturity + model, enabling adopters to engage at a time that is appropriate for their organizational needs.* .. _feature-maturity-levels: @@ -48,7 +48,7 @@ adoption by the community. Feature Maturity levels @@@@@@@@@@@@@@@@@@@@@@@ -It may be helpful to visualize the application of maturity levels by viewing the +It may be helpful to visualize the application of maturity levels by viewing the current :ref:`classDiagram`. .. figure:: ../images/maturity_levels.png @@ -56,11 +56,11 @@ current :ref:`classDiagram`. Product feature maturity level criteria and commitments. -Product feature maturity levels are to be reviewed and advanced by consensus among -defined decision-makers following Work Stream and GA4GH processes, in consultation -with the associated product group membership. Factors to be considered for product -feature maturity advancement include the criteria specified in the above table, the -degree of adoption observed in the community, feedback provided by adopters, and +Product feature maturity levels are to be reviewed and advanced by consensus among +defined decision-makers following Work Stream and GA4GH processes, in consultation +with the associated product group membership. Factors to be considered for product +feature maturity advancement include the criteria specified in the above table, the +degree of adoption observed in the community, feedback provided by adopters, and availability of specification maintainers to provide the level of support required. 
Developing a Draft Product Feature @@ -68,15 +68,15 @@ Developing a Draft Product Feature **Decision-makers**: :ref:`feature-developers`, :ref:`product-leads` -**Criteria**: Draft product feature development work should be based on real use -cases across multiple environments (aligned with `GA4GH Product Development 14.5`_). -Requirements may result directly from a `landscape analysis of the problem domain`_, -or may emerge in the course of technical specification development. It is expected -that the need for product features are first discussed in a community forum (e.g. +**Criteria**: Draft product feature development work should be based on real use +cases across multiple environments (aligned with `GA4GH Product Development 14.5`_). +Requirements may result directly from a `landscape analysis of the problem domain`_, +or may emerge in the course of technical specification development. It is expected +that the need for product features is first discussed in a community forum (e.g. GitHub Discussions, GKS Work Stream calls). -**Process**: Follow the GKS :ref:`development-process`. As part of this process, -it is expected that consensus among the decision-makers was reached and major design +**Process**: Follow the GKS :ref:`development-process`. As part of this process, +it is expected that consensus among the decision-makers was reached and major design decisions documented. Disagreements are resolved per Work Stream and GA4GH processes. Advancing from Draft to Trial Use
Advancing a product feature to trial use also mandates -a minor version increment at the next release. As part of this process, it is expected that -consensus among the decision-makers was reached and major design decisions documented. Disagreement +**Criteria**: Advancing a draft product feature to trial use should include at least two +independent product implementers that commit to supporting the draft product feature once +it has been advanced to trial use. At least one of these implementations must be open (aligned +with `GA4GH Product Development 14.8.3`_). Advancing a product feature to trial use also mandates +a minor version increment at the next release. As part of this process, it is expected that +consensus among the decision-makers was reached and major design decisions documented. Disagreement resolution is handled per Work Stream and GA4GH processes. -**Process**: A ballot release is created that describes draft models under evaluation for -advancement to trial use. A survey is sent to all Product Implementers that have indicated -they are implementing one or more features under evaluation for advance to Trial Use. This +**Process**: A ballot release is created that describes draft models under evaluation for +advancement to trial use. A survey is sent to all Product Implementers that have indicated +they are implementing one or more features under evaluation for advance to Trial Use. This survey includes: 1. Name of Product Implementer @@ -103,7 +103,7 @@ survey includes: #. Comments on response (e.g. explicit endorsement or description of gaps) There is a minimum 1-week review period for Product Implementers to respond, though this may -be longer at the discretion of the product leads. More time for individual contributors may +be longer at the discretion of the product leads. More time for individual contributors may be permitted on request. 
Advancing from Trial Use to Normative @@ -112,12 +112,12 @@ Advancing from Trial Use to Normative **Decision-makers**: :ref:`feature-developers`, :ref:`product-leads`, :ref:`product-implementers`, :ref:`ws-leads` -**Criteria**: A normative model should have demonstrated interoperability of multiple data -generation and data consumption implementations, and should include implementations beyond -those used to advance a model to Trial Use. Advancing a product feature to normative also -mandates a minor version increment at the next release. As part of this process, it is -expected that consensus among the decision-makers was reached and major design decisions -documented. Community consultation and disagreement resolution are handled per Work Stream +**Criteria**: A normative model should have demonstrated interoperability of multiple data +generation and data consumption implementations, and should include implementations beyond +those used to advance a model to Trial Use. Advancing a product feature to normative also +mandates a minor version increment at the next release. As part of this process, it is +expected that consensus among the decision-makers was reached and major design decisions +documented. Community consultation and disagreement resolution are handled per Work Stream and GA4GH processes. .. _GA4GH Product Development 14.5: https://www.ga4gh.org/our-products/development-and-approval-process/#section_5:~:text=14.5%20Development%20work%20should%20be%20based%20on%20real%20use%20cases%20across%20multiple%20environments. @@ -129,14 +129,14 @@ and GA4GH processes. Product Versioning and Releases @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ -Versions are used to identify releases of the entire specification, not to individual product features. -Technical specification development is intrinsically linked to policy surrounding major and minor version +Versions are used to identify releases of the entire specification, not individual product features.
+Technical specification development is intrinsically linked to policy surrounding major and minor version identification, which follow `semantic versioning v2 `__ practices for API versioning. Versioning examples ################### -Version syntax follows SemVer syntax. Examples of how product features at different maturity levels are +Version syntax follows SemVer syntax. Examples of how product features at different maturity levels are applied to the SemVer major/minor/patch syntax as follows: Major Version Increment @@ -167,17 +167,17 @@ $$$$$$$$$$$$$$$$$$$$$$$ - Addition of implementation guidance, tests, or other supporting product features that do not directly affect data compatibility -Versioning of approved GA4GH standards additionally follow the procedures for `GA4GH Product Updates `__. -Specifically, advancement of data classes to the trial use or normative levels must be accompanied by a -minor release increment, and therefore may only be included in a release following an appropriate community +Versioning of approved GA4GH standards additionally follows the procedures for `GA4GH Product Updates `__. +Specifically, advancement of data classes to the trial use or normative levels must be accompanied by a +minor release increment, and therefore may only be included in a release following an appropriate community and PRC consultation process (`GA4GH Product Development 32 `__). Releases ######## -In order to support continuous development of a technical specification, pre-release snapshots are -allowed and must use the SemVer syntax for pre-releases. Pre-release snapshots may be created for -purpose at any time by the product leads. Pre-release snapshots should use the following pre-release -labels as version suffixes for the indicated purposes: +In order to support continuous development of a technical specification, pre-release snapshots are +allowed and must use the SemVer syntax for pre-releases.
Pre-release snapshots may be created for +purpose at any time by the product leads. Pre-release snapshots should use the following pre-release +labels as version suffixes for the indicated purposes: - connect.-[.] - for pre-releases to be evaluated at an upcoming GA4GH Connect meeting @@ -192,16 +192,16 @@ labels as version suffixes for the indicated purposes: - for use as needed for all other purposes - N increments for successive snapshots -These pre-release labels are appended to the major, minor, and patch components to create -a pre-release version following the SemVer ..-