Update build procedure of imports #1052

Merged: 51 commits, merged Jul 27, 2022

allenbaron
Collaborator

This PR accomplishes the following:

  1. Restores the previous (indirect) import of SYMP & TRANS while retaining all term annotations in the source files, including definitions.
    • Only deprecated terms and ontology annotations are removed from the source files.
  2. Adds the versionIRI of each import source to its derivative <import name>_import.owl file as an rdfs:comment when that file is built.
  3. Updates the versionIRI of all imports each release, in the same way the versionIRI of doid.owl is updated.
    • This is necessary for true versioning of import modules and ensures that <import_name>_import.owl files are accessible online by both their ontology IRIs and their version IRIs.
  4. Adds make refresh_<import_name> commands to allow re-downloading a specific import source when rebuilding that import (as opposed to re-downloading all imports OR rebuilding the import from a previously downloaded source file, when it exists).

Note that make clean_imports has also been renamed to make refresh_imports to better reflect its use in re-downloading AND rebuilding imports.
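Items 2 and 3, together with the .version files described in the commits below, revolve around reading a source's owl:versionIRI and reusing it. The following Python sketch illustrates that flow; it is not the repository's Makefile or ROBOT code. It assumes an RDF/XML source whose owl:Ontology header carries owl:versionIRI as an rdf:resource attribute, and the helper names (read_version_iri, source_comment, write_version_file) are hypothetical. The comment text follows the SYMP example in the testing checklist.

```python
# Minimal sketch (hypothetical helpers, not the repository's build code):
# pull owl:versionIRI from an RDF/XML import source, format the rdfs:comment
# text, and record the versionIRI in <build dir>/<import name>.version.
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def read_version_iri(rdf_xml: str) -> Optional[str]:
    """Return the owl:versionIRI of the owl:Ontology header, or None if absent."""
    root = ET.fromstring(rdf_xml)
    ontology = root.find(f"{OWL}Ontology")
    if ontology is None:
        return None
    version = ontology.find(f"{OWL}versionIRI")
    return None if version is None else version.get(f"{RDF}resource")


def source_comment(version_iri: str) -> str:
    """Text for the rdfs:comment added to <import name>_import.owl."""
    return f"Source from {version_iri}"


def write_version_file(build_dir: Path, import_name: str, version_iri: str) -> Path:
    """Save the versionIRI to <build_dir>/<import name>.version and return the path."""
    path = build_dir / f"{import_name}.version"
    path.write_text(version_iri + "\n")
    return path


# Illustrative ontology header only; a real symp.owl contains far more.
SYMP_HEADER = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/symp.owl">
    <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/symp/releases/2022-07-05/symp.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

if __name__ == "__main__":
    iri = read_version_iri(SYMP_HEADER)
    print(source_comment(iri))
    # prints: Source from http://purl.obolibrary.org/obo/symp/releases/2022-07-05/symp.owl
```

Parsing only the ontology header keeps the sketch small; for very large sources such as the .owl.gz imports, a streaming parser would be a better fit than loading the whole file.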

Groups: .obo, .owl, or .owl.gz

NCBItaxon shifted to .owl.gz because of its large size (like CHEBI).
versionIRI for each import is saved to file at
build/<import name>.version
No imports to merge and no need to specify annotations to retain;
all are retained.
Currently affects only SYMP & TRANS, as other ontologies are
built as modules.
Refresh means download the latest version and rebuild the
corresponding _import.owl file.

Execute with `make refresh_<import name>`.

This differs from the shorter `make <import name>` which reuses
previously downloaded import sources if they exist.
The .version files will now automatically be made in the same dir
as the source files (currently all in src/ontology/imports/build)
and make will always search for these files in the imports/build
dir.
Announces whether the update uses a source file with the same
version or a different version, and shows the version(s).
Also explicitly declares .version files as .SECONDARY.
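The announcement described in this commit amounts to comparing the versionIRI recorded in build/<import name>.version with the versionIRI of the current source file. The Python sketch below is illustrative only; the function name and message wording are assumptions, since the PR does not show the output of the Makefile recipe.

```python
# Hypothetical sketch: choose the message shown when an import is rebuilt by
# comparing the previously recorded versionIRI (from <import name>.version,
# or None if that file does not exist yet) with the current source's versionIRI.
from typing import Optional


def announce_update(import_name: str, previous: Optional[str], current: str) -> str:
    """Say whether the rebuild uses the same or a different source version."""
    if previous is None:
        return f"{import_name}: building from source version {current}"
    if previous == current:
        return f"{import_name}: rebuilding from the same source version ({current})"
    return f"{import_name}: updating from source version {previous} to {current}"


if __name__ == "__main__":
    old = "http://purl.obolibrary.org/obo/symp/releases/2022-07-05/symp.owl"
    print(announce_update("symp", old, old))
```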

The presence of a global declaration causes .version intermediate files
not to be regenerated when their inputs are, which makes predicting
make behavior challenging.
To keep main _import.owl file rules together (organizational only).
@allenbaron
Collaborator Author

allenbaron commented Jul 11, 2022

We should test the following to make sure these updates work as intended before merging this into main for production:

  1. Imports build correctly.
    • Test with the latest release of SYMP:
      • The new terms are added and appear as expected.
      • No term annotations (definitions, ids, labels, xrefs, etc) are removed.
      • All deprecated terms are removed.
      • The ontology annotation information is updated.
        • The ontology IRI should be http://purl.obolibrary.org/obo/doid/imports/symp_import.owl
        • The version IRI at the time symp_import.owl is first built is the same as the ontology IRI.
        • An rdfs:comment is added including the version IRI of the symp.owl source file. Should be "Source from http://purl.obolibrary.org/obo/symp/releases/2022-07-05/symp.owl".
    • Test with the latest release of NCBItaxon (using the NCBItaxon term updates listed in "Terms to add to imports" #946):
      • The new terms are added and appear as expected.
      • The types of annotations are the same (id, label, deprecated, definition).
      • The resulting ncbitaxon_import.owl has an rdfs:comment including the version IRI of the ncbitaxon.owl source file and IRIs remain the same.
  2. make refresh_<import_name> works as expected (test with symp after tests in 1).
    • The build/symp.owl file re-downloads and symp_import.owl file is re-built (file contents remain the same).
  3. Imports are properly versioned during the build of a release.
    • Test this by executing a test release on this branch.
    • All imports now have versionIRIs matching http://purl.obolibrary.org/obo/doid/releases/<date>/imports/<import name>_import.owl and are otherwise unchanged.
    • ext.owl has a versionIRI matching http://purl.obolibrary.org/obo/doid/releases/<date>/ext.owl and is otherwise unchanged.
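The versionIRI checks in item 3 can be expressed as pattern matches. This is a minimal Python sketch with hypothetical helper names; it assumes release dates are formatted YYYY-MM-DD, which this PR does not state for DO releases.

```python
# Sketch of the release checks in item 3 (hypothetical helpers): after a test
# release, each import's versionIRI should follow the dated release pattern,
# and ext.owl should follow the analogous ext.owl pattern.
import re

BASE = re.escape("http://purl.obolibrary.org/obo/doid/releases/")
DATE = r"\d{4}-\d{2}-\d{2}"  # assumed YYYY-MM-DD release dates


def is_import_release_iri(iri: str, import_name: str) -> bool:
    """True if iri matches .../releases/<date>/imports/<import name>_import.owl."""
    pattern = rf"{BASE}{DATE}/imports/{re.escape(import_name)}_import\.owl"
    return re.fullmatch(pattern, iri) is not None


def is_ext_release_iri(iri: str) -> bool:
    """True if iri matches .../releases/<date>/ext.owl."""
    return re.fullmatch(rf"{BASE}{DATE}/ext\.owl", iri) is not None


if __name__ == "__main__":
    # Example IRI with an illustrative date.
    example = "http://purl.obolibrary.org/obo/doid/releases/2022-07-27/imports/symp_import.owl"
    print(is_import_release_iri(example, "symp"))
```

Using re.fullmatch rather than re.match ensures that trailing text after the file name causes the check to fail.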

@allenbaron allenbaron requested a review from jbmunro July 11, 2022 16:06
@allenbaron allenbaron self-assigned this Jul 11, 2022
@allenbaron allenbaron added the imports Applies to ontologies imported into the Human Disease Ontology. label Jul 11, 2022
@jbmunro
Contributor

jbmunro commented Jul 12, 2022

  1. Test with the latest release of SYMP = done.
  2. make refresh_symp = works as expected.

From list in issue #946
@allenbaron
Collaborator Author

ncbitaxon_import.owl built correctly, but some old terms were removed. When I resolve #1053 I'll refresh the import and commit it, but that work doesn't need to hold up this PR.

jbmunro and others added 13 commits July 26, 2022 12:47
Reordering these commands does not change output in any way.
Discarding non-FOODON terms from PO & BFO along with the
additional branches those terms created in the foodon import.

Note: Explicit removal of BFO is subsumed and superseded by this
change.
Discarding non-CL terms from CARO and BFO along with the
additional branches they cause to form in the CL import.
Definitions are no longer included, fixing the error from multiple
definitions for IAO:0000115 in the import.
Fixes multiple labels issue on CL:0000000 & CL:0000540.
The only import previously using this query is FOODON. This
addition does not change the FOODON import in any way.
Ensures HP import does not retain duplicate labels from source,
which would cause errors in obo files.
Modifications:
- Terms in 'onset' branch now retain definitions
- Duplicate labels from source will be removed
@allenbaron
Collaborator Author

Fixed errors in FOODON and CL by updating them. I had to modify the build commands to remove extra branches in FOODON and to avoid extra terms from outside each ontology's namespace (BFO + CARO for CL and BFO + PO for FOODON).

We should consider switching sources to the -base.owl files of ontologies that produce them. This could simplify some of our import build commands.

@lschriml
Contributor

lschriml commented Jul 27, 2022 via email

@allenbaron
Collaborator Author

I went ahead and refreshed all the imports, excluding ECO, DISDRIV, and OMIM_SUSC which don't have automated build rules yet, to ensure they are all up-to-date and formatted as desired (with or without definitions).

For each updated import, I compared the previous version with the updated version 1) with Protege to verify the trees look as expected and there are no unexpected branches and 2) with robot diff. I also checked doid-edit.owl after each import was updated to ensure it looked okay.

All of these updates appear correct to me.

@allenbaron
Collaborator Author

Finally, I did a test merge with the main branch and ran make release. There were no errors in the validation tests and only minor errors in building the obo files, which I corrected (duplicate labels caused by 2 SO terms and 1 UBERON term having labels in ext.owl). The release then completed successfully.

I then reviewed ext.owl and all of the imports (again) and they appeared to have correctly updated versionIRIs and otherwise looked as expected.

Lastly, I reviewed all the reports. There are a few warnings that may be worth addressing (I did fix 1 whitespace warning), but nothing breaking.

This PR is now ready to be merged.

@allenbaron allenbaron merged commit 26d3975 into main Jul 27, 2022
@allenbaron allenbaron deleted the version_imports branch July 27, 2022 18:13
3 participants