Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --clean-obo option to convert. #1236

Merged
merged 5 commits into from
Mar 5, 2025
Merged

Conversation

gouttegd
Copy link
Contributor

@gouttegd gouttegd commented Jan 19, 2025

Resolves [#995]

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

This PR adds a --clean-obo option to the convert command, as discussed in #995.

The option accepts a space-separated list of keywords that dictate how OBO output should be tweaked:

  • drop-extra-labels: if an entity has more than one rdfs:label annotation, keep the first label only;
  • drop-extra-definitions: likewise, but for IAO:0000115 definitions;
  • drop-extra-comments: likewise, but for rdfs:comment annotations;
  • merge-comments: alternative to drop-extra-comments: merge rdfs:comment annotations into a single annotation, instead of keeping only the first one;
  • drop-untranslatable-axioms: drop axioms that cannot be represented in the OBO format instead of putting them in a owl-axioms header tag;
  • drop-gci-axioms: drop general concept inclusion axioms.

In addition, the option accepts the following shortcut keywords:

  • strict: equivalent to drop-extra-labels drop-extra-definitions drop-extra-comments;
  • simple: equivalent to strict drop-untranslatable-axioms drop-gci-axioms;
  • true: equivalent to simple.

This commit adds a `--clean-obo` option to the `convert` command.

The option accepts a space-separated list of options that dictate how
OBO output should be tweaked:

* `drop-extra-labels` (if an entity has more than one rdfs:label
  annotation, keep the first label only);
* `drop-extra-definitions` (likewise, but for IAO:0000115 definitions);
* `drop-extra-comments` (likewise, but for rdfs:comment annotations);
* `merge-comments` (merge rdfs:comment annotations into a single
  annotation);
* `drop-untranslatable-axioms` (drop axioms that cannot be represented in
  the OBO format instead of putting them in a owl-axioms header tag);
* `drop-gci-axioms` (drop general concept inclusion axioms).

In addition, the option accepts the following keywords:

* `strict`: equivalent to `drop-extra-{labels,definitions,comments}`;
* `simple`: equivalent to `strict drop-untranslatable-axioms
  drop-gci-axioms`;
* `true`: equivalent to `simple`.
@gouttegd
Copy link
Contributor Author

Not sure what caused the Java 11 test to take more than 10 minutes (thereby triggering a failure)… Can’t reproduce here, on my machine the test suite runs fine with Java 11, Java 17, and Java 21 in about 2.5 minutes.

@gouttegd gouttegd marked this pull request as ready for review January 20, 2025 10:13
@matentzn
Copy link
Contributor

Not sure what caused the Java 11 test to take more than 10 minutes (thereby triggering a failure)… Can’t reproduce here, on my machine the test suite runs fine with Java 11, Java 17, and Java 21 in about 2.5 minutes.

Whatever it was it was temporary

@jamesaoverton
Copy link
Member

This is really excellent! I'm very happy with the design, and the code looks good too. Please add some tests, the I'll kick the tired, and we'll get this merged.

@jamesaoverton
Copy link
Member

Not sure what caused the Java 11 test to take more than 10 minutes (thereby triggering a failure)… Can’t reproduce here, on my machine the test suite runs fine with Java 11, Java 17, and Java 21 in about 2.5 minutes.

I've also seen intermittent timeouts from one of the three Java versions, but haven't noticed a pattern, and it will be hell to debug. So I just re-run the failing one and it usually passes the second time. Not ideal.

Add two unit tests for the new methods dropExtraAnnotations and
mergeExtraAnnotations in the OntologyHelper class.
Test the overall --clean-obo feature with three new integration tests in
convert.md.

The tests use a new sample file (cl_module.ofn) which is a small CL
module built around CL:0000583 ("alveolar macrophage"). That module
contains instances of all the things that --clean-obo is suppose to deal
with:

* supernumerary annotations (two comments on CL:0000583)

* axioms that are not translatable into pure OBO:
    'has characteristic' some 'cell morphology' SubClassOf cell

* (translatable) GCI axioms:
    cell and ('has characteristic' some mononucleate)
      SubClassOf 'has part' some nucleus`
@gouttegd
Copy link
Contributor Author

@jamesaoverton Tests added. :)

Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great code, nicely written docs. I went through most things top to bottom (not all the Java lines in extreme detail) and I find this very readable, nicely written code. I have a tiny opinionated suggestion that I have no concern abandoning.

Add an optional argument to OntologyHelper.mergeExtraAnnotations to
allow the caller to specify the separator string to insert between
annotation values, when merging annotations together. The default
separator is a single space character.

This is a code-level feature only. The separator string is _not_
customizable from the command-line, when comment annotations are merged
as part of `--clean-obo merge-comments`.
Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super clean code, great docs, everything looks pretty much flawless!

@jamesaoverton jamesaoverton merged commit 9392292 into ontodev:master Mar 5, 2025
3 checks passed
@jamesaoverton
Copy link
Member

Sorry for taking so long to get back to this. Much appreciated!

@gouttegd gouttegd deleted the clean-obo branch March 5, 2025 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants