Add --clean-obo option to convert. #1236

Merged (5 commits) on Mar 5, 2025
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -7,7 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## Fixed
### Added
- Add `--clean-obo` option to [`convert`] [#995]

### Fixed
- Update owl-diff dependency for stable ordering and to avoid large string creation [#1227]
- Improve disambiguation of properties in QuotedEntityChecker [#1226]
- Skip "non-robot" columns in templates for the purposes of axiom annotations [#1216]
36 changes: 32 additions & 4 deletions docs/convert.md
@@ -35,10 +35,38 @@ By default, the OBO writer strictly enforces [document structure rules](http://o

As a document is converted to OBO, you may see `ERROR MASKING ERROR` exceptions. These do not indicate a failure: they signal axioms that cannot be translated into OBO format. Rather than being dropped, such axioms are written into the ontology header under the `owl-axioms` tag. See [Untranslatable OWL axioms](http://owlcollab.github.io/oboformat/doc/obo-syntax.html#5.0.4) for more details.

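You can choose to keep these axioms in the file, or remove the `owl-axioms` header line from the converted file with `grep` (file names are placeholders):
```
grep -v '^owl-axioms' ontology.obo > ontology-no-owl-axioms.obo
```
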
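
The `owl-axioms` value is a single OBO tag value holding OWL functional syntax, with its line breaks escaped as `\n` (see the header of `cl_module-strict-mergedcomments.obo` in this PR). The sketch below separates that line from the rest of the file and decodes it so the axioms can be read. `split_owl_axioms` and `unescape_obo` are hypothetical helpers, not part of ROBOT, and the escape table is an assumption based on the OBO flat-file conventions.

```python
from __future__ import annotations

# Hypothetical helpers (not part of ROBOT). Assumed escapes, following the OBO
# flat-file conventions: \n newline, \t tab, \W space; any other escaped
# character stands for itself (e.g. a doubled backslash becomes one backslash).
_ESCAPES = {"n": "\n", "t": "\t", "W": " "}


def unescape_obo(value: str) -> str:
    """Decode OBO backslash escapes in a tag value."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # a trailing lone backslash is dropped
            out.append(_ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


def split_owl_axioms(obo_text: str) -> tuple[str, str | None]:
    """Return the OBO text without its owl-axioms line, plus the decoded axioms (or None)."""
    kept: list[str] = []
    axioms = None
    for line in obo_text.splitlines(keepends=True):
        if line.startswith("owl-axioms:"):
            axioms = unescape_obo(line[len("owl-axioms:"):].strip())
        else:
            kept.append(line)
    return "".join(kept), axioms
```

The first element of the returned pair plays the same role as the `grep -v` output; the second shows what would be lost by removing the line.
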
The OBO output can be fine-tuned with the `--clean-obo` option, which takes a space-separated list of keywords, each enabling one customization of the OBO output. The available keywords are:
- `drop-extra-labels`: drop supernumerary `rdfs:label` annotations so that the ontology complies with the OBO specification (which allows a class only one label).
- `drop-extra-definitions`: likewise, but for `IAO:0000115` annotations (definitions).
- `drop-extra-comments`: likewise, but for `rdfs:comment` annotations.
- `merge-comments`: when a class has more than one `rdfs:comment` annotation, merge them into a single annotation (an alternative to `drop-extra-comments`).
- `drop-untranslatable-axioms`: drop axioms that cannot be represented in OBO format, instead of writing them into the `owl-axioms` header tag described above.
- `drop-gci-axioms`: drop axioms that represent General Concept Inclusions (GCIs), even if they can legally be represented in OBO format.
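
The effect of the per-annotation keywords can be pictured on a single term stanza. This is a hypothetical sketch: ROBOT works on the OWL annotations before serialization, not on OBO text, and its rule for choosing which duplicate to keep is not stated here. The sketch keeps the first value; in the `cl_module-simple.obo` example, the surviving comment for `CL:0000583` is indeed the first part of the merged one, but that is not a claim about ROBOT's selection rule. Comments are joined with a single space, which is what the merged comment in `cl_module-strict-mergedcomments.obo` appears to show; qualifiers such as `{xref=...}` are ignored. `clean_stanza` and `DROP_EXTRA` are illustrative names.

```python
from __future__ import annotations

# Hypothetical mapping from a --clean-obo keyword to the OBO tag it affects.
DROP_EXTRA = {
    "drop-extra-labels": "name",       # rdfs:label
    "drop-extra-definitions": "def",   # IAO:0000115
    "drop-extra-comments": "comment",  # rdfs:comment
}


def clean_stanza(tags: list[tuple[str, str]], keywords: set[str]) -> list[tuple[str, str]]:
    """Apply some --clean-obo keywords to one stanza given as ordered (tag, value) pairs."""
    result = list(tags)  # work on a copy; the input is left untouched
    if "merge-comments" in keywords:
        comments = [v for t, v in result if t == "comment"]
        if len(comments) > 1:
            # Put the merged comment where the first comment used to be.
            first = next(i for i, (t, _) in enumerate(result) if t == "comment")
            result = [p for p in result if p[0] != "comment"]
            result.insert(first, ("comment", " ".join(comments)))
    for keyword, tag in DROP_EXTRA.items():
        if keyword in keywords:
            seen = False
            kept = []
            for t, v in result:
                if t == tag:
                    if seen:
                        continue  # supernumerary value: drop it
                    seen = True
                kept.append((t, v))
            result = kept
    return result
```

Because merging runs before dropping in this sketch, enabling both `merge-comments` and `drop-extra-comments` yields one merged comment, matching the `strict merge-comments` example output.
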

In addition, the following special keywords are also accepted:
- `strict`: equivalent to `drop-extra-labels drop-extra-definitions drop-extra-comments`, to force the production of a valid OBO file by dropping supernumerary annotations as needed.
- `true`: alias for `strict`.
- `simple`: equivalent to `strict drop-untranslatable-axioms drop-gci-axioms`, to force the production of an OBO file that is not only valid but also free of any `owl-axioms` header tag and of GCI axioms (which are perfectly valid under the OBO specification but are not handled correctly by all OBO parsers).
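
Since the special keywords are shorthands for the base ones, a `--clean-obo` value can be thought of as expanding into a set of base keywords. A minimal sketch of that expansion follows; `expand_clean_obo` is a hypothetical function, not ROBOT's own parser, and rejecting unknown words is this sketch's choice (ROBOT's behaviour for unknown keywords is not described here).

```python
from __future__ import annotations

# Base keywords and special keywords as listed above.
BASE = {
    "drop-extra-labels", "drop-extra-definitions", "drop-extra-comments",
    "merge-comments", "drop-untranslatable-axioms", "drop-gci-axioms",
}
ALIASES = {
    "strict": ["drop-extra-labels", "drop-extra-definitions", "drop-extra-comments"],
    "true": ["strict"],
    "simple": ["strict", "drop-untranslatable-axioms", "drop-gci-axioms"],
}


def expand_clean_obo(option: str) -> set[str]:
    """Expand a space-separated --clean-obo value into base keywords."""
    result: set[str] = set()
    pending = option.split()
    while pending:
        word = pending.pop()
        if word in ALIASES:
            pending.extend(ALIASES[word])  # aliases may refer to other aliases
        elif word in BASE:
            result.add(word)
        else:
            raise ValueError(f"unknown --clean-obo keyword: {word!r}")
    return result
```

For instance, `expand_clean_obo("strict merge-comments")` gives the three `drop-extra-*` keywords plus `merge-comments`.
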

#### Examples

Convert a file to OBO and ensure the resulting file is compliant with the OBO specification, dropping supernumerary annotations if necessary:

    robot convert -i cl_module.ofn \
      --clean-obo strict \
      --output results/cl_module-strict.obo
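
To double-check a `strict` output, one can count the single-valued tags (`name`, `def`, `comment`) in each stanza. A rough sketch, assuming one tag per line; `cardinality_violations` is a hypothetical helper and does not replace a real OBO validator.

```python
from __future__ import annotations

from collections import Counter

SINGLE_VALUED = ("name", "def", "comment")  # at most one of each per stanza


def cardinality_violations(obo_text: str) -> list[tuple[str | None, str, int]]:
    """Return (id, tag, count) for each stanza that repeats a single-valued tag."""
    stanzas: list[list[str]] = [[]]  # first entry collects the header lines
    for line in obo_text.splitlines():
        if line.startswith("["):
            stanzas.append([])  # a new [Term] or [Typedef] stanza begins
        else:
            stanzas[-1].append(line)
    violations = []
    for lines in stanzas[1:]:
        counts = Counter(l.partition(":")[0] for l in lines if ":" in l)
        ident = next((l.partition(":")[2].strip() for l in lines if l.startswith("id:")), None)
        for tag in SINGLE_VALUED:
            if counts[tag] > 1:
                violations.append((ident, tag, counts[tag]))
    return violations
```

An empty list means no stanza carries more than one label, definition, or comment.
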

Likewise, but merge multiple comments into a single one instead of dropping the supernumerary comments:

    robot convert -i cl_module.ofn \
      --clean-obo "strict merge-comments" \
      --output results/cl_module-strict-mergedcomments.obo

Convert a file to a simple variant of the OBO format (without any `owl-axioms` tag and GCI axioms):

    robot convert -i cl_module.ofn \
      --clean-obo simple \
      --output results/cl_module-simple.obo
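
In the `simple` variant, two things should be absent: the `owl-axioms` header tag and GCI axioms, which appear in OBO as `gci_relation`/`gci_filler` qualifiers on a `relationship` line (see `CL:0000000` in `cl_module-strict-mergedcomments.obo`). A hypothetical check for both, `simple_obo_problems`, assuming those qualifiers are the only form GCIs take in the output:

```python
from __future__ import annotations

import re

# Qualifiers the OBO writer attaches to a relationship that encodes a GCI.
GCI_QUALIFIER = re.compile(r"\bgci_(?:relation|filler)=")


def simple_obo_problems(obo_text: str) -> list[str]:
    """List lines that a `--clean-obo simple` output should not contain."""
    problems = []
    for number, line in enumerate(obo_text.splitlines(), start=1):
        if line.startswith("owl-axioms:"):
            problems.append(f"line {number}: owl-axioms header tag")
        elif GCI_QUALIFIER.search(line):
            problems.append(f"line {number}: GCI qualifier")
    return problems
```
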

---

145 changes: 145 additions & 0 deletions docs/examples/cl_module-simple.obo
@@ -0,0 +1,145 @@
format-version: 1.2
ontology: cl

[Term]
id: CL:0000000
name: cell
def: "A material entity of anatomical origin (part of or deriving from an organism) that has as its parts a maximally connected cell compartment surrounded by a plasma membrane." [CARO:mah]
comment: The definition of cell is intended to represent all cells, and thus a cell is defined as a material entity and not an anatomical structure, which implies that it is part of an organism (or the entirety of one).
is_a: UBERON:0000061 ! anatomical structure

[Term]
id: CL:0000113
name: mononuclear phagocyte
def: "A vertebrate phagocyte with a single nucleus." [GOC:add, GOC:tfm, ISBN:0781735149]
is_a: CL:0000842 ! mononuclear cell
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000235
name: macrophage
def: "A mononuclear phagocyte present in variety of tissues, typically differentiated from monocytes, capable of phagocytosing a variety of extracellular particulate material, including immune complexes, microorganisms, and dead cells." [GO_REF:0000031, GOC:add, GOC:tfm, PMID:16213494, PMID:1919437]
comment: Morphology: Diameter 30_M-80 _M, abundant cytoplasm, low N/C ratio, eccentric nucleus. Irregular shape with pseudopods, highly adhesive. Contain vacuoles and phagosomes, may contain azurophilic granules; markers: Mouse & Human: CD68, in most cases CD11b. Mouse: in most cases F4/80+; role or process: immune, antigen presentation, & tissue remodelling; lineage: hematopoietic, myeloid.
synonym: "histiocyte" EXACT []
is_a: CL:0000113 ! mononuclear phagocyte
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000583
name: alveolar macrophage
def: "A tissue-resident macrophage found in the alveoli of the lungs. Ingests small inhaled particles resulting in degradation and presentation of the antigen to immunocompetent cells. Markers include F4/80-positive, CD11b-/low, CD11c-positive, CD68-positive, sialoadhesin-positive, dectin-1-positive, MR-positive, CX3CR1-negative." [GO_REF:0000031, GOC:ana, GOC:dsd, GOC:tfm, MESH:D016676]
comment: Markers: Mouse: F4/80mid, CD11b-/low, CD11c+, CD68+, sialoadhesin+, dectin-1+, MR+, CX3CR1-.
synonym: "dust cell" EXACT []
synonym: "MF.Lu" RELATED []
xref: FMA:83023
is_a: CL:0000235 ! macrophage
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000738
name: leukocyte
def: "An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue." [GOC:add, GOC:tfm, ISBN:978-0-323-05290-0]
synonym: "immune cell" RELATED []
synonym: "leucocyte" EXACT []
synonym: "white blood cell" EXACT []
is_a: CL:0000988 ! hematopoietic cell
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000842
name: mononuclear cell
def: "A leukocyte with a single non-segmented nucleus in the mature form." [GOC:add]
synonym: "mononuclear leukocyte" EXACT []
synonym: "peripheral blood mononuclear cell" NARROW []
is_a: CL:0000738 ! leukocyte
intersection_of: CL:0000738 ! leukocyte
intersection_of: bearer_of PATO:0001407 ! mononucleate
relationship: bearer_of PATO:0001407 ! mononucleate
relationship: has_part GO:0005634 ! nucleus

[Term]
id: CL:0000988
name: hematopoietic cell
def: "A cell of a hematopoietic lineage." [GO_REF:0000031, GOC:add]
synonym: "haematopoietic cell" EXACT []
synonym: "hemopoietic cell" EXACT []
is_a: CL:0000000 ! cell

[Term]
id: GO:0005634
name: nucleus
namespace: cellular_component
def: "A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent." [GOC:go_curators]
synonym: "cell nucleus" EXACT []
synonym: "horsetail nucleus" NARROW [GOC:al, GOC:mah, GOC:vw, PMID:15030757]
is_a: UBERON:0000061 ! anatomical structure
relationship: has_part UBERON:0000061 ! anatomical structure

[Term]
id: PATO:0001407
name: mononucleate
namespace: quality
def: "A nucleate quality inhering in a bearer by virtue of the bearer's having one nucleus." [Biology-online:Biology-online]
subset: cell_quality
subset: mpath_slim
subset: value_slim

[Term]
id: PATO:0010006
name: cell morphology
namespace: quality
def: "A quality of a single cell inhering in the bearer by virtue of the bearer's size or shape or structure." [https://orcid.org/0000-0002-7073-9172]
comment: Use this term for morphologies that can *only* inhere in a cell, e.g. morphological qualities inhering in a cell by virtue of the presence, location or shape of one or more cell parts.
property_value: http://purl.org/dc/terms/contributor https://orcid.org/0000-0002-7073-9172
creation_date: 2021-01-23T11:31:53Z

[Term]
id: UBERON:0000061
name: anatomical structure
namespace: uberon
def: "Material anatomical entity that is a single connected structure with inherent 3D shape generated by coordinated expression of the organism's own genome." [CARO:0000003]
synonym: "biological structure" EXACT []
synonym: "connected biological structure" EXACT [CARO:0000003]
is_a: UBERON:0000465 ! material anatomical entity
property_value: RO:0002175 NCBITaxon:33090
property_value: RO:0002175 NCBITaxon:33208
property_value: RO:0002175 NCBITaxon:4751

[Term]
id: UBERON:0000465
name: material anatomical entity
namespace: uberon
def: "Anatomical entity that has mass." [http://orcid.org/0000-0001-9114-8737]
is_a: UBERON:0001062 ! anatomical entity
property_value: RO:0002175 NCBITaxon:33090
property_value: RO:0002175 NCBITaxon:33208
property_value: RO:0002175 NCBITaxon:4751

[Term]
id: UBERON:0001062
name: anatomical entity
namespace: uberon
def: "Biological entity that is either an individual member of a biological species or constitutes the structural organization of an individual member of a biological species." [FMA:62955, http://orcid.org/0000-0001-9114-8737]
property_value: RO:0002175 NCBITaxon:33090
property_value: RO:0002175 NCBITaxon:33208
property_value: RO:0002175 NCBITaxon:4751

[Typedef]
id: bearer_of
name: has characteristic
namespace: external
def: "Inverse of characteristic_of" []
xref: RO:0000053
is_inverse_functional: true

[Typedef]
id: has_part
name: has part
namespace: external
def: "a core relation that holds between a whole and its part" []
subset: http://purl.obolibrary.org/obo/valid_for_go_annotation_extension
subset: http://purl.obolibrary.org/obo/valid_for_go_ontology
subset: http://purl.obolibrary.org/obo/valid_for_gocam
xref: BFO:0000051
is_transitive: true

147 changes: 147 additions & 0 deletions docs/examples/cl_module-strict-mergedcomments.obo
@@ -0,0 +1,147 @@
format-version: 1.2
ontology: cl
owl-axioms: Prefix(owl:=<http://www.w3.org/2002/07/owl#>)\nPrefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)\nPrefix(xml:=<http://www.w3.org/XML/1998/namespace>)\nPrefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)\nPrefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)\n\n\nOntology(\nDeclaration(Class(<http://purl.obolibrary.org/obo/CL_0000000>))\nDeclaration(Class(<http://purl.obolibrary.org/obo/PATO_0010006>))\nDeclaration(ObjectProperty(<http://purl.obolibrary.org/obo/RO_0000053>))\n\n\nSubClassOf(ObjectSomeValuesFrom(<http://purl.obolibrary.org/obo/RO_0000053> <http://purl.obolibrary.org/obo/PATO_0010006>) <http://purl.obolibrary.org/obo/CL_0000000>)\n)

[Term]
id: CL:0000000
name: cell
def: "A material entity of anatomical origin (part of or deriving from an organism) that has as its parts a maximally connected cell compartment surrounded by a plasma membrane." [CARO:mah]
comment: The definition of cell is intended to represent all cells, and thus a cell is defined as a material entity and not an anatomical structure, which implies that it is part of an organism (or the entirety of one).
is_a: UBERON:0000061 ! anatomical structure
relationship: has_part GO:0005634 {gci_filler="PATO:0001407", gci_relation="bearer_of"} ! nucleus

[Term]
id: CL:0000113
name: mononuclear phagocyte
def: "A vertebrate phagocyte with a single nucleus." [GOC:add, GOC:tfm, ISBN:0781735149]
is_a: CL:0000842 ! mononuclear cell
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000235
name: macrophage
def: "A mononuclear phagocyte present in variety of tissues, typically differentiated from monocytes, capable of phagocytosing a variety of extracellular particulate material, including immune complexes, microorganisms, and dead cells." [GO_REF:0000031, GOC:add, GOC:tfm, PMID:16213494, PMID:1919437]
comment: Morphology: Diameter 30_M-80 _M, abundant cytoplasm, low N/C ratio, eccentric nucleus. Irregular shape with pseudopods, highly adhesive. Contain vacuoles and phagosomes, may contain azurophilic granules; markers: Mouse & Human: CD68, in most cases CD11b. Mouse: in most cases F4/80+; role or process: immune, antigen presentation, & tissue remodelling; lineage: hematopoietic, myeloid.
synonym: "histiocyte" EXACT []
is_a: CL:0000113 ! mononuclear phagocyte
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000583
name: alveolar macrophage
def: "A tissue-resident macrophage found in the alveoli of the lungs. Ingests small inhaled particles resulting in degradation and presentation of the antigen to immunocompetent cells. Markers include F4/80-positive, CD11b-/low, CD11c-positive, CD68-positive, sialoadhesin-positive, dectin-1-positive, MR-positive, CX3CR1-negative." [GO_REF:0000031, GOC:ana, GOC:dsd, GOC:tfm, MESH:D016676]
comment: Markers: Mouse: F4/80mid, CD11b-/low, CD11c+, CD68+, sialoadhesin+, dectin-1+, MR+, CX3CR1-. The marker set MSR1, FABP4 can identify the Human cell type alveolar macrophage in the Lung with a confidence of 0.80 (NS-Forest FBeta value). {xref="https://doi.org/10.5281/zenodo.11165918"}
synonym: "dust cell" EXACT []
synonym: "MF.Lu" RELATED []
xref: FMA:83023
is_a: CL:0000235 ! macrophage
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000738
name: leukocyte
def: "An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue." [GOC:add, GOC:tfm, ISBN:978-0-323-05290-0]
synonym: "immune cell" RELATED []
synonym: "leucocyte" EXACT []
synonym: "white blood cell" EXACT []
is_a: CL:0000988 ! hematopoietic cell
property_value: RO:0002175 NCBITaxon:9606

[Term]
id: CL:0000842
name: mononuclear cell
def: "A leukocyte with a single non-segmented nucleus in the mature form." [GOC:add]
synonym: "mononuclear leukocyte" EXACT []
synonym: "peripheral blood mononuclear cell" NARROW []
is_a: CL:0000738 ! leukocyte
intersection_of: CL:0000738 ! leukocyte
intersection_of: bearer_of PATO:0001407 ! mononucleate
relationship: bearer_of PATO:0001407 ! mononucleate
relationship: has_part GO:0005634 ! nucleus

[Term]
id: CL:0000988
name: hematopoietic cell
def: "A cell of a hematopoietic lineage." [GO_REF:0000031, GOC:add]
synonym: "haematopoietic cell" EXACT []
synonym: "hemopoietic cell" EXACT []
is_a: CL:0000000 ! cell

[Term]
id: GO:0005634
name: nucleus
namespace: cellular_component
def: "A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent." [GOC:go_curators]
synonym: "cell nucleus" EXACT []
synonym: "horsetail nucleus" NARROW [GOC:al, GOC:mah, GOC:vw, PMID:15030757]
is_a: UBERON:0000061 ! anatomical structure
relationship: has_part UBERON:0000061 ! anatomical structure

[Term]
id: PATO:0001407
name: mononucleate
namespace: quality
def: "A nucleate quality inhering in a bearer by virtue of the bearer's having one nucleus." [Biology-online:Biology-online]
subset: cell_quality
subset: mpath_slim
subset: value_slim

[Term]
id: PATO:0010006
name: cell morphology
namespace: quality
def: "A quality of a single cell inhering in the bearer by virtue of the bearer's size or shape or structure." [https://orcid.org/0000-0002-7073-9172]
comment: Use this term for morphologies that can *only* inhere in a cell, e.g. morphological qualities inhering in a cell by virtue of the presence, location or shape of one or more cell parts.
property_value: http://purl.org/dc/terms/contributor https://orcid.org/0000-0002-7073-9172
creation_date: 2021-01-23T11:31:53Z

[Term]
id: UBERON:0000061
name: anatomical structure
namespace: uberon
def: "Material anatomical entity that is a single connected structure with inherent 3D shape generated by coordinated expression of the organism's own genome." [CARO:0000003]
synonym: "biological structure" EXACT []
synonym: "connected biological structure" EXACT [CARO:0000003]
is_a: UBERON:0000465 ! material anatomical entity
property_value: RO:0002175 NCBITaxon:33090
property_value: RO:0002175 NCBITaxon:33208
property_value: RO:0002175 NCBITaxon:4751

[Term]
id: UBERON:0000465
name: material anatomical entity
namespace: uberon
def: "Anatomical entity that has mass." [http://orcid.org/0000-0001-9114-8737]
is_a: UBERON:0001062 ! anatomical entity
property_value: RO:0002175 NCBITaxon:33090
property_value: RO:0002175 NCBITaxon:33208
property_value: RO:0002175 NCBITaxon:4751

[Term]
id: UBERON:0001062
name: anatomical entity
namespace: uberon
def: "Biological entity that is either an individual member of a biological species or constitutes the structural organization of an individual member of a biological species." [FMA:62955, http://orcid.org/0000-0001-9114-8737]
property_value: RO:0002175 NCBITaxon:33090
property_value: RO:0002175 NCBITaxon:33208
property_value: RO:0002175 NCBITaxon:4751

[Typedef]
id: bearer_of
name: has characteristic
namespace: external
def: "Inverse of characteristic_of" []
xref: RO:0000053
is_inverse_functional: true

[Typedef]
id: has_part
name: has part
namespace: external
def: "a core relation that holds between a whole and its part" []
subset: http://purl.obolibrary.org/obo/valid_for_go_annotation_extension
subset: http://purl.obolibrary.org/obo/valid_for_go_ontology
subset: http://purl.obolibrary.org/obo/valid_for_gocam
xref: BFO:0000051
is_transitive: true
