OboTermIncompleteError when processing mondo in specific directory #496

Open
kmanpearl opened this issue Jul 31, 2024 · 2 comments

@kmanpearl

Since I use multiple versions of the same datasets processed with different levels of label redundancy, I keep multiple obnb data directories to store the processed data. Loading mondo from one directory with dat = MondoDiseaseOntology(root='../data/obnb/FullyRedundant') works as expected. However, running dat = MondoDiseaseOntology(root='../data/obnb/NonRedundant') raises the error below. I'm not sure how to troubleshoot this or what could be causing it. I also tried MondoDiseaseOntology(root='../data/obnb/NonRedundant', redownload=True, reprocess=True) but got the same error, so I assume it's related to something in the directory.

OboTermIncompleteError                    Traceback (most recent call last)
Cell In[2], line 1
----> 1 non = MondoDiseaseOntology(root='../data/obnb/NonRedundant')

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/data/ontology/mondo.py:16, in MondoDiseaseOntology.__init__(self, root, xref_prefix, **kwargs)
     14 def __init__(self, root, xref_prefix=None, **kwargs):
     15     """Initialize MondoDiseaseOntology data object."""
---> 16     super().__init__(root, xref_prefix=xref_prefix, **kwargs)

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/data/ontology/base.py:31, in BaseOntologyData.__init__(self, root, xref_prefix, branch, **kwargs)
     29 self.xref_prefix = xref_prefix
     30 self.branch = branch
---> 31 super().__init__(root, **kwargs)

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/data/base.py:101, in BaseData.__init__(self, root, version, redownload, reprocess, retransform, log_level, pre_transform, transform, cache_transform, download_cache, gene_id_converter, **kwargs)
     98     self._download_archive()
     99     self._process()  # FIX:
--> 101 self.load_processed_data()
    102 self._apply_transform(transform)

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/data/ontology/base.py:59, in BaseOntologyData.load_processed_data(self, path)
     57 self.plogger.info(f"Load processed annodataion {path}")
     58 ont = OntologyGraph(logger=self.plogger)
---> 59 self.xref_to_onto_ids = ont.read_obo(path, xref_prefix=self.xref_prefix)
     60 self.data = ont if self.branch is None else ont.restrict_to_branch(self.branch)

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/graph/ontology.py:390, in OntologyGraph.read_obo(self, path, xref_prefix)
    388 xref_to_term_id = defaultdict(set)
    389 with open(path) as f:
--> 390     for term in self.iter_terms(f):
    391         term_id, term_name, term_xrefs, term_parents = term
    393         self.add_node(term_id, exist_ok=True)

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/graph/ontology.py:314, in OntologyGraph.iter_terms(fp)
    312 for _, stanza_lines in groups:
    313     if next(stanza_lines).startswith("[Term]"):
--> 314         yield OntologyGraph.parse_stanza_simplified(stanza_lines)

File ~/anaconda3/envs/net2onto/lib/python3.10/site-packages/obnb/graph/ontology.py:365, in OntologyGraph.parse_stanza_simplified(stanza_lines)
    362         term_parents.append(strip_key(line, key))
    364 if term_id is None or term_name is None:
--> 365     raise OboTermIncompleteError
    367 return term_id, term_name, term_xrefs, term_parents

OboTermIncompleteError:

@falquaddoomi
Collaborator

falquaddoomi commented Aug 21, 2024

Hey @kmanpearl, just FYI I was able to replicate the issue when running dat = MondoDiseaseOntology(root='../data/obnb/FullyRedundant').

As far as I can tell the issue is occurring because the current version of mondo.obo includes the following stanza that's missing a name field:

[Term]
id: CHEBI:36684
is_obsolete: true
replaced_by: CHEBI:17792

The current implementation will throw OboTermIncompleteError if either the ID or name is missing.
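
For anyone who wants to check their own copy of the file, here is a minimal sketch of a scanner that lists [Term] stanzas lacking an id or a name. It is a standalone illustration, not obnb's parser: iter_term_stanzas and find_incomplete_terms are hypothetical helper names, and the first term in SAMPLE is made up for the example.

```python
from typing import Dict, Iterable, Iterator, List

# Made-up sample: one complete term plus the incomplete stanza quoted above.
SAMPLE = """format-version: 1.2

[Term]
id: MONDO:9999999
name: hypothetical example term

[Term]
id: CHEBI:36684
is_obsolete: true
replaced_by: CHEBI:17792
"""


def iter_term_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield each [Term] stanza as a mapping from tag to its list of values."""
    stanza = None  # None while outside a [Term] stanza
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header closes whichever [Term] stanza was open.
            if stanza is not None:
                yield stanza
            stanza = {} if line == "[Term]" else None
        elif stanza is not None and ": " in line:
            tag, value = line.split(": ", 1)
            stanza.setdefault(tag, []).append(value)
    if stanza is not None:
        yield stanza


def find_incomplete_terms(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Return the [Term] stanzas that lack an id or a name tag."""
    return [s for s in iter_term_stanzas(lines) if "id" not in s or "name" not in s]


if __name__ == "__main__":
    for stanza in find_incomplete_terms(SAMPLE.splitlines()):
        print(stanza.get("id"), "is missing:", [t for t in ("id", "name") if t not in stanza])
```

To scan a real file, pass an open file handle instead of SAMPLE.splitlines(); where obnb stores the downloaded mondo.obo under each root is not shown in this thread, so the path is left to the reader.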

I presume it's not occurring on your end for that particular folder, '../data/obnb/FullyRedundant', because you downloaded it back when it didn't contain the stanza above that's currently breaking it. When you try to use a new folder, e.g. '../data/obnb/NonRedundant', it downloads the new dataset and breaks.

Again just FYI, I'm currently looking into which, if any, of the following options make sense:

  1. to skip entries with a missing ID or name,
  2. perhaps use the ID as the name if the ID's there and the name isn't, or
  3. in this case to use the replaced_by field to reference a different entity, i.e., CHEBI:17792, rather than trying and failing to use the incomplete CHEBI:36684 entry.
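
To make the three options concrete, here is a rough sketch of how each could behave on already-parsed terms. It assumes a hypothetical intermediate form (a dict from term ID to its optional name and replaced_by values) and a hypothetical clean_terms function; none of this is obnb's API, and stanzas without any ID are assumed to have been dropped before this step.

```python
from typing import Dict, Optional

TermInfo = Dict[str, Optional[str]]  # expected keys: "name", "replaced_by"

POLICIES = ("skip", "id_as_name", "follow_replacement")


def clean_terms(terms: Dict[str, TermInfo], policy: str = "skip") -> Dict[str, str]:
    """Map term IDs to names, handling nameless terms according to `policy`."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}, expected one of {POLICIES}")

    cleaned: Dict[str, str] = {}
    for term_id, info in terms.items():
        name = info.get("name")
        if name is not None:
            cleaned[term_id] = name  # complete term, kept under every policy
        elif policy == "id_as_name":
            cleaned[term_id] = term_id  # option 2: reuse the ID as the name
        elif policy == "follow_replacement":
            # Option 3: keep the replacement instead, but only if it is
            # itself present with a name; otherwise the term is dropped.
            target = info.get("replaced_by")
            target_name = terms.get(target, {}).get("name")
            if target is not None and target_name is not None:
                cleaned[target] = target_name
        # policy == "skip" (option 1): the nameless term is simply left out
    return cleaned
```

One thing the sketch makes visible: option 3 only helps when the replacement term is defined with a name in the same file. If CHEBI:17792 is not, option 3 ends up behaving like option 1 for this stanza.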

@kmanpearl
Author

@falquaddoomi I'm adding this here as well for continuity. I'm not sure if this changes how you address the problem, but the term CHEBI:36684 originally comes from the Chemical Entities of Biological Interest (ChEBI) ontology. I'm not exactly sure how mondo uses these nodes from other ontologies, but for the purpose of disease classification only nodes beginning with MONDO: are actually relevant to the task.
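
As a rough illustration of that idea, the prefix check could run before the completeness check, so that incomplete stanzas from other ontologies never reach it. This is only a sketch: iter_prefixed_terms is a hypothetical helper operating on (id, name) pairs, not part of obnb, and whether dropping non-MONDO nodes affects parent links elsewhere in the graph is not something this thread establishes.

```python
from typing import Iterable, Iterator, Optional, Tuple


def iter_prefixed_terms(
    terms: Iterable[Tuple[Optional[str], Optional[str]]],
    prefix: str = "MONDO:",
) -> Iterator[Tuple[str, str]]:
    """Yield (id, name) for terms whose ID starts with `prefix`."""
    for term_id, term_name in terms:
        if term_id is None or not term_id.startswith(prefix):
            continue  # ID-less or foreign stanza: ignored, even if incomplete
        if term_name is None:
            # An incomplete term inside the namespace is still treated as an error.
            raise ValueError(f"term {term_id} has no name")
        yield term_id, term_name


# The MONDO ID and name below are made up for the example.
parsed = [("MONDO:9999999", "hypothetical example term"), ("CHEBI:36684", None)]
kept = list(iter_prefixed_terms(parsed))
```

With this ordering the CHEBI:36684 stanza is skipped without raising, while a nameless MONDO: term would still be reported.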
