Skip invalid lines when converting out of OBO #1039

Closed
cthoyt opened this issue Aug 5, 2022 · 4 comments

Comments

cthoyt commented Aug 5, 2022

The Cellosaurus ontology contains many invalid lines. For example, the following line has improperly escaped curly braces in a molecule's name:

comment: "Group: Patented cell line. Registration: International Depositary Authority, China Center for Type Culture Collection; CCTCC C2014222. Monoclonal antibody isotype: IgG1, kappa. Monoclonal antibody target: ChEBI; CHEBI:144925; 1-(4-methoxyphenyl)-2-{[4-(4-nitrophenyl)butan-2-yl]amino}ethanol (Phenylethylamine A)."
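The OWLAPI parser message quoted in this issue says that '{' and '}' inside comments must be escaped. Below is a minimal Python sketch that backslash-escapes every brace in a comment: value that is not already escaped. The function name escape_comment_braces is hypothetical and is not part of ROBOT or OWLAPI. The sketch assumes that comment lines in this file never carry a real trailing qualifier block, because those braces would be escaped too.

```python
import re

# A '{' or '}' that is not already preceded by a backslash.
_UNESCAPED_BRACE = re.compile(r"(?<!\\)([{}])")


def escape_comment_braces(line: str) -> str:
    """Backslash-escape braces in the value of an OBO 'comment:' line.

    Assumption: the whole value is free text, so every brace belongs to the
    text rather than to a trailing qualifier block. Other tags are returned
    unchanged.
    """
    tag = "comment: "
    if not line.startswith(tag):
        return line
    value = line[len(tag):]
    # r"\\\1" inserts a literal backslash followed by the matched brace.
    return tag + _UNESCAPED_BRACE.sub(r"\\\1", value)
```

Already-escaped braces are skipped, so running the function twice on the same line leaves the result unchanged.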

If you run robot convert -I https://ftp.expasy.org/databases/cellosaurus/cellosaurus.obo -o ~/Desktop/cellosaurus.json -vvv and search the output very carefully for the relevant error (for now, you have to search for org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser; #1038 would help with this), you find:

LINENO: 29219 - Missing '=' in trailing qualifier block. This might happen for not properly escaped '{', '}' chars in comments.
LINE: comment: "Monoclonal antibody isotype: IgG2a, kappa. Monoclonal antibody target: ChEBI; CHEBI:144925; Phenylethylamine A (1-(4-methoxyphenyl)-2-{[4-(4-nitrophenyl)butan-2-yl]amino}ethanol)."
        org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser.parse(OBOFormatOWLAPIParser.java:60)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:220)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1254)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1208)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1165)
        org.obolibrary.robot.IOHelper.loadOntology(IOHelper.java:531)
        org.obolibrary.robot.IOHelper.loadOntology(IOHelper.java:417)
        org.obolibrary.robot.IOHelper.loadOntology(IOHelper.java:298)
        org.obolibrary.robot.CommandLineHelper.getInputOntology(CommandLineHelper.java:487)
        org.obolibrary.robot.CommandLineHelper.updateInputOntology(CommandLineHelper.java:585)

This ontology isn't curated in an open-source way, so it's difficult to communicate about this issue and help solve it. I also downloaded the file and started fixing the lines one at a time, but I have to re-run robot convert after every fix to find the next error. It would be nice if there were a setting that allowed invalid lines to be skipped during OBO parsing.
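Until such a setting exists, one way around the fix-one-line-and-re-run loop is to list every suspect line up front. Here is a minimal Python sketch, assuming that the only problem being hunted is unescaped braces in comment: values. Other kinds of invalid lines would need their own checks. find_suspect_lines is a hypothetical helper, not a ROBOT feature.

```python
import re
from typing import Iterable, List, Tuple

# A '{' or '}' that is not already preceded by a backslash.
_UNESCAPED_BRACE = re.compile(r"(?<!\\)[{}]")


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every 'comment:' line that
    contains an unescaped '{' or '}'."""
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("comment:") and _UNESCAPED_BRACE.search(line):
            suspects.append((lineno, line))
    return suspects
```

Passing it an open file handle for the downloaded cellosaurus.obo would report every candidate in one pass, so all of them can be fixed before a single robot convert run. The numbers are 1-based line positions in the file. They are not guaranteed to match the parser's LINENO.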

CC @AmosBairoch @lubianat

Update: this is the same underlying issue as ebi-chebi/ChEBI#4273

matentzn (Contributor) commented Aug 5, 2022

Hmm... I think this is outside the scope of ROBOT. If you want this to happen, you have to go through https://github.com/owlcs/owlapi/issues/ or join the #obo-format channel on the OBO Slack, where @balhoff is currently thinking about prefix maps for OBO format and other fixes; he may be amenable to this. But I don't think this is a ROBOT issue per se: if the raw data is broken, the tool can't be expected to deal with all eventualities, so I would simply run a grep -v on the OBO file prior to parsing. If you agree, can you close the issue?
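The grep -v suggestion amounts to dropping every line that matches a pattern before handing the file to ROBOT. Below is a Python sketch of the same idea. The name drop_matching_lines and the example pattern are illustrative assumptions. The pattern uses Python regex syntax (a lookbehind), which grep -E does not support. Dropped lines are lost from the output, not repaired.

```python
import re
from typing import Iterable, Iterator

# Illustrative pattern: a comment line containing an unescaped brace.
BROKEN_COMMENT = r"^comment:.*(?<!\\)[{}]"


def drop_matching_lines(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield only the lines that do NOT match `pattern`, roughly like grep -v."""
    regex = re.compile(pattern)
    for line in lines:
        if not regex.search(line):
            yield line
```

Writing the yielded lines to a new file and running robot convert on that local copy would skip the offending comments entirely. The parse would then succeed without that comment text.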

balhoff (Contributor) commented Aug 5, 2022

This exact issue is a problem with the currently released ChEBI OBO file: ebi-chebi/ChEBI#4273

matentzn (Contributor) commented Aug 5, 2023

Rethinking this now: I could implement a "repair --obo-format" option that deals with the most frequent violations, such as multiple labels and multiple comments. I would be open to this, but it would have to be now!
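A repair step for the most frequent violations would first need to find them. The sketch below is a hypothetical check, not an existing ROBOT feature. It assumes that a stanza starts at any line beginning with '[' and treats name and comment as tags allowed at most once per stanza. It then reports each stanza id where either tag appears more than once. How to repair these stanzas, for example by keeping the first value, is left open.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumption: tags treated as single-valued within a stanza.
SINGLE_VALUED_TAGS = ("name", "comment")


def find_repeated_tags(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (stanza id, tag, count) for each single-valued tag that appears
    more than once in the same stanza. Header lines before the first stanza
    are not reported."""
    problems: List[Tuple[str, str, int]] = []
    stanza_id: Optional[str] = None
    counts: Counter = Counter()

    def flush() -> None:
        # Record violations for the stanza that just ended (skip the header).
        if stanza_id is None:
            return
        for tag in SINGLE_VALUED_TAGS:
            if counts[tag] > 1:
                problems.append((stanza_id, tag, counts[tag]))

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # stanza header such as [Term]
            flush()
            stanza_id, counts = None, Counter()
        elif ":" in line:
            tag, _, value = line.partition(":")
            if tag == "id":
                stanza_id = value.strip()
            counts[tag] += 1
    flush()
    return problems
```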

matentzn (Contributor) commented Aug 5, 2023

Sorry, I now realise I already discussed this in #995, and that skipping broken rows is not possible at all right now without a major OWLAPI update.

This needs to be added either as an OWL API ticket or as an oboformat ticket: https://github.com/owlcollab/oboformat/issues

I will close this now, as whatever ROBOT can do about this is covered by #995.

matentzn closed this as not planned Aug 5, 2023