-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Alternative symbol" synonyms & "formerly" updates #132
base: main
Are you sure you want to change the base?
Conversation
259e0d9
to
48fc895
Compare
9ef0d72
to
8a07fa3
Compare
8a07fa3
to
830fbcf
Compare
830fbcf
to
92780ed
Compare
@joeflack4 do you plan to update this to use |
omim2obo/main.py
Outdated
graph: Graph, source: URIRef, prop: URIRef, target: Union[Literal, str, URIRef], | ||
anno_pred_vals: List[Tuple[URIRef, Union[Literal, str, URIRef]]] | ||
): | ||
"""Add an axion annotation to the graph.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: 'axion'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
) -> Tuple[List[str], List[str], List[str], List[str]]: | ||
"""Separate current title/symbols from deprecated (marked 'former') ones""" | ||
former_titles = [x for x in titles if ', FORMERLY' in x.upper()] | ||
former_symbols = [x for x in symbols if ', FORMERLY' in x.upper()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to apply this to symbols that are in the "Alternative Title(s); symbol(s)" column since 'FORMERLY' is used with symbols too, e.g. https://www.omim.org/entry/606417?search=606417&highlight=FORMERLY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, it looks like I only made this change to "alternative titles/symbols" actually, satisfying:
But this does not satisfy also doing this for "included" as for:
I'm really surprised I didn't do that. I thought I had!
- @joeflack4 Deal with "formerly" "included" entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, one reason I hadn't done this is because of:
For now I'm adding them as mondo#omim_included
. I don't think we can mark them as a synonym type unless we start doing that with "included titles" as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete. Now we are processing all of the "former" entries:
- alt titles
- alt symbols
- included titles
- included symbols
titles2 = [remove_included_and_formerly_suffixes(x) for x in titles] | ||
symbols2 = [remove_included_and_formerly_suffixes(x) for x in symbols] | ||
# additional reformatting for titles | ||
titles2 = [cleanup_label(x) for x in titles2] # todo: is this redundant? gets done when adding synonyms to graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to sort out this answer and remove the #todo
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's not really this line that's redundant, it's several places further down in the process.
This became more involved than I anticipated. I did it as a separate PR:
Even though this line titles2 = [cleanup_label(x) for x in titles2]
is new, those changes really aren't in scope for this PR, so I think we can address that PR separately and merge it into main later.
omim2obo/main.py
Outdated
if inc_titles_str: | ||
included_titles, included_symbols = parse_title_symbol_pairs(inc_titles_str) | ||
included_titles, included_symbols = clean_alt_and_included_titles(included_titles, included_symbols) | ||
included_is_included = included_titles or included_symbols # redundant. can't be included symbol w/out title |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you doing this so you don't have to check for an empty list to see if the entry has any values in the "Included Title(s); symbols" column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naw. The reason I did this is to make the code clearer. Didn't want to come across the code just checking for "included titles" and have someone thing "well why isn't it checking for included symbols?".
Come to think of it, another approach is that I could remove this line. And instead, further down, I could potentially replace if included_is_included
with something like if included_titles: # fyi: only need to check for included titles because there can't be any included symbols unless there are included titles
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is a somewhat oddly named variable and duplicates information, included_titles
, that already exists in this case an could be used in that later check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
omim2obo/main.py
Outdated
graph.add((omim_uri, RDFS['comment'], Literal(included_detected_comment))) | ||
for included_label in cleaned_inc_labels: | ||
graph.add((omim_uri, URIRef(INCLUDED_URI), Literal(label_cleaner.clean(included_label, abbrev)))) | ||
# - exact titles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it would be more clear if you maintained the omim terminology of Preferred or Alternative since there is not an "exact title"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I would agree with you.
Actually, AFAIK I've dropped the previous "exact" nomenclature, which I think was referencing "preferred" previously, everywhere in the codebase.
What 'exact' means here is 'exact synonyms: titles'.
Below it I have 'related titles', 'exact abbreviations', 'related, deprecated 'former' titles', and 'related, deprecated 'former' abbreviations'.
If you want, I can update all these comments to say "exact/related synonyms".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are not going to use the terminology from the mimTitles file, then document somewhere what "exact titles" is, for example the value from the "Preferred Title; symbol" column for the "Preferred Title" after processing and now being added as an exact synonym or 'exact synonyms: titles' or the processed title value added as an exact synonym
This kind of documentation should be done for each of the types below, e.g. exact abbreviations, etc.
It's easy to read that the information is being added as an exact synonym for example, but since "exact title" is something made up and there is a lot of processing of the value/variable (pref_title) being added here there is a fair bit of backtracking one needs to do to sort out what the original value from the file is being added here since there nothing in the source file named "exact title"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I successfully communicated earlier. The comment isn't inspired by the previously named variables. I've since refactored all the variable names to be in sync with the OMIM data model.
The 'exact' refers to 'exact synonym' I mean to say that this is a section for "any title (non symbol) that is an exact synonym". Easier to see if you look at this line with the lines around it:
# Add synonyms
# - exact titles
graph.add((omim_uri, oboInOwl.hasExactSynonym, Literal(label_cleaner.clean(pref_title, pref_abbrev))))
In any case, I have refactored these comments! They will be easier to read now and will not cause any confusion.
omim2obo/main.py
Outdated
for title in former_alt_titles: | ||
clean_title = Literal(label_cleaner.clean(title, pref_abbrev)) | ||
add_triple_and_optional_annotations(graph, omim_uri, oboInOwl.hasRelatedSynonym, clean_title, | ||
[(OWL.deprecated, Literal(True))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that the intent was to add owl:deprecated
as an annotation on the synonym annotation. I have a question into Sabrina to confirm this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, OK. Right now I can't think of anything else I would deprecate. I wouldn't want to deprecate the whole term, since non-deprecated MIMs can have both deprecated and non-deprecated synonyms.
- @twhetzel to discuss "formerly" title/symbol deprecation with Sabrina
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a need to add owl:deprecated
to anything here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I'm basing this change off of:
In that issue it links to the discussion with Nicole where she talks about this being their standard practice.
Example:
synonym: "juvenile carcinoma (formerly)" RELATED DEPRECATED [GARD:0009408]
This change makes it so that this process is automated for them.
edit: FYI @twhetzel you also asked me to make this change it looks like: comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's confirm with @sabrinatoro which of the suggestions to follow since the comments in that issue from Sabrina and Nicole are not in agreement.
omim2obo/main.py
Outdated
for abbrevs in [pref_symbols, alt_symbols]: | ||
for abbreviation in abbrevs: | ||
add_triple_and_optional_annotations(graph, omim_uri, oboInOwl.hasExactSynonym, Literal(abbreviation), | ||
[(OBOINOWL.hasSynonymType, MONDONS.abbreviation)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs updating from other decisions, e.g. oboInOwl
and use the OMO abbreviation property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twhetzel For those types of situations, my solution is usually (a) if I merge those first, I merge main
back into this PR, or (b) if I merge this first, then when those get merged, those changes make it into main
anyhow.
Usually I opt for 'a' for best results.
But sometimes it is better to be duplicative about this stuff. Builds do run off of main
after all. Actually, one just ran and released, but I deleted it because it did not include this OMO bug fix.
Anyway I'll go ahead and propagate those changes to this PR, perhaps just by merging those feature branches into it now.
Great review btw! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and merged in main
(after merging in the other PRs you approved) as well as:
So now those changes have propagated here.
I also added this warning to the OP:
Warning
Merging this will also merge the following PR, so it should be completed first:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- update this branch based on past discussions
- blocked until question about owl:deprecated is confirmed, typically this is only added to classes
- other comments inline
- Update: Comments to improve clarity - Delete: redundant variable 'included_is_included', and refactored / recommented around that change.
- Formerly: synonym pred: related -> exact - Formerly: owl:drepecated synonym annotation -> MONDO:omim_formerly
b509332
to
112a19f
Compare
- Update: MONDO: OMIM properties to the correct spelling.
- Add: a comment
…d-props-rename "Alternative symbol" synonyms & "Formerly" - props rename
OMIM_FORMERLY
updates
OMIM_FORMERLY
updates
Warning
included
/formerly
refactor #167oio:exactSynonym
&mondo#abbreviation
#139, INCLUDED
#141, FORMERLY
#137oio:exactSynonym
&MONDO:omim_formerly
#138Google drive folder with outputs and diff.
Changes
Add alt & included symbol synonyms
'Draft' status
included
/formerly
refactor #167, which I think has important ramifications and should be merged first and then rebased/merged into this PR.