Add draft docs for MP HP mappings #963

Open
wants to merge 1 commit into
base: master

Conversation

matentzn
Collaborator

This PR adds a documentation page that explains where to find MP HP mappings.

This is very complicated, with so many files. Please help bring order to the chaos.

@sbello @cmungall

@cmungall
Member

I think this is a good description of the current state of things, but it is WAY too complicated for 99% of the intended consumers. Whether HPO terms are subsumed by MPO terms is very much an ontologist's question. Most users just want mappings.

Can we not just integrate the files into a single SOT?

Think GAFs and evidence codes. If you want the GAF for a species you just download it. Most people just look at the key columns. Behind the scenes the GAF merges many different streams. Users who care to see it can see the provenance. Users who care to filter by evidence code can do that too.

Can we not just merge the SSSOMs? Or maybe into two files - high confidence, high recall.
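A rough sketch of what such a merged product could look like, in the spirit of the GAF analogy: several SSSOM-style TSV streams are concatenated, each row is tagged with where it came from, and a "high confidence" subset is derived from the full "high recall" set. The column names `subject_id`, `predicate_id`, `object_id` and `confidence` follow SSSOM; the provider labels, the 0.8 cutoff, and the term IDs and scores in the toy rows are placeholders for illustration only, not real mappings.

```python
import csv

def read_sssom_tsv(text, provider):
    """Parse SSSOM-style TSV text, skipping '#' metadata lines,
    and tag every row with the stream it came from."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    rows = list(csv.DictReader(lines, delimiter="\t"))
    for row in rows:
        row["mapping_provider"] = provider  # provenance for users who care
    return rows

def merge_streams(streams):
    """Concatenate all streams into one list of rows (the 'high recall' file)."""
    merged = []
    for provider, text in streams.items():
        merged.extend(read_sssom_tsv(text, provider))
    return merged

def high_confidence(rows, threshold):
    """Keep only rows whose confidence is present and at or above the cutoff."""
    return [
        r for r in rows
        if r.get("confidence") and float(r["confidence"]) >= threshold
    ]

# Toy inputs: placeholder IDs and scores, NOT real HP/MP mappings.
MANUAL = (
    "subject_id\tpredicate_id\tobject_id\tconfidence\n"
    "HP:9999991\tskos:exactMatch\tMP:9999991\t0.9\n"
)
COMPUTED = (
    "# toy metadata line\n"
    "subject_id\tpredicate_id\tobject_id\tconfidence\n"
    "HP:9999991\tskos:relatedMatch\tMP:9999992\t0.2\n"
)

high_recall = merge_streams({"manual": MANUAL, "computed": COMPUTED})
high_conf = high_confidence(high_recall, threshold=0.8)  # cutoff is an assumption
```

Most consumers would only read the key columns; `mapping_provider` is there for those who want to filter by provenance, much like evidence codes in a GAF.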

| HP:0100624 | Corpus cavernosum sclerosis | | MP:0011528 | abnormal placental labyrinth villi branching morphogenesis | | UPHENO:0003055 | | | | | 2.416 | 0.214 | | | 0.720 |
| HP:0100624 | Corpus cavernosum sclerosis | | MP:0003205 | testicular atrophy | | UPHENO:0002682 | | | | | 5.394 | 0.191 | | | 1.015 |
| HP:0100624 | Corpus cavernosum sclerosis | | MP:0009256 | enlarged corpus epididymis | | UPHENO:0002523 | | | | | 3.686 | 0.234 | | | 0.930 |

Consider a different set of examples. Could you find one that includes an exact or nearly exact match plus more distant matches? Looking at this, I would not want to present any of these as options to a user. At least swap out one of the examples for abnormal penis morphology (MP:0005187), which is the closest MP term I could find.

Is the assumption here that these are all using 'exact match' as the basis for the scores?

@sbello

sbello commented Oct 31, 2024

@cmungall If you do want to merge the files, I would create at least two: one for mappings with explicit match types and one for mappings with confidence scores. Anyone wanting to import these is going to need to handle and parse them differently. At MGI we brought in the IMPC files but skipped the Pistoia ones, as I was uncertain what the match type really was or what a good cutoff for inclusion/exclusion would be.
For ease of upkeep I would still want a separate MGI file; that way I only need to worry about messing with my own data and not anyone else's.
There is also the problem that each group has a distinct set of columns, but that is the easiest thing to deal with. Maybe. I added a confidence score to the MGI manual file, but I doubt that my confidence score really means the same thing as the confidence score in the Pistoia files, or that a consumer of the file would want to treat them the same.
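To make the two concerns above concrete, here is a minimal sketch, assuming each source has already been read into a list of dicts. It splits rows by whether they carry an explicit match type, and it renames each provider's `confidence` to a provider-specific column so scores from different groups are never compared as if they were on one scale. The function names, the `mgi` / `pistoia` labels, and the toy rows (placeholder IDs and scores) are hypothetical.

```python
def namespace_confidence(row, provider):
    """Copy a row, moving 'confidence' to '<provider>_confidence'
    so scores from different groups stay in separate columns."""
    out = dict(row)
    if "confidence" in out:
        out[f"{provider}_confidence"] = out.pop("confidence")
    out["mapping_provider"] = provider
    return out

def partition_by_match_type(rows):
    """Split rows into those with an explicit predicate and those with scores only."""
    explicit, score_only = [], []
    for row in rows:
        (explicit if row.get("predicate_id") else score_only).append(row)
    return explicit, score_only

def union_columns(rows):
    """Collect every column name seen, in first-seen order,
    since each group ships a distinct set of columns."""
    cols = []
    for row in rows:
        for key in row:
            if key not in cols:
                cols.append(key)
    return cols

# Toy rows: placeholder IDs and scores, NOT real mappings.
rows = [
    namespace_confidence(
        {"subject_id": "HP:9999991", "predicate_id": "skos:exactMatch",
         "object_id": "MP:9999991", "confidence": "0.9"},
        "mgi",
    ),
    namespace_confidence(
        {"subject_id": "HP:9999991", "object_id": "MP:9999992",
         "confidence": "0.21"},
        "pistoia",
    ),
]

explicit, score_only = partition_by_match_type(rows)
columns = union_columns(rows)
```

The two resulting lists could be written out as the two files suggested above, with `columns` as the shared header and blanks where a source does not provide a value.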

@matentzn
Collaborator Author

matentzn commented Nov 1, 2024

We will definitely keep the HP MP source files separate so they can be correctly curated, but we should offer a well-documented merged product to the community with as many confidence values and as much metadata as possible!
