Filter Clinvar RCV records based on Review Status / number of stars #216
Comments
@cmungall wondered if we maybe should filter Clinvar records by stars during ingestion as opposed to filtering via the UI, @kshefchek thoughts?
I'll pass that question along to @mellybelly
ah okay, @mellybelly let me know what your thoughts are
I think we need a joined-up strategy we can all agree on. If there are use cases for including lower-quality ClinVar data, then let's include all and dynamically filter. But remember that filters are always leaky (how will people using our API, our TSVs, our RDF dumps, our Neo4J, our Solr control filtering? Do we have this well documented?), and implementing the logic of the filters ubiquitously will take some amount of time. So the conservative option is not to include anything we perceive to be of low quality to start with, and to introduce it gradually, allowing sufficient time for testing of filters, APIs, etc. to ensure non-leakiness.
I continue to think that we need to understand the data before we put them on our website. In principle, there should not be many cases where ClinVar genuinely adds something that OMIM does not have in the Mendelian realm. Before we talk about the technical solution, can we please investigate what the data is telling us? From my experience so far, I would say we should add a ClinVar-only disease attribution only following some degree of human intervention.
@pnrobinson here's a little data from some command-line hacks on the latest Clinvar release XML.
So, short answer, there are 15,321 current ClinVar RCV records with no OMIM XRef. As you might expect, most of these are low quality (0 stars), but a small number are pretty high quality.
@justaddcoffee curious what is considered an OMIM xref, a SCV from OMIM? eg SCV000032832 from https://www.ncbi.nlm.nih.gov/clinvar/RCV000012597.32/ or a medgen disease ID with an omim xref?
Does this mean that there is no OMIM entry under condition? Can you provide a few examples?
I'm possibly being a little too inclusive in my definition of OMIM xref here - basically I'm counting those RCV records in which I see
yes, I think the MIM xrefs are just identifier xrefs and may not mean there's evidence from OMIM. Using the SCV to filter evidence from OMIM would be better.
That's right - no OMIM entry, at least not in the XML.
This Clinvar record, for example, comes up: no OMIM xref, and multiple submitters (2 stars). Same here: no OMIM xref, but this one was reviewed by an expert panel (3 stars):
Okay, if we consider only SCV OMIM xrefs, there would be many more of these Clinvar records with no OMIM xrefs
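To make the two definitions of "OMIM xref" discussed above concrete, here is a minimal Python sketch that checks both for each record. The element and attribute names (`ClinVarSet`, `ReferenceClinVarAssertion`, `XRef DB="OMIM"`, `ClinVarSubmissionID submitter="OMIM"`) are assumptions loosely modeled on the ClinVar release XML and should be verified against the actual schema. The accessions and IDs in the sample are placeholders, not real records.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the release XML. Structure and names are assumptions;
# accessions and IDs are placeholders, not real ClinVar data.
SAMPLE_XML = """
<ReleaseSet>
  <ClinVarSet>
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV_EXAMPLE_A" Type="RCV"/>
      <TraitSet><Trait><XRef DB="OMIM" ID="000000"/></Trait></TraitSet>
    </ReferenceClinVarAssertion>
    <ClinVarAssertion><ClinVarSubmissionID submitter="Some Lab"/></ClinVarAssertion>
  </ClinVarSet>
  <ClinVarSet>
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV_EXAMPLE_B" Type="RCV"/>
      <TraitSet><Trait><XRef DB="OMIM" ID="000001"/></Trait></TraitSet>
    </ReferenceClinVarAssertion>
    <ClinVarAssertion><ClinVarSubmissionID submitter="OMIM"/></ClinVarAssertion>
  </ClinVarSet>
  <ClinVarSet>
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV_EXAMPLE_C" Type="RCV"/>
      <TraitSet><Trait><XRef DB="MedGen" ID="C0000000"/></Trait></TraitSet>
    </ReferenceClinVarAssertion>
    <ClinVarAssertion><ClinVarSubmissionID submitter="Some Lab"/></ClinVarAssertion>
  </ClinVarSet>
</ReleaseSet>
"""


def has_omim_identifier_xref(cv_set):
    """Inclusive definition: any XRef with DB="OMIM" in the reference assertion."""
    ref = cv_set.find("ReferenceClinVarAssertion")
    if ref is None:
        return False
    return any(x.get("DB") == "OMIM" for x in ref.iter("XRef"))


def has_omim_submission(cv_set):
    """Stricter definition: at least one submission (SCV) made by OMIM."""
    return any(
        s.get("submitter") == "OMIM" for s in cv_set.iter("ClinVarSubmissionID")
    )


def classify_sets(xml_text):
    """Map each RCV accession to (identifier_xref, omim_submission)."""
    root = ET.fromstring(xml_text)
    result = {}
    for cv_set in root.iter("ClinVarSet"):
        acc = cv_set.find("ReferenceClinVarAssertion/ClinVarAccession").get("Acc")
        result[acc] = (has_omim_identifier_xref(cv_set), has_omim_submission(cv_set))
    return result


if __name__ == "__main__":
    for acc, flags in classify_sets(SAMPLE_XML).items():
        print(acc, flags)
```

A record like the first placeholder, with an OMIM identifier xref but no OMIM submission, is the case that moves into the "no OMIM xref" bucket under the stricter SCV-based definition.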
are there any cases where the relationship is pathogenic, either 3 or 4 stars, and no evidence from OMIM? EDIT: here are a few from poking around their UI https://www.ncbi.nlm.nih.gov/clinvar/RCV000038535.3/
Well, this is just what I mean -- we really need to understand the data
The problems we were seeing related to a ClinVar entry that pointed to the "wrong" OMIM entry. We need a list of ClinVar entries that point to OMIM entries that have a D2G relation that is not in OMIM, and we need to see if any of those are correct or at least not obviously wrong.
In order to prevent low quality variant -> disease associations, we probably can/should filter RCV records based on review status/number of stars.
(I've converted to a separate ticket from a comment on this ticket per discussion with @monicacecilia @kshefchek )
This requires at least these things:
1. Ingesting review status/stars, via this PR
   https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/
2. SEPIO terms for Clinvar review status (i.e. this ticket) @mbrush
   (Item 1 above is blocked by this item.)
3. Possibly/probably some UI magic from @iimpulse to do the actual filtering, after the review status is ingested and present in a new data release
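
As a rough illustration of what the filtering step could look like once review status is ingested, here is a minimal Python sketch. The status-to-stars table is transcribed from the ClinVar review status documentation linked above; the exact wording may differ between ClinVar releases, so treat it as an assumption to verify. The record shape (`rcv`, `review_status` keys) and the `min_stars` default are hypothetical, since no threshold has been agreed in this thread.

```python
# Review status -> stars, per the ClinVar review status docs (verify wording
# against the release actually being ingested).
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting interpretations": 1,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
    "no assertion provided": 0,
    "no assertion for the individual variant": 0,
}


def stars_for(review_status):
    """Return the star count for a review status string.

    Unknown or missing statuses are treated as 0 stars, which is the
    conservative choice: unrecognized records are filtered out, not let through.
    """
    if not review_status:
        return 0
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def filter_by_stars(records, min_stars=2):
    """Keep only records whose review status maps to at least min_stars.

    Each record is assumed to be a dict with a "review_status" key
    (hypothetical shape, for illustration only).
    """
    return [r for r in records if stars_for(r.get("review_status")) >= min_stars]


if __name__ == "__main__":
    # Placeholder records, not real ClinVar data.
    demo = [
        {"rcv": "RCV_EXAMPLE_1", "review_status": "reviewed by expert panel"},
        {"rcv": "RCV_EXAMPLE_2", "review_status": "no assertion criteria provided"},
    ]
    print([r["rcv"] for r in filter_by_stars(demo)])
```

Keeping the mapping and threshold in one shared function, rather than reimplementing them per output (API, TSV, RDF, Solr), is one way to limit the filter "leakiness" raised earlier in the thread.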