This repository has been archived by the owner on Oct 31, 2024. It is now read-only.

Filter Clinvar RCV records based on Review Status / number of stars #216

Open
justaddcoffee opened this issue Sep 30, 2019 · 13 comments

@justaddcoffee
Member

In order to prevent low quality variant -> disease associations, we probably can/should filter RCV records based on review status/number of stars.

(I've converted to a separate ticket from a comment on this ticket per discussion with @monicacecilia @kshefchek )

This requires at least these things:

  1. Ingesting review status/stars, via this PR
    https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/

  2. SEPIO terms for Clinvar review status (i.e. this ticket) @mbrush
    (Item 1 above is blocked by this item.)

  3. possibly/probably some UI magic from @iimpulse to do the actual filtering, after the review status is ingested and present in a new data release
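As a rough sketch of what item 1 could produce, the review status text can be mapped to a star count. The star values follow the ClinVar review status page linked above; the dictionary name, the function name, and the choice to treat unknown statuses as 0 stars are assumptions for illustration, not part of the ingest code.

```python
# Star counts per ClinVar review status, per the ClinVar review_status docs.
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting interpretations": 1,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
    "no assertion provided": 0,
}


def stars_for(review_status: str) -> int:
    """Return the star count for a review status string.

    Assumption: statuses not in the table are treated as 0 stars.
    """
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)
```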

@justaddcoffee
Member Author

justaddcoffee commented Sep 30, 2019

@cmungall wondered if we maybe should filter Clinvar records by stars during ingestion as opposed to filtering via the UI, @kshefchek thoughts?

@kshefchek
Contributor

> @cmungall wondered if we maybe should filter Clinvar records by stars during ingestion as opposed to filtering via the UI, @kshefchek thoughts?

I'll pass that question along to @mellybelly

@justaddcoffee
Member Author

> @cmungall wondered if we maybe should filter Clinvar records by stars during ingestion as opposed to filtering via the UI, @kshefchek thoughts?

> I'll pass that question along to @mellybelly

ah okay, @mellybelly let me know what your thoughts are

@cmungall
Member

I think we need a joined-up strategy we can all agree on. If there are use cases for including lower-quality ClinVar records, then let's include them all and filter dynamically. But remember that filters are always leaky: how will people using our API, our TSVs, our RDF dumps, our Neo4J, or our Solr control filtering, and do we have this well documented? Implementing the filter logic everywhere will take some amount of time. So the conservative option is not to include anything we perceive to be of low quality to start with, and to introduce it gradually, allowing sufficient time to test the filters, APIs, etc. to ensure non-leakiness.
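A minimal sketch of the conservative option, dropping low-star records at ingestion so that no downstream consumer (API, TSV, RDF, Neo4J, Solr) ever sees them. The record shape (a dict with a `stars` field), the function name, and the threshold value are all hypothetical; the thread does not settle on a cutoff.

```python
from typing import Dict, Iterable, Iterator

MIN_STARS = 2  # hypothetical threshold, not a value agreed in this thread


def filter_by_stars(records: Iterable[Dict], min_stars: int = MIN_STARS) -> Iterator[Dict]:
    """Yield only records whose 'stars' field meets the threshold.

    Records without a 'stars' field are dropped, which is the
    conservative choice when the review status is unknown.
    """
    for record in records:
        if record.get("stars", -1) >= min_stars:
            yield record
```

Because the filter runs once, before any export, there is only one place where the logic can leak.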

@pnrobinson
Member

I continue to think that we need to understand the data before we put them on our website. In principle, there should not be many cases where ClinVar genuinely adds something that OMIM does not have in the Mendelian realm. Before we talk about the technical solution, can we please investigate what the data is telling us? From my experience so far, I would say we should add a ClinVar-only disease attribution only following some degree of human intervention.

@justaddcoffee
Member Author

@pnrobinson here's a little data from some command-line hacks on the latest Clinvar release XML.

Current RCV records: 780218
Current RCV records with OMIM: 764897
Current RCV records with no OMIM: 15321
no_assertion_criteria_provided (0 stars): 14311
no_assertion_provided (0 stars): 455
criteria_provided_single_submitter (1 star): 542
criteria_provided_multiple_submitters (2 stars): 1
reviewed_by_expert_panel (3 stars): 12

So, short answer, there are 15,321 current ClinVar RCV records with no OMIM XRef. As you might expect, most of these are low quality (0 stars), but a small number are pretty high quality.
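To put a number on "most", the counts above can be checked with a few lines of arithmetic. The figures are copied from the list above; the variable names are just for illustration.

```python
# Counts of current RCV records with no OMIM XRef, copied from the list above.
no_omim_counts = {
    "no_assertion_criteria_provided": 14311,
    "no_assertion_provided": 455,
    "criteria_provided_single_submitter": 542,
    "criteria_provided_multiple_submitters": 1,
    "reviewed_by_expert_panel": 12,
}

total = sum(no_omim_counts.values())
zero_star = (
    no_omim_counts["no_assertion_criteria_provided"]
    + no_omim_counts["no_assertion_provided"]
)
zero_star_share = zero_star / total

# Prints: 14766 of 15321 records (96.4%) have 0 stars
print(f"{zero_star} of {total} records ({zero_star_share:.1%}) have 0 stars")
```

The breakdown sums to the 15,321 total, and the with-OMIM and no-OMIM counts (764897 + 15321) sum to the 780218 current records.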

@kshefchek
Contributor

@justaddcoffee curious what is considered an OMIM xref: an SCV from OMIM, e.g. SCV000032832 from https://www.ncbi.nlm.nih.gov/clinvar/RCV000012597.32/

or a medgen disease ID with an omim xref?

@pnrobinson
Member

Does this mean that there is no OMIM entry under condition? Can you provide a few examples?

@justaddcoffee
Member Author

I'm possibly being a little too inclusive in my definition of OMIM xref here - basically I'm counting those RCV records in which I see <XRef Type="MIM" DB="OMIM"/>. Glad to do something more sophisticated if you guys like
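For reference, the inclusive definition described here could be sketched roughly as below. This is not the command-line hack that produced the numbers; it assumes the full-release XML layout (`ClinVarSet` elements with a `RecordStatus`, and a `ReferenceClinVarAssertion` holding `ClinicalSignificance/ReviewStatus`), and it looks for the XRef anywhere in the record. The function name and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def tally_review_status_without_omim(xml_source):
    """Count review statuses of current records that have no MIM/OMIM XRef."""
    tally = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "ClinVarSet":
            continue
        rcv = elem.find("ReferenceClinVarAssertion")
        is_current = elem.findtext("RecordStatus") == "current"
        # Inclusive check: any XRef with Type="MIM" and DB="OMIM" in the record.
        has_omim = any(
            xref.get("Type") == "MIM" and xref.get("DB") == "OMIM"
            for xref in elem.iter("XRef")
        )
        if rcv is not None and is_current and not has_omim:
            status = rcv.findtext("ClinicalSignificance/ReviewStatus", default="unknown")
            tally[status] += 1
        elem.clear()  # release the finished record's children to limit memory use
    return tally


# Tiny made-up document in the assumed layout, for demonstration only.
SAMPLE = b"""<ReleaseSet>
  <ClinVarSet>
    <RecordStatus>current</RecordStatus>
    <ReferenceClinVarAssertion>
      <ClinicalSignificance>
        <ReviewStatus>no assertion criteria provided</ReviewStatus>
      </ClinicalSignificance>
    </ReferenceClinVarAssertion>
  </ClinVarSet>
  <ClinVarSet>
    <RecordStatus>current</RecordStatus>
    <ReferenceClinVarAssertion>
      <ClinicalSignificance>
        <ReviewStatus>criteria provided, single submitter</ReviewStatus>
      </ClinicalSignificance>
      <TraitSet><Trait><XRef Type="MIM" ID="000000" DB="OMIM"/></Trait></TraitSet>
    </ReferenceClinVarAssertion>
  </ClinVarSet>
</ReleaseSet>"""

result = tally_review_status_without_omim(io.BytesIO(SAMPLE))
```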

@kshefchek
Contributor

Yes, I think the MIM xrefs are just identifier xrefs and may not mean there's evidence from OMIM; using the SCV to filter for evidence from OMIM would be better.
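A sketch of the stricter SCV-based check, assuming each submission appears as a `ClinVarAssertion` element whose `ClinVarSubmissionID` carries a `submitter` attribute, and that OMIM submissions use the value `OMIM`. Both the element layout and the function name are assumptions that should be checked against the release XML.

```python
import xml.etree.ElementTree as ET


def has_omim_submission(clinvar_set: ET.Element) -> bool:
    """Return True if any SCV in the record was submitted by OMIM.

    Assumes <ClinVarAssertion><ClinVarSubmissionID submitter="..."/> layout.
    """
    for scv in clinvar_set.findall("ClinVarAssertion"):
        submission = scv.find("ClinVarSubmissionID")
        if submission is not None and submission.get("submitter") == "OMIM":
            return True
    return False
```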

@justaddcoffee
Member Author

justaddcoffee commented Oct 1, 2019

> Does this mean that there is no OMIM entry under condition?

That's right - no OMIM entry, at least not in the XML.

> Can you provide a few examples?

This Clinvar record for example comes up - no OMIM xref, and multiple submitters (2 stars)
https://www.ncbi.nlm.nih.gov/clinvar/variation/160935/

Same here, no OMIM xref, but this one was reviewed by an expert panel (3 stars):
https://www.ncbi.nlm.nih.gov/clinvar/RCV000211148.1/

> yes I think the MIM xrefs are just identifier xrefs and may not mean there's evidence from OMIM, using the SCV to filter evidence from OMIM would be better.

Okay, if we consider only SCV OMIM xrefs, there would be many more of these Clinvar records with no OMIM xrefs

@kshefchek
Contributor

kshefchek commented Oct 1, 2019

are there any cases where the relationship is pathogenic, either 3 or 4 stars, and no evidence from OMIM?

EDIT: here are a few from poking around their UI

https://www.ncbi.nlm.nih.gov/clinvar/RCV000038535.3/
https://www.ncbi.nlm.nih.gov/clinvar/RCV000076836.3/
https://www.ncbi.nlm.nih.gov/clinvar/RCV000490609.4/

@pnrobinson
Member

Well, this is just what I mean -- we really need to understand the data

  1. https://www.ncbi.nlm.nih.gov/clinvar/variation/160935/ -> This is a CNV
  2. https://www.ncbi.nlm.nih.gov/clinvar/RCV000211148.1/ -> this is a pharmacogenetic association.

The problems we were seeing related to a ClinVar entry that pointed to the "wrong" OMIM entry.

We need a list of ClinVar entries that point to OMIM entries that have a D2G relation that is not in OMIM, and we need to see if any of those are correct or at least not obviously wrong.
In the future, we may also want to represent CNVs and pharmacogenetic associations, but I would suggest we table that until the problems with the UI are fixed!
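The list requested above amounts to a set difference between the disease/gene pairs implied by ClinVar and those asserted by OMIM. A minimal sketch, assuming both sources have already been reduced to (disease ID, gene symbol) pairs; the function name and the identifiers in the example are placeholders, not real data.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (OMIM disease ID, gene symbol)


def d2g_not_in_omim(clinvar_pairs: Iterable[Pair], omim_pairs: Iterable[Pair]) -> List[Pair]:
    """Return ClinVar disease/gene pairs that OMIM itself does not assert."""
    return sorted(set(clinvar_pairs) - set(omim_pairs))


# Placeholder identifiers, for illustration only.
clinvar = [("OMIM:000001", "GENE_A"), ("OMIM:000002", "GENE_B")]
omim = [("OMIM:000001", "GENE_A")]
to_review = d2g_not_in_omim(clinvar, omim)
```

The resulting pairs are the candidates for manual review.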
