Additional options for hasEvidenceItems #258

Open
mbrush opened this issue Jan 16, 2025 · 11 comments

Comments

@mbrush
Contributor

mbrush commented Jan 16, 2025

Originally posted by @Mrinal-Thomas-Epic in #234 (comment):

For our use case, we're thinking it would be nice to also have simpler options for hasEvidenceItems. Specifically, it would be nice to have the option to put in a Document (for references to PubMed articles) or a Coding (for references to external DBs). I can see an argument for wrapping the Document or Coding within another Statement, but I think that could be more complicated than needed for this simple case.

@mbrush
Contributor Author

mbrush commented Jan 16, 2025

Responses from original post:

From korikuzma:

@Mrinal-Thomas-Epic
> I can see an argument for wrapping the Document or Coding within another Statement, but I think that could be more complicated than needed for this simple case.

Is there a reason why you would not be able to / want to store Documents in EvidenceLine.reportedIn?

From mrinal-thomas-epic:

@korikuzma I didn't realize it was inheriting the reportedIn property from InformationEntity. I think that would work for Document, so the only ask is for being able to specify a Coding.

For an example scenario, let's say a user is evaluating the pathogenicity of a variant using the ACMG 2015 guidelines. They look at gnomAD and find that the BA1 criterion is satisfied. I would imagine that we would have a BA1 EvidenceLine that somehow references the gnomAD entry. It doesn't feel right to use an IRI reference if the gnomAD entry is outside of our system (which feels closer to a Coding).

@mbrush
Contributor Author

mbrush commented Jan 16, 2025

@Mrinal-Thomas-Epic if I am understanding your example correctly, you want to report that you have a Pathogenicity Statement with an Evidence Line based on the BA1 ACMG criterion - and want to capture a gnomAD record as the evidence item for this Evidence Line (as opposed to representing the frequency data from that record explicitly using a CAFStudyResult object).

I would simply use the EvidenceLine.hasEvidenceItems attribute to capture an iriReference to the url of the gnomAD record, e.g.

  hasEvidenceItems: https://gnomad.broadinstitute.org/variant/1-55051215-G-GA?dataset=gnomad_r4

I don't think there is anything in the definition of the iriReference data type that suggests it cannot be used to capture IRIs from outside systems (e.g. a gnomAD url) - indeed, I think this is a primary use case to which this data type is meant to be applied. If that is the case, we should add some clarifying language to this effect to the iriReference description here.

@mbrush
Contributor Author

mbrush commented Jan 17, 2025

UPDATE (1-17-25):
It has since come to my attention that the iriReference data type is intended to be used as @Mrinal-Thomas-Epic assumed - to point to representations of va-compliant data objects within the system. It is not meant to point to urls of web pages that might, in this case, report the data that the provider wants to cite as evidence.

If Epic intends to stand up a service / system that stores and serves va-compliant data objects of the types that can be values of the hasEvidenceItems attribute (StudyResult, Statement, EvidenceLine), then an iriReference could be used to reference such data objects directly from the hasEvidenceItems attribute.
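For illustration, a minimal sketch of that first option. It assumes a hypothetical internal service at `https://va.example.org/` that serves VA-compliant StudyResult objects; the host and path are made up and not part of any existing system:

```yaml
# Sketch only: the IRI points at a VA-compliant StudyResult served by a
# hypothetical internal service, not at a public web page such as gnomAD's.
hasEvidenceItems:
  - https://va.example.org/study-results/caf-1-55051215-G-GA
```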

If not, another solution to support the gnomAD scenario is to create and send a CohortAlleleFrequencyStudyResult object that holds the relevant data from the gnomAD record. If a data provider did not want to ingest and transform gnomAD data to achieve this, they could create a minimal StudyResult object that includes only the required focusAllele, optionally a description, and then uses the reportedIn attribute to capture the url of the web page for the gnomAD record. For example:

  hasEvidenceItems:
     - type: CohortAlleleFrequencyStudyResult
       focusAllele: ga4gh:VA.t0rDoiIessOWmP0SF0plhXtOwi8TRaZz
       description: "Allele frequency data from gnomad v4 about insertion 1-55051215-G-GA (GRCh38) from various populations"
       reportedIn: https://gnomad.broadinstitute.org/variant/1-55051215-G-GA?dataset=gnomad_r4

In this way, the data provider has to do very little work and data consumers will be able to follow the provided url to navigate to the gnomAD page that shows the data items for this StudyResult.

@Mrinal-Thomas-Epic

Mrinal-Thomas-Epic commented Jan 19, 2025

I think the second option could work for us as long as we don't add additional required properties to StudyResult or Statement. The only potential issue is that in some cases, the data provider might not want to make a statement about whether the external record is a StudyResult or Statement. More minor, but is there a way to discretely represent the system (e.g. gnomad) and version (e.g., v4) with this approach?

@Mrinal-Thomas-Epic

I've also hit a couple other questions while trying to work with EvidenceLine:

  1. Does Statement need a reportedIn property if EvidenceLine has one? I also have the same question about other properties like specifiedBy (but maybe that can be a separate discussion).
  2. It looks like EvidenceItem can point to another EvidenceLine. When would you use this?

@mbrush
Contributor Author

mbrush commented Jan 20, 2025

Several good observations and questions here @Mrinal-Thomas-Epic. I'll address each in a separate comment.


> the data provider might not want to make a statement about whether the external record is a StudyResult or Statement

This would be possible if we made InformationEntity a concrete class - which would let the data creator instantiate it directly without having to commit to a more specific type. However, I suspect @ahwagner will have concerns with this.

That said, the preferred outcome IMO is for the data creator to select the most appropriate subclass in cases like this, and I am optimistic that our documentation will allow them to do so. Do you anticipate it being a significant problem to require that the data provider choose an InformationEntity subtype to instantiate, @Mrinal-Thomas-Epic? If so, I would love to discuss further and hear ideas about the kind of additional guidance we could provide to make this easier.

@mbrush
Contributor Author

mbrush commented Jan 20, 2025

> is there a way to discretely represent the system (e.g. gnomad) and version (e.g., v4) with this approach?

Yes. There is a 'sourceDataSet' attribute on StudyResult that takes a DataSet object (link), which includes attributes for this type of info.

The data example here illustrates this, and the relevant snippet is shown below:

[Image: snippet from the linked data example showing the sourceDataSet attribute]
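A rough sketch of the shape such a snippet might take for the gnomAD example. The `sourceDataSet` attribute and the DataSet type come from this comment; the `name` and `version` field names inside the DataSet are assumptions for illustration only, so check them against the linked DataSet definition:

```yaml
# Sketch only: the field names inside sourceDataSet (name, version) are
# assumed, not confirmed against the VA-Spec DataSet definition.
type: CohortAlleleFrequencyStudyResult
focusAllele: ga4gh:VA.t0rDoiIessOWmP0SF0plhXtOwi8TRaZz
sourceDataSet:
  type: DataSet
  name: gnomAD     # the source system
  version: v4      # the release the frequency data came from
```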

Does this address your need @Mrinal-Thomas-Epic?

@mbrush
Contributor Author

mbrush commented Jan 20, 2025

> Does Statement need a reportedIn property if EvidenceLine has one?

The reportedIn attribute points to a document that reports the specific knowledge/information that is conveyed by the object from which it hangs. It is important to consider that the central information conveyed in a Statement is different than that conveyed in an Evidence Line that supports it.

reportedIn on a Statement would point to a document that reports the central knowledge expressed in the statement - e.g. a publication that concludes that 'Variant X is not pathogenic for Disease Y'.

reportedIn on an EvidenceLine would point to a document that reports the central information expressed in the evidence line - e.g. that 'some set of allele frequency data represents stand-alone evidence disputing the pathogenicity of Variant X'. (Note that this type of information is usually generated in internal curation pipelines and rarely reported in public docs or pubs - so it will be less common to see the reportedIn attribute used in an EvidenceLine.)

If a situation arises where a document reports the central info conveyed in both a Statement and one of its supporting EvidenceLines, this document can be referenced separately using the reportedIn attribute in each of these two objects, if the data provider desires.
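A minimal sketch of that situation, in which one hypothetical publication is cited independently from both objects. The document title and URL are made up, and the `hasEvidenceLines` attribute and the Document fields (`title`, `urls`) are assumptions about the schema rather than anything confirmed in this thread:

```yaml
# Sketch only: the same hypothetical Document is referenced separately by the
# Statement and by one of its supporting EvidenceLines.
type: Statement
reportedIn:
  - type: Document
    title: Hypothetical paper concluding Variant X is not pathogenic for Disease Y
    urls:
      - https://example.org/hypothetical-paper
hasEvidenceLines:
  - type: EvidenceLine
    reportedIn:
      - type: Document
        title: Hypothetical paper concluding Variant X is not pathogenic for Disease Y
        urls:
          - https://example.org/hypothetical-paper
```

Repeating the Document in full keeps each reportedIn self-describing; a provider could equally serve the Document once and reference it from both places.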

Does the above address your question/use case @Mrinal-Thomas-Epic?


As an aside, a more common use case for EvidenceLines is the desire to point directly to documents/sources reporting supporting 'evidence items', without having to represent those evidence items directly in the data. The SEPIO model includes a 'shortcut' property supporting exactly this - see the evidenceItemSources property here.

SEPIO also provides a shortcut property that allows data creators to point directly to sources of evidence from the root Statement itself - without the need to create either evidence lines or evidence items - see the hasEvidenceFromSources property here.

These are likely not immediately relevant here, but please keep in mind the availability of these types of shortcut properties in SEPIO - which can be brought into the VA-Spec model when the need arises. Happy to discuss this in more detail in another venue.

@mbrush
Contributor Author

mbrush commented Jan 20, 2025

> It looks like EvidenceItem can point to another EvidenceLine. When would you use this?

I believe this was permitted because some of the more complex ClinGen curation pipelines around gene curation, and clinical actionability statements, had a use case for this type of thing. @larrybabb can provide details. However, these models are not implemented yet, so it may make sense to remove this as a permissible type for the hasEvidenceItems attribute in v1.0.

@Mrinal-Thomas-Epic

Mrinal-Thomas-Epic commented Jan 21, 2025

> > is there a way to discretely represent the system (e.g. gnomad) and version (e.g., v4) with this approach?
>
> Yes. There is a 'sourceDataSet' attribute on StudyResult that takes a DataSet object (link), which includes attributes for this type of info.
> ...
>
> Does this address your need @Mrinal-Thomas-Epic?

For StudyResults, yes. We also need something like that for statements. For example, when evaluating the ACMG criteria BP6 (Reputable source recently reports variant as benign but the evidence is not available to the laboratory to perform an independent evaluation), we want to include a reference to an external database (which isn't necessarily in GKS format).

@Mrinal-Thomas-Epic

Mrinal-Thomas-Epic commented Jan 21, 2025

> Does Statement need a reportedIn property if EvidenceLine has one?
>
> ...
>
> reportedIn on a Statement would point to a document that reports the central knowledge expressed in the statement - e.g. a publication that concludes that 'Variant X is not pathogenic for Disease Y'.

In my mind, "a document that reports the central knowledge expressed in the statement" can refer to one of two use cases:

  • This document directly reports the knowledge. A curator created the statement to discretely represent the document (or part of it). I have no issue with this use case.
  • A curator is determining the classification for a variant. They're using their own judgement, but they also refer to one or more papers that may explicitly say the variant is pathogenic or not for the disease. IMO this use case should always put their documents in EvidenceLine.reportedIn, rather than Statement.reportedIn. Is that correct?

@ahwagner moved this to Backlog in VA-Spec on Jan 22, 2025
Projects
Status: Backlog
Development

No branches or pull requests

2 participants