Handling variants with common prefixes between GENIE and public GN #260

Open
rmadupuri opened this issue Aug 21, 2023 · 1 comment

Comments

@rmadupuri

rmadupuri commented Aug 21, 2023

A few variants annotate successfully when pointed to the public GN (https://www.genomenexus.org/) but fail when pointed to the GENIE GN (https://genie.genomenexus.org/). The variants are passed in region format for GENIE and in hgvsg format for public, and variants with common prefixes are handled differently in each case.

Below are a few examples that pass annotation when pointed to the public site but fail when pointed to the GENIE site.
test_failed_variants.txt

@leexgh
Member

leexgh commented Aug 29, 2023

I made a flowchart to show why some variants fail on GENIE Genome Nexus but annotate successfully on public Genome Nexus (https://lucid.app/lucidchart/1424c03a-ec63-4b51-99ef-a3fcab1a600e/edit?viewport_loc=391%2C100%2C2208%2C1159%2C0_0&invitationId=inv_edc74010-8a66-4178-9ac8-0187072a9ebd).

Basically, it's because the genomic coordinates don't match the length of the reference allele. Except for insertions, every type of variant should have a start-to-end span equal to the length of the reference allele (insertion variants should have end = start + 1). When we validate an annotation, we compare the given reference allele with the annotated reference allele. The annotated reference allele is derived from the given genomic coordinates, so if those coordinates don't match the length of the given reference allele, the annotated allele will be either longer or shorter and the two won't be equal.
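To make the rule concrete, here is a minimal sketch of the consistency check and of why the comparison fails. It is not the Genome Nexus code. The function names, the toy sequence, the 1-based inclusive coordinate convention, and the use of an empty or "-" reference allele to mark an insertion are all assumptions made for illustration.

```python
def coordinates_match_reference(start: int, end: int, ref: str) -> bool:
    """Return True when the start/end span is consistent with the reference allele.

    Assumptions (not taken from this thread): coordinates are 1-based and
    inclusive, and an insertion is marked by an empty or "-" reference allele.
    """
    if ref in ("", "-"):
        # Insertion: the two coordinates flank the inserted bases.
        return end == start + 1
    # Every other variant type: the span must cover exactly len(ref) bases.
    return end - start + 1 == len(ref)


def fetch_reference(sequence: str, start: int, end: int) -> str:
    """Return the bases from start to end (1-based, inclusive) of a toy sequence."""
    return sequence[start - 1:end]


# Toy stand-in for a reference genome (hypothetical data).
toy_sequence = "ACGTCTGA"
given_ref = "C"

# The span 5..6 covers two bases, but the given reference allele has one.
annotated_ref = fetch_reference(toy_sequence, 5, 6)

print(coordinates_match_reference(5, 6, given_ref))  # False
print(annotated_ref)                                 # CT
print(annotated_ref == given_ref)                    # False
```

Under these assumptions, the annotated allele read from the coordinates is longer than the given one, so the equality check in validation cannot succeed.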

There is a corner case on public Genome Nexus for a single-base reference allele with a wrong end position, e.g. 3,183210442,183210443,C,CT. When we create the follow-up query to validate the annotation, it is created in hgvs format (3:g.183210442del), which doesn't include the wrong end position, so the variant can pass validation.
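The information loss in this corner case can be sketched as follows. This is only an illustration: `Variant`, `region_query`, and `hgvs_like_query` are hypothetical names, and `hgvs_like_query` merely reproduces the shape of the query quoted above. It is not the actual Genome Nexus conversion logic.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    start: int
    end: int
    ref: str
    alt: str


def region_query(v: Variant) -> str:
    """Region-style string: every field, including the end position, is kept."""
    return f"{v.chromosome},{v.start},{v.end},{v.ref},{v.alt}"


def hgvs_like_query(v: Variant) -> str:
    """Hypothetical helper mimicking the quoted query; it only uses the start."""
    return f"{v.chromosome}:g.{v.start}del"


reported = Variant("3", 183210442, 183210443, "C", "CT")
different_end = reported._replace(end=183210442)

# The region-style strings differ, so a check based on them can see the end.
print(region_query(reported) == region_query(different_end))        # False

# The hgvs-like strings are identical, so the end position is invisible here.
print(hgvs_like_query(reported) == hgvs_like_query(different_end))  # True
```

Because the end position never reaches the follow-up query in this sketch, a check based on that query cannot detect that the end is wrong.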

The solution I would propose is to harmonize the genomic location (genome-nexus/genome-nexus#701). This could also solve some other problems, such as a missing end position.
