Compare the downstream amino acid sequences using the shifted start position #576
This is certainly a strange situation. We have experimented on our side as well, and the Downstream plugin consistently returns one leading amino acid for us. Before version 1.5.5, we were not stripping this leading amino acid from the downstream sequence and were getting wrong predictions in which it was doubled. Please see #484 for a bug report detailing this. To me, this suggests that the Downstream plugin is indeed returning inconsistent output. I wonder if this is a GRCh37 vs. GRCh38 issue? Both the user in the above issue and our pipelines use GRCh38.
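A toy illustration of the doubling described above, assuming the mutant protein is assembled as the wildtype residues before the first altered position plus the Downstream sequence, and that the plugin's extra leading residue repeats the last wildtype residue. The function name, the `strip_leading` flag, and the sequences are hypothetical; this is not pVACseq's actual code.

```python
def assemble_mutant_protein(wildtype: str, position: int, downstream: str,
                            strip_leading: bool) -> str:
    """Toy sketch: join a wildtype prefix with a Downstream-style sequence.

    `position` is the hypothetical 1-based position of the first altered
    residue; the prefix keeps every wildtype residue before it. If
    `strip_leading` is True, the first residue of `downstream` is assumed to
    repeat the last prefix residue and is dropped after checking that it
    actually matches, which avoids doubling it.
    """
    prefix = wildtype[:position - 1]
    if strip_leading:
        if not prefix or downstream[:1] != prefix[-1]:
            raise ValueError("downstream does not start with the expected wildtype residue")
        downstream = downstream[1:]
    return prefix + downstream


wt = "MSKLEGT"  # hypothetical wildtype protein
print(assemble_mutant_protein(wt, 4, "KRVQ", strip_leading=False))  # MSKKRVQ (K doubled)
print(assemble_mutant_protein(wt, 4, "KRVQ", strip_leading=True))   # MSKRVQ
```

In this sketch, a fixed `strip_leading=True` raises a `ValueError` on any output that lacks the leading residue (e.g. `"RVQ"`), which is one way inconsistent plugin output could surface.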
I opened issue Ensembl/VEP_plugins#342 in the VEP plugins repo and hope that someone on their end can shed some light on what's going on.
The response from Ensembl/VEP_plugins#342 seems to suggest that this difference might be due to using the
With
Without
Well, without
Thank you so much for testing this. This is very helpful. I will report this back to the folks at Ensembl.
In the meantime, you can run pVACseq with VEP 100 annotations as long as you don't use the
Hi,
Unfortunately, there isn't currently a tool to accomplish this, although it has been on my to-do list. There is an open issue in the VAtools repo to implement such a tool. Please feel free to follow that issue to receive notifications about this task.
@stekaz would you be able to attach the full
Edit: I was able to reproduce your VCFs with the instructions you provided.
Has this been solved in VEP release 100 for GRCh37?
I don't think it has, as per the latest comment on Ensembl/VEP_plugins#342. Leaving off the
This problem has been resolved in the newly released pVACtools version 2.0. In this version, we switched from using the
@stekaz, if you are able to test this new version with your VCF to confirm that it now works correctly for you, that would be appreciated.
This looks really good, @susannasiebert. So far, I've just run VEP as before, but this time with the new Frameshift plugin. I've pasted a view of the data below, showing the 'WildtypeProtein' sequence (first line) followed by the 'FrameshiftSequence' (second line) for each consequence annotation (newline-separated):
Running pVACseq on the test VCF above:
I will now try the new version on the full set of VCFs and compare the results with what I had previously. This might take a couple of days.
@stekaz, thank you for testing it out. We look forward to seeing the results of your tests whenever you have them.
I just had a look at the output for our limited number of samples, and I think it looks OK. The peptide sequences for the frameshift variants appear to be coming through correctly too. At the very least, we didn't hit the above exception, so I'm calling that a win. Thanks again for fixing this one, @susannasiebert. Cheers.
Describe the bug
This is similar or identical to #571, where we hit this exception when running pvacseq. Although we did not perform the initial processing (read alignment, variant calling, etc.), we're confident that the reference used was the 1KG hs37 reference, human_g1k_v37. Attached is a test.vcf that should reproduce the issue. As you can see, its header (unaltered from one of the original files) includes a pointer to the reference used.

I annotated our VCFs with VEP v100.2 using the 100_GRCh37 cache. Having investigated a number of variants, I think this issue is related to how the downstream amino acid sequence is reported by the Downstream plugin. This plugin (v2.3) uses the shifted start position to ensure that only mutant amino acid sequences are provided. I have experimented with using the unshifted start position to include the 5' wildtype amino acid(s), as required by pVACseq. This commit/branch resolves the issue, but unfortunately, I don't think altering the Downstream plugin is the right place to make the change (i.e., the plugin appears to work as expected, which is to provide just the mutant amino acids). Instead, I think we need to ensure that pVACseq uses the shifted start position when comparing the wildtype amino acid sequence.
To Reproduce
Log Output
Expected Behavior
Ensure the wildtype amino acid sequence is shifted correctly to reflect the amino acid change.