-
Hi @mfasold, you'll need to find a way to add in the reference base to the VCF in order to get the matching working. You may be able to do this using the reference dataset we provide (because all variants will inevitably have to intersect that if you would like to use |
-
Thank you for your reply! Please allow me to give a bit more detail on my toy example. We are looking at SNP rs1801282, which has the REF allele C and (according to dbSNP) the ALT alleles G and T. In the scorefiles for my example trait PGS000033 (PGS000033_hmPOS_GRCh38.txt.gz and scorefiles.txt.gz), the effect_allele is C and the other_allele is G.

Since my original VCF did not contain hom-ref entries, I ran variant calling with the GATK HaplotypeCaller, using the positions from the scorefile, to create a special GVCF that contains an explicit hom-ref call for every position in the scorefile. For our SNP rs1801282, that gave the entry:

Note that I focus here on the “difficult” case, the homozygous-reference SNPs, as other variants with a non-empty ALT allele will match even with the original VCF. So I am already aware that the hom-ref variants need to be included in the scoring and that a special GVCF input file is needed for pgsc_calc. As mentioned, I did this and received an output where the two hom-ref variants still did not match; see the *log.csv.gz file:

So when the reference base is already contained in the (G)VCF, do you mean I should add the other_allele to the (G)VCF? (And yes, I would like to use --run_ancestry later.) I ask because this is what I did by manually adding the G to the ALT column of the GVCF. Now the example SNP is no longer unmatched (but it is also not matched like the others):

This does not seem to be the way to go. Where should I add the reference base? And what do you refer to with the reference dataset?
-
To my knowledge, it doesn't matter what the alternate allele is for PRS calculation as long as the sample doesn't have that allele. [Edit: this is wrong.] Here is something I have tried before. I would like to know if this is ok to do for pgsc_calc:
-
Do you have a suggestion for how best to obtain a "super scoring file", i.e. one that contains, for my genome build (hg38), all the variants (with chr, pos, effect allele, other allele) for all possible scores in the catalog? The idea, of course, would be to run the variant calling and VCF preparation once, so that afterward I can run pgsc_calc on any list of pgs_ids that I'm interested in. I saw that there is the
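One way to approximate such a combined variant list, given scoring files that are already available locally, is to take the union of their variant columns. The sketch below is only an illustration: `collect_variants` is a hypothetical helper, and it assumes whitespace-separated rows whose first four columns are chr, pos, effect allele and other allele, as in the scorefile entry quoted in this thread. It does not download anything from the catalog, and it would need adapting for files that lack an other_allele column.

```python
def collect_variants(scorefile_lines):
    """Union of (chr, pos, effect_allele, other_allele) over scoring-file rows.

    Assumes the first four whitespace-separated columns are
    chr, pos, effect allele, other allele (an assumption, see lead-in).
    """
    variants = set()
    for line in scorefile_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        columns = line.split()
        if len(columns) < 4:
            continue  # row too short to carry all four fields
        chrom, pos, effect, other = columns[:4]
        if not pos.isdigit():
            continue  # skip a header row such as "chr_name chr_position ..."
        variants.add((chrom, int(pos), effect, other))
    # Chromosomes sort as strings here ("10" before "3").
    return sorted(variants)


rows = [
    "3 12351626 C G 0.0453 additive False PGS000033_hmPOS_GRCh38 3",
    "3 12351626 C G 0.0453 additive False PGS000033_hmPOS_GRCh38 3",
]
print(collect_variants(rows))
```

The resulting positions could then serve as the interval list for a single round of variant calling, with the set deduplicating variants shared between scores.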
-
I went ahead and added artificial ALT alleles to my VCF file, corresponding to the effect alleles in the scoring file. Contrary to my expectation, I find several cases where the variant still does not match. Here are two examples, first the VCF entry and then the output from the match log:
Any ideas what could be the reason?
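For reference, the manual ALT-filling step described in this thread can be sketched as follows. This is a minimal illustration of the workaround, not something pgsc_calc provides: `fill_missing_alt` and the `score_alleles` lookup are hypothetical names, and the code assumes tab-separated VCF records and scoring-file chromosomes written without a "chr" prefix. Records whose REF equals neither scoring-file allele are left unchanged, so they can be inspected by hand.

```python
def fill_missing_alt(vcf_line, score_alleles):
    """Fill ALT for a record with ALT == "." using scoring-file alleles.

    score_alleles maps (chrom_without_chr_prefix, pos) to
    (effect_allele, other_allele). Returns the (possibly modified) line.
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    if alt != ".":
        return line  # ALT already present, nothing to do
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # scoring file is assumed to use "3", not "chr3"
    alleles = score_alleles.get((chrom, pos))
    if alleles is None:
        return line  # position not in the scoring file
    effect, other = alleles
    if ref == effect:
        fields[4] = other  # REF is the effect allele: ALT becomes other_allele
    elif ref == other:
        fields[4] = effect  # REF is the other allele: ALT becomes effect_allele
    else:
        return line  # REF matches neither allele: leave for manual inspection
    return "\t".join(fields)


record = "chr3\t12351626\t.\tC\t.\t.\t.\tDP=40\tGT:AD:DP:RGQ\t0/0:40:40:99"
lookup = {("3", 12351626): ("C", "G")}
print(fill_missing_alt(record, lookup))
```

Note that when the effect allele is itself the reference base, as for rs1801282, the allele written to ALT is the other_allele rather than the effect allele.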
-
Hi! I would like to calculate PGS using DNA variants from a single WGS sample. I processed my data according to this discussion thread linked in the docs.
Is there a possibility to match homref variants when no ALT is given in the input VCF file?
Here is an example entry from my GVCF file that I obtained using the suggested processing steps with GATK:
chr3 12351626 . C . . . DP=40 GT:AD:DP:RGQ 0/0:40:40:99
My scorefile contains the following entry for this location:
3 12351626 C G 0.0453 additive False PGS000033_hmPOS_GRCh38 3
So the effect allele corresponds to the REF allele in my GVCF. Contrary to my expectation, the variant is not matched. It is, however, matched if I manually put G in as the ALT allele in the GVCF (replacing the "."). Since the genotype is 0/0, shouldn’t it be matched no matter which allele is in the ALT column?
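The behaviour described here is what one would expect from a matcher that requires both scoring-file alleles to appear among the record's REF and ALT alleles. The toy model below illustrates only that assumption; `alleles_match` is a hypothetical function and is not pgsc_calc's actual matching code.

```python
def alleles_match(ref, alt, effect_allele, other_allele):
    """Toy rule: both scoring-file alleles must appear among REF and ALT."""
    # "." means no ALT allele was recorded (a hom-ref GVCF record).
    vcf_alleles = {ref} | {a for a in alt.split(",") if a != "."}
    return {effect_allele, other_allele} <= vcf_alleles


print(alleles_match("C", ".", "C", "G"))  # False: ALT is missing
print(alleles_match("C", "G", "C", "G"))  # True: G added to ALT by hand
```

Under this rule the genotype plays no part in the match, so a 0/0 call does not by itself make a record with an empty ALT column matchable.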