
[minicaller] alt alle frequency calculation #249

Open
tingchenlrx opened this issue May 29, 2024 · 0 comments

tingchenlrx commented May 29, 2024

I ran an older version of minicaller (version 6d7e78c) on a BAM file, and it produced a VCF file. Here is one multi-allelic variant from that file:

cusRef 24 . G A,C,T . . AC=1,1,1;AF=0.250,0.250,0.250;AN=4;DP=164410 GT:DP:DP4:DPG 0/3/2/1:164410:161365,259,2785,1:161624,1356,977,452

From the DPG field above, I calculated the alt allele frequency (AF) of each alt allele as follows:
Alt allele T AF = 1356/(161624+1356+977+452)
Alt allele C AF = 977/(161624+1356+977+452)
Alt allele A AF = 452/(161624+1356+977+452)
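The same calculation can be sketched in a few lines of Python. This is only an illustration under one assumption taken from the calculation above: DPG lists depths in the same order as the alleles in GT (0/3/2/1, i.e. G, T, C, A). The helper name `dpg_allele_frequencies` is hypothetical and not part of jvarkit.

```python
# Sketch: per-allele AF from the DPG field of the old minicaller record.
# Assumption: DPG depths follow the allele order given in GT (0/3/2/1 -> G, T, C, A).
record = ("cusRef\t24\t.\tG\tA,C,T\t.\t.\t"
          "AC=1,1,1;AF=0.250,0.250,0.250;AN=4;DP=164410\t"
          "GT:DP:DP4:DPG\t0/3/2/1:164410:161365,259,2785,1:161624,1356,977,452")

def dpg_allele_frequencies(vcf_line):
    cols = vcf_line.split("\t")
    alleles = [cols[3]] + cols[4].split(",")          # index 0 = REF, 1.. = ALTs
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    gt_order = [int(i) for i in sample["GT"].split("/")]
    depths = [int(d) for d in sample["DPG"].split(",")]
    total = sum(depths)                               # REF + all ALT depths
    # Skip allele index 0 (REF); map each ALT allele to depth / total.
    return {alleles[i]: d / total for i, d in zip(gt_order, depths) if i != 0}

for allele, af in dpg_allele_frequencies(record).items():
    print(f"{allele}: {af:.5f}")
# T: 0.00825
# C: 0.00594
# A: 0.00275
```

Note that the DPG depths sum to 164409, one less than the DP value (164410) reported in the same record.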

I then ran the latest version of minicaller (version 69ca18e) on the same BAM file. (I was able to resolve the "too many open files" error by increasing the value of maxRecordsInRam. Thanks so much!)
To obtain all the variants, I effectively turned off two filters with these settings:

  • --bad-ad-ratio 1
    With this value the filter condition becomes 1 < ALT/(REF+ALT) < 0, which can never be true, so no genotypes will be filtered.
  • --gt-fraction 0
    With this value the condition becomes ALT/(REF+ALT) < 0, which is also never true, so again no genotypes will be ignored.

In addition, I set --min-gt-allele-depth 10 and --min-gt-depth 10.
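A quick brute-force check of the reasoning in the two bullets above: since ALT/(REF+ALT) always lies between 0 and 1, neither condition can ever hold. The functions below encode the conditions exactly as written in this post; they are the poster's reading of --bad-ad-ratio and --gt-fraction, not code taken from minicaller, and the function names are hypothetical.

```python
# The two conditions as written above (poster's reading, NOT minicaller source).
def bad_ad_ratio_condition(ref, alt):
    r = alt / (ref + alt)
    return 1 < r < 0      # impossible: r cannot be both above 1 and below 0

def gt_fraction_condition(ref, alt):
    r = alt / (ref + alt)
    return r < 0          # impossible: read depths are non-negative

# All (REF, ALT) depth pairs on a small grid, plus the AD values seen at cusRef:24.
cases = [(ref, alt) for ref in range(50) for alt in range(50) if ref + alt > 0]
cases.append((161624, 1356))

print(any(bad_ad_ratio_condition(r, a) for r, a in cases))   # False
print(any(gt_fraction_condition(r, a) for r, a in cases))    # False
```

Whether minicaller actually evaluates these options this way is for the maintainers to confirm.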

I then looked at the same position, and here is the variant reported by the new version:

cusRef 24 . G T 38 . AC=1;AF=0.500;AN=2;DP=324605 GT:AD:DP:FT:GQ 0/1:161624,1356:162981:LowQual:38

My questions are:

(1) In the output of the new version of minicaller, there is only one variant (G->T), whereas the old version reported three (G->A,C,T). It looks like the new version only kept the alt allele with the highest read count. Can you please explain why?
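The hypothesis in question (1) can be reproduced from the old record's DPG depths: keeping only the alt allele with the most reads yields T, and its REF/ALT depths match the new record's AD field exactly. This sketch only mirrors the observation; it is not minicaller's actual allele-selection logic, which is what the question asks about. It again assumes DPG follows the GT order 0/3/2/1 (G, T, C, A).

```python
# Depths from the old record's DPG field, keyed by allele
# (assumption: DPG follows GT order 0/3/2/1 -> G, T, C, A).
ref_depth = 161624
alt_depths = {"T": 1356, "C": 977, "A": 452}

# Hypothesis from question (1): keep only the alt allele with the most reads.
best_alt = max(alt_depths, key=alt_depths.get)
print(best_alt, [ref_depth, alt_depths[best_alt]])   # T [161624, 1356]
```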

(2) For the variant from the new version, is it correct to calculate the alt allele frequency from the AD field as follows?
Alt allele T AF = 1356/(161624+1356)
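For comparison, the sketch below parses the new record's FORMAT/sample columns and computes the T frequency three ways: from AD as proposed in (2), from the FORMAT DP value, and with the old DPG denominator. The AD-based denominator (162980) excludes the C and A reads that the old calculation included, and the FORMAT DP (162981) is one more than the AD sum, so the values differ slightly. Which denominator is intended is a question for the maintainers; this is only arithmetic on the values shown above.

```python
# Parse the FORMAT/sample columns of the new record and compute the T allele AF.
new_record = ("cusRef\t24\t.\tG\tT\t38\t.\tAC=1;AF=0.500;AN=2;DP=324605\t"
              "GT:AD:DP:FT:GQ\t0/1:161624,1356:162981:LowQual:38")

cols = new_record.split("\t")
sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
ref_ad, alt_ad = (int(x) for x in sample["AD"].split(","))

af_from_ad = alt_ad / (ref_ad + alt_ad)          # 1356 / 162980
af_from_dp = alt_ad / int(sample["DP"])          # 1356 / 162981 (FORMAT DP)
af_old_dpg = 1356 / (161624 + 1356 + 977 + 452)  # old denominator includes C and A reads

print(f"{af_from_ad:.5f} {af_from_dp:.5f} {af_old_dpg:.5f}")
# 0.00832 0.00832 0.00825
```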

Our environment

  • the latest version of jvarkit (69ca18e)
  • OpenJDK 22
  • RHEL 9

I apologize for the long post, and thank you so much for your attention!

Best,
Ting

@tingchenlrx changed the title from "[minicaller]" to "[minicaller] alt alle frequency calculation" on May 29, 2024