Coverage of modekit pileup results does not match the coverage displayed on IGV #385

rania-o · 2025-02-25T14:10:12Z

Hello,

I used modkit pileup with a bam basecalled with dorado model [email protected] and called for m5C, m6a, pseudou and inosine.
I have two questions :

The first one is why does modkit report positions as modified when the fraction of modified base is 0 ?
The second is about the valid coverage or the score, for one position on IGV I have more than 7500 reads, but in the results modkit reports only 39 for the coverage, why this big difference ? I also have the case where modkit report ~2900 while on IGV I have only 800 reads.

Thanks in advance for you reply,
Rania

ArtRand · 2025-02-25T15:49:35Z

Hello @rania-o,

The first one is why does modkit report positions as modified when the fraction of modified base is 0 ?

Modkit pileup will report all reference positions where at least one read had a valid call. A "valid call" can be either a canonical or modified call with a probability $\ge$ the pass threshold. You may filter out the positions that have 0% modified - but that's not the default behavior.

The second is about the valid coverage or the score, for one position on IGV I have more than 7500 reads, but in the results modkit reports only 39 for the coverage, why this big difference ? I also have the case where modkit report ~2900 while on IGV I have only 800 reads.

For high-depth, you may want to increase the --max-depth parameter in modkit pileup. If you have a position with valid coverage of 39, but a read depth of 7500, could you check the value of $\text{N}_{\text{diff}}$? You may have a case where most of your reads align an alternate base, for example a G, then a small proportion of the reads have a mismatch and a modification reported on that mismatch. When modkit reports more depth than IGV, it can often be the case that IGV is downsampling the modBAM, if I recall correctly you can change this in the preferences.

rania-o · 2025-02-26T10:06:38Z

Hi @ArtRand ,

Thanks for your reply.

Modkit pileup will report all reference positions where at least one read had a valid call. A "valid call" can be either a canonical or modified call with a probability ≥ the pass threshold. You may filter out the positions that have 0% modified - but that's not the default behavior.

What do you mean here by "that's not the default behavior" ? Filtering out positions with 0% modified automatically by modkit ?

For high-depth, you may want to increase the --max-depth parameter in modkit pileup. If you have a position with valid coverage of 39, but a read depth of 7500, could you check the value of N diff ? You may have a case where most of your reads align an alternate base, for example a G, then a small proportion of the reads have a mismatch and a modification reported on that mismatch. When modkit reports more depth than IGV, it can often be the case that IGV is downsampling the modBAM, if I recall correctly you can change this in the preferences.

For the case where IGV shows less reads than modkit, actually I'm aware of this and I checked on IGV all the reads are displayed.

For the other case, you are right. All the reads are considered as an alternate base. But the problem that I face is that I'm doubting about the modification to be a pseudoU since the signature of this modification is misbasecalling uridine as a cytosine T-->C, which is the case on IGV. But, I'm a bit surprised by the model not calling this base as a pseudoU, and instead calling an m6a even if the base in the reference is not an A.

This is the output for this position :

EU293121.1      2131    2132    a       19      +       2131    2132    255,0,0 19      26.32   5       12      2       345     1192    6575    0
EU293121.1      2131    2132    m       6247    +       2131    2132    255,0,0 6247    2.37    148     6099    0       345     1192    347     0
EU293121.1      2131    2132    17596   19      +       2131    2132    255,0,0 19      10.53   2       12      5       345     1192    6575    0
EU293121.1      2131    2132    17802   321     +       2131    2132    255,0,0 321     13.40   43      278     0       345     1192    6273    0

ArtRand added the question Looking for clarification on inputs and/or outputs label Feb 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Coverage of modekit pileup results does not match the coverage displayed on IGV #385

Coverage of modekit pileup results does not match the coverage displayed on IGV #385

rania-o commented Feb 25, 2025 •

edited

Loading

ArtRand commented Feb 25, 2025

rania-o commented Feb 26, 2025 •

edited

Loading

Coverage of modekit pileup results does not match the coverage displayed on IGV #385

Coverage of modekit pileup results does not match the coverage displayed on IGV #385

Comments

rania-o commented Feb 25, 2025 • edited Loading

ArtRand commented Feb 25, 2025

rania-o commented Feb 26, 2025 • edited Loading

rania-o commented Feb 25, 2025 •

edited

Loading

rania-o commented Feb 26, 2025 •

edited

Loading