threshold modification in modkit pileup #357

Open
baibhav-bioinfo opened this issue Jan 29, 2025 · 1 comment
Labels
question Looking for clarification on inputs and/or outputs

Comments

@baibhav-bioinfo

Hello,
Thank you for developing such a helpful program.

I am working with direct RNA sequencing (DRS) data and using Dorado + modkit for m6A modification analysis.
I ran Dorado for modification basecalling, then used dorado aligner to map the reads to the genome.

Now I am using modkit pileup to convert the BAM information into a BED file with the following command:

$ modkit pileup --threads 96 a1_DRACH_.bam a1_sup_DRACH_pileup.bed --log-filepath a1_sup_DRACH_pileup.log

calculated chunk size: 144, interval size 100000, processing 14400000 positions concurrently
attempting to sample 10042 reads
Using filter threshold 0.9238281 for A.
Done, processed 1654243 rows. Processed ~8092362 reads and skipped ~41874 reads.

The filter threshold was automatically set to ~0.92 here. Is this threshold the same as "--modified-bases-threshold 0.92" in dorado during basecalling?

@ArtRand
Contributor

ArtRand commented Jan 30, 2025

Hello @baibhav-bioinfo,

No, this isn't the same. If you use --modified-bases-threshold 0.92, any call with canonical probability >=0.92 will be elided (i.e. not explicitly recorded). If a base has a canonical probability <0.92, it will still be recorded. The filter threshold calculated by modkit, on the other hand, means that any base modification probability <0.92 will be filtered out.

If you look at the SAM specification (page 7), it has the details on the "implicitly low probability of modification" calls, though it's a bit in the weeds. In general, I would recommend using the default values in dorado.
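
To make the difference concrete, here is a rough toy sketch in Python of the two rules exactly as they are described in this reply. It is not dorado's or modkit's actual code: the function names and the probabilities are made up for illustration, and it assumes a single modification type so that the canonical probability is simply 1 minus the modification probability.

```python
# Toy illustration only; NOT dorado's or modkit's real implementation.
# Assumption: one modification type, so p_canonical = 1 - p_mod.

def recorded_by_basecaller(p_mods, threshold):
    """Hypothetical helper: keep calls whose canonical probability is
    below the threshold; calls at or above it are elided."""
    return [p for p in p_mods if (1.0 - p) < threshold]


def passed_by_pileup_filter(p_mods, threshold):
    """Hypothetical helper: keep calls whose modification probability
    reaches the threshold; calls below it are filtered out."""
    return [p for p in p_mods if p >= threshold]


# Made-up modification probabilities for five positions.
p_mods = [0.02, 0.10, 0.50, 0.95, 0.99]
threshold = 0.92

recorded = recorded_by_basecaller(p_mods, threshold)
passed = passed_by_pileup_filter(p_mods, threshold)

print("recorded at basecalling:", recorded)
print("passed pileup filter:   ", passed)
```

With these made-up numbers, the same value of 0.92 drops only the one call that is almost certainly canonical under the first rule, but drops every call that is not confidently modified under the second. That is why the two thresholds are not interchangeable.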

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Jan 30, 2025