assess the precision of the 4mC ratio #344

hannan666666 · 2025-01-15T11:37:33Z

Hello, I am working on quantifying the ratio of 4mC in mouse samples, but I have run into a challenge. According to published papers, 4mC is very rare in mammals. Could you provide some guidance on how I can assess the precision of the 4mC ratio reported by modkit? Additionally, do you have any strategies to improve its precision, such as setting a higher threshold for the analysis? Thank you very much!

bases             C
total_reads_used  10042
count_reads_C     10042
pass_threshold_C  0.640625
base  code   pass_count  pass_frac  all_count  all_frac
C     -      33096024    0.9303225  35700393   0.905164
C     m      1632598     0.045892   2118287    0.053708013
C     21839  846164      0.0237855  1622119    0.041127943

ArtRand · 2025-01-16T00:47:30Z

Hello @hannan666666,

We recommend testing base modification models on synthetic strands. We've recently published a blog post describing how we derive the model performance metrics. Unfortunately, the 4mC validation data hasn't been released publicly yet.

I ran a test on the validation data I have, using the latest models ([email protected]_4mC_5mC@v3) and attached the pass confusion matrix from modkit validate.

> Call probability threshold: 0.6836
> Percent of modified base calls removed: 9.98%
> Filtered accuracy: 96.85%
> Filtered modified base calls contingency table
                  Called Base
         ┌───────┬────────┬────────┬────────┐
         │       │ C      │ 21839  │ m      │
         ├───────┼────────┼────────┼────────┤
 Ground  │ C     │ 97.83% │  1.75% │  0.42% │
 Truth   │ 21839 │  1.10% │ 98.78% │  0.12% │
         │ m     │  0.45% │  0.02% │ 99.52% │
         └───────┴────────┴────────┴────────┘
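
One rough way to read a table like this for a rare modification is to ask what fraction of passing calls would be reported as 4mC under an assumed true composition. The sketch below copies the percentages from the table above. The two sample compositions are hypothetical and chosen only for illustration, and it assumes the error rates measured on synthetic validation strands carry over to a real sample, which may not hold.

```python
# Rows are ground truth, columns are the called base: C, 4mC (21839), 5mC (m).
# Values are the percentages from the contingency table above, as fractions.
LABELS = ["C", "21839", "m"]
CONFUSION = {
    "C":     {"C": 0.9783, "21839": 0.0175, "m": 0.0042},
    "21839": {"C": 0.0110, "21839": 0.9878, "m": 0.0012},
    "m":     {"C": 0.0045, "21839": 0.0002, "m": 0.9952},
}


def expected_called_fraction(truth_fracs, called):
    """Expected fraction of passing calls reported as `called`,
    given assumed true fractions for each label."""
    return sum(truth_fracs[t] * CONFUSION[t][called] for t in LABELS)


def precision(truth_fracs, called):
    """Share of `called` calls that are truly `called`,
    under the assumed true fractions."""
    total = expected_called_fraction(truth_fracs, called)
    true_pos = truth_fracs[called] * CONFUSION[called][called]
    return true_pos / total if total > 0 else float("nan")


# HYPOTHETICAL composition: no real 4mC, 5% 5mC, the rest unmodified.
no_4mc = {"C": 0.95, "21839": 0.0, "m": 0.05}
# HYPOTHETICAL composition: 1% real 4mC.
some_4mc = {"C": 0.94, "21839": 0.01, "m": 0.05}

print(expected_called_fraction(no_4mc, "21839"))
print(precision(some_4mc, "21839"))
```

Under the first hypothetical composition, about 1.7% of passing calls would be reported as 4mC even with no 4mC present, driven almost entirely by the 1.75% of true C called as 21839. When the true 4mC fraction is small, precision is dominated by that false-positive rate rather than by the 98.78% on the diagonal.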

The threshold value I'm getting isn't much higher than yours. There will always be a trade-off between increasing the --filter-threshold and the sensitivity of the model. What I would do is look at the output from modkit sample-probs and pick a threshold value for 4mC that corresponds to roughly the 15th-20th percentile.
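
As a minimal sketch of the percentile idea, the snippet below picks a candidate threshold from a list of per-call 4mC probabilities. The `percentile` helper and the `probs_4mc` values are hypothetical and only illustrate the arithmetic. It does not parse modkit output, and in practice the percentile values would be read from the modkit sample-probs output.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear
    interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


# HYPOTHETICAL call probabilities for 4mC calls (not real data).
probs_4mc = [0.52, 0.61, 0.66, 0.71, 0.78, 0.83, 0.88, 0.92, 0.95, 0.97, 0.99]

for q in (15, 20):
    # Calls below this value would be filtered out at the q-th percentile.
    print(q, round(percentile(probs_4mc, q), 4))
```

A higher percentile gives a stricter threshold: more low-confidence 4mC calls are dropped, at the cost of sensitivity.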

hannan666666 · 2025-01-16T02:56:11Z

Thank you very much for your kind and informative reply! If possible, could you share the species and the 4mC fraction of your validation sample? My sample is from a mouse, and the 4mC fraction I observed is 0.041127943. Based on your experience, do you think this value is unusually high for mammals? I would greatly appreciate any insights you could provide.

Thank you again for your time and support!

ArtRand added the question Looking for clarification on inputs and/or outputs label Jan 16, 2025
