Hello, I am working on quantifying the ratio of 4mC in mouse samples, but I have encountered a challenge. According to published papers, 4mC is very rare in mammals. Could you provide some guidance on how I can assess the precision of the 4mC ratio reported by modkit? Additionally, do you have any strategies to improve its precision, such as setting a higher threshold for the analysis? Thank you very much!
bases C
total_reads_used 10042
count_reads_C 10042
@ pass_threshold_C 0.640625
base  code   pass_count  pass_frac  all_count  all_frac
C     -      33096024    0.9303225  35700393   0.905164
C     m      1632598     0.045892   2118287    0.053708013
C     21839  846164      0.0237855  1622119    0.041127943
We recommend testing base modification models on synthetic strands. We've recently published a blog post describing how we derive the model performance metrics. Unfortunately, the 4mC validation data hasn't been released publicly yet.
I ran a test on the validation data I have, using the latest models ([email protected]_4mC_5mC@v3) and attached the pass confusion matrix from modkit validate.
> Call probability threshold: 0.6836
> Percent of modified base calls removed: 9.98%
> Filtered accuracy: 96.85%
> Filtered modified base calls contingency table
                 Called Base
       ┌───────┬────────┬────────┬────────┐
       │       │ C      │ 21839  │ m      │
       ├───────┼────────┼────────┼────────┤
Ground │ C     │ 97.83% │  1.75% │  0.42% │
Truth  │ 21839 │  1.10% │ 98.78% │  0.12% │
       │ m     │  0.45% │  0.02% │ 99.52% │
       └───────┴────────┴────────┴────────┘
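One rough way to read this table against a real sample is to ask what fraction of calls would come out as 4mC (code 21839) purely from misclassification, and what precision that implies. The sketch below is only an illustration: it assumes the per-class rates from the synthetic validation data carry over unchanged to genomic reads (which may not hold), and the sample compositions passed in are hypothetical inputs, not measurements.

```python
# Rows: ground truth; columns: called base. Rates copied from the filtered
# contingency table above (synthetic validation strands).
CONFUSION = {
    "C":     {"C": 0.9783, "21839": 0.0175, "m": 0.0042},
    "21839": {"C": 0.0110, "21839": 0.9878, "m": 0.0012},
    "m":     {"C": 0.0045, "21839": 0.0002, "m": 0.9952},
}


def expected_call_fractions(true_fracs, confusion=CONFUSION):
    """Expected fraction of calls per code, given the true composition."""
    called = {code: 0.0 for code in confusion}
    for truth, frac in true_fracs.items():
        for call, rate in confusion[truth].items():
            called[call] += frac * rate
    return called


def precision(true_fracs, code, confusion=CONFUSION):
    """Share of calls for `code` that are truly `code` (NaN if never called)."""
    called = expected_call_fractions(true_fracs, confusion)
    true_pos = true_fracs.get(code, 0.0) * confusion[code][code]
    return true_pos / called[code] if called[code] > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical composition: no real 4mC at all, 5% 5mC, 95% canonical C.
    no_4mc = {"C": 0.95, "21839": 0.0, "m": 0.05}
    print(expected_call_fractions(no_4mc)["21839"])

    # Hypothetical composition with 1% true 4mC.
    some_4mc = {"C": 0.94, "21839": 0.01, "m": 0.05}
    print(precision(some_4mc, "21839"))
```

Under these assumptions, a sample with no 4mC at all would still show a 4mC call fraction on the order of the 1.75% canonical-to-4mC error rate, so an observed fraction of a few percent is hard to separate from background.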
The threshold value I'm getting isn't much higher than what you're getting. There will always be a trade-off: raising the --filter-threshold removes more low-confidence calls but reduces the sensitivity of the model. What I would do is look at the output from modkit sample-probs and pick a threshold value for 4mC that corresponds to roughly the 15th to 20th percentile.
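As a minimal sketch of the percentile step, assuming you have already extracted the 4mC call probabilities into a plain Python list (the `probs_4mc` values below are made up, and parsing the modkit sample-probs output is not shown):

```python
def percentile(values, pct):
    """Return the pct-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


if __name__ == "__main__":
    # Hypothetical 4mC call probabilities; replace with your own values.
    probs_4mc = [0.52, 0.58, 0.61, 0.66, 0.71, 0.77, 0.83, 0.90, 0.95, 0.99]
    for pct in (15, 20):
        print(pct, percentile(probs_4mc, pct))
```

Calls below the chosen value would be filtered out, so a higher percentile trades away more 4mC calls for fewer low-confidence ones.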
Thank you very much for your kind and informative reply! If possible, could you share the species and the 4mC fraction of your validation sample? My sample is from a mouse, and the 4mC fraction I observed is 0.041127943. Based on your experience, do you think this value is unusually high for mammals? I would greatly appreciate any insights you could provide.