How to Filter Sites for High-Confidence Modifications? #346
Comments
Hi @evans-ZH, you might want to take a look at the PS: I am not the developer.
Thanks for reaching out! Could you clarify what you mean by “nearly every site is marked as modified”? Specifically, what criteria are you using to determine modification? It sounds like basecalling errors might be contributing to modification calls. To address this, you could separate your analysis by base type. For example, when analyzing m6A, you can specify the canonical base using the Let me know if you need further assistance!
I selected sites with percent modified > 1 to draw the m6A motif, and they are now DRACH-compliant in the reference sequence. I want to know whether this is correct and whether it also fits m5C, inosine and pseU. If not, is there a more general filtering method? Or can you suggest ways to verify that the obtained sites are accurate and important? Thanks!
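For checking DRACH compliance against the reference, a minimal sketch in Python might look like the following. It assumes the reference sequence is already loaded as a string, that positions are 0-based, and that the site lies on the forward strand (for a minus-strand site the sequence would need to be reverse-complemented first). The function name `is_drach_site` is hypothetical and not part of modkit.

```python
import re

# DRACH in IUPAC codes: D = A/G/T, R = A/G, then A, C, and H = A/C/T.
DRACH = re.compile(r"[AGT][AG]AC[ACT]")


def is_drach_site(reference: str, pos: int) -> bool:
    """Return True if the base at 0-based `pos` is the central A of a DRACH 5-mer.

    Assumes a forward-strand site; `reference` may be DNA or RNA (U is read as T).
    """
    if pos < 2 or pos + 3 > len(reference):
        return False  # not enough flanking sequence to form a 5-mer
    kmer = reference[pos - 2 : pos + 3].upper().replace("U", "T")
    return DRACH.fullmatch(kmer) is not None
```

For example, `is_drach_site("TTGGACTTT", 4)` is `True` because the 5-mer around position 4 is `GGACT`. Note that DRACH is specific to m6A, so this check says nothing about m5C, inosine, or pseU sites.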
Hi @evans-ZH, Thank you for following up with additional details and questions! It sounds like you’re making good progress in refining your analysis, but I’d like to help clarify some key aspects of how modified bases are detected and reported, which may provide better context for interpreting your results and planning the next steps.
It’s important to note that modification calls in the output represent the fraction of reads supporting a modification at a given position. A low percentage in the “percent modified” column does not necessarily mean the position “is modified” in a biologically significant way. Instead, it might reflect noise, sequencing errors, or model limitations, as the detection process is not absolute. Given that models have an inherent error rate (typically 0.3%-1.5%, depending on the model and sample characteristics), some positions will naturally appear to have modifications even if they do not. Therefore, low fraction values should be treated with caution and may not represent biologically meaningful modifications.
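One way to make the "noise versus real signal" question concrete is to ask how likely the observed number of modified reads would be if every call at that site were a model error. The sketch below computes a one-sided binomial tail probability. It is not how modkit itself works; it assumes reads are independent and that a single per-read error rate applies (taken from the 0.3%-1.5% range mentioned above), and the function names are hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1.0 - p)
    )


def binomial_tail(n_valid: int, n_mod: int, error_rate: float) -> float:
    """P(X >= n_mod) for X ~ Binomial(n_valid, error_rate).

    A small value suggests the modified reads are hard to explain by the
    assumed error rate alone.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    if n_mod <= 0:
        return 1.0
    tail = sum(
        exp(log_binom_pmf(k, n_valid, error_rate))
        for k in range(n_mod, n_valid + 1)
    )
    return min(1.0, tail)
```

With 100 valid reads and an assumed 1% error rate, a single modified read is unremarkable (`binomial_tail(100, 1, 0.01)` is about 0.63), whereas 30 modified reads give a vanishingly small tail probability. Because many sites are tested at once, some multiple-testing correction would be needed before treating small values as significant.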
You’re on the right track with filtering to focus on the most reliable modification calls. The The difference between --filter-threshold and --filter-percentile is worth noting: For your goals, relying on
It’s encouraging that filtering for higher confidence and focusing on motifs like DRACH has helped refine your m6A results. If you’re seeing overlap between modifications (e.g., m6A and inosine), this might reflect shared signal regions or noise, especially when percentages are low. To address this:
To move forward effectively, it might help to revisit your research goals. For instance: Being clear about your objectives will help you set appropriate thresholds and prioritize the most relevant data for downstream analysis.
Here are a few practical suggestions: Let me know if you’d like clarification on any of these points or additional guidance! Best of luck with your analysis, and feel free to share further updates.
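As a rough illustration of post-filtering the pileup table, the sketch below keeps only sites with enough coverage and a high enough percent modified. It assumes the bedMethyl layout described in the modkit documentation (column 4 = modification code, column 6 = strand, column 10 = valid coverage, column 11 = percent modified, column 12 = modified read count). The thresholds `min_cov` and `min_percent` are placeholders rather than recommendations, the names `Site`, `parse_bedmethyl`, and `high_confidence` are hypothetical, and the rows in `EXAMPLE` are made up for demonstration.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, TextIO


class Site(NamedTuple):
    chrom: str
    start: int
    mod_code: str
    strand: str
    n_valid: int
    percent_mod: float
    n_mod: int


def parse_bedmethyl(handle: TextIO) -> Iterator[Site]:
    """Yield one Site per row; split() accepts tab- or space-separated columns."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip an optional header line
        f = line.split()
        if len(f) < 12:
            continue  # skip blank or truncated rows
        yield Site(f[0], int(f[1]), f[3], f[5], int(f[9]), float(f[10]), int(f[11]))


def high_confidence(
    sites: Iterable[Site], min_cov: int = 20, min_percent: float = 10.0
) -> List[Site]:
    """Keep sites that pass both the coverage and percent-modified cutoffs."""
    return [s for s in sites if s.n_valid >= min_cov and s.percent_mod >= min_percent]


# Made-up rows for demonstration only (not real data).
EXAMPLE = (
    "chr1\t100\t101\ta\t50\t+\t100\t101\t255,0,0\t50\t40.00\t20\t30\t0\t0\t0\t0\t0\n"
    "chr1\t200\t201\ta\t200\t+\t200\t201\t255,0,0\t200\t1.00\t2\t198\t0\t0\t0\t0\t0\n"
    "chr1\t300\t301\ta\t5\t+\t300\t301\t255,0,0\t5\t60.00\t3\t2\t0\t0\t0\t0\t0\n"
)
kept = high_confidence(parse_bedmethyl(io.StringIO(EXAMPLE)))
```

Here only the first row survives: the second has just 1% modified, and the third has only 5 valid reads. Suitable cutoffs depend on the research goal, the model error rate, and sequencing depth, so they should be chosen per experiment.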
Hi,
I used modkit pileup to detect modification sites, including m6A, m5C, inosine, and pseU. However, the resulting dataset contains an overwhelming number of sites. I use the filter:
But it still seems that nearly every site is marked as modified.
Additionally, I noticed that the motif of the detected m6A sites at the corresponding location on the reference genome did not match DRACH.
I’d like to know if there are any recommended methods or additional filtering criteria I can apply to identify high-confidence and biologically meaningful modification sites. Thanks!