Clarification on Modkit DMR Pair Results and Validate Accuracy #327
Hello @BrendanBeahan, Once again, your commands look fine. I reproduced the DMR table below to make it a little easier to read:
I would investigate why you're not getting closer to 100% m6A calls on the modified sample. This is likely why the A [...]
Hello @ArtRand,

My goal is to benchmark several m6A-calling tools and then pick the most appropriate one for our lab's specific use case. To compare m6A-callers I was hoping to have some sort of continuous site-level confidence score, similar to how m6Anet outputs site-level modification probabilities and some of the other tools provide a p-value or FDR. This would serve as the continuous threshold metric for a precision-recall curve and some other benchmarking. However, I am just as happy to use modkit in a comparative manner. I'm really just looking for a single metric I can point to and say "this is how confident modkit is that this site is modified." Even though it is much lower than expected, perhaps I could just use % modified for this metric? That seems inappropriate, but I'm not entirely sure what alternative I would use; I was thinking perhaps the map_pvalue from dmr pair.

Regarding our experimental design, it's a bit complicated. We have two equal samples of a fully modified in-vitro transcribed (IVT) oligo and an unmodified IVT, which we barcoded and sequenced together. I then used demultiplexing software to partition the reads into unaligned BAMs for barcode 1 (unmodified) and barcode 2 (modified). Notably, at this stage I had already lost many of the modified reads, plausibly because the abnormal amount of modification caused issues with our sequencing. Regardless, my demultiplexing commands appeared fine to the authors of that software when I contacted them, and their software has been used successfully in similar IVT contexts. I then used Dorado (which I suppose in turn uses minimap2) to align these unaligned BAMs to a custom FASTA containing the sequence of the synthetic IVT; at this stage I removed any unmapped reads. These BAMs are then used to filter my initial pod5 files into a modified directory and an unmodified directory, which are in turn fed back into my benchmarking pipeline so that all of my m6A-callers can use these now properly partitioned pod5 directories. From there, somewhat redundantly, I run modification-aware Dorado basecalling on these filtered pod5 directories, map with minimap2 to the same reference, and then run modkit pileup, etc.

So, in sum, I'm a bit stumped as to why I'm getting such a low % of modified m6A sites in my modified sample. But, more importantly for me, I'm unsure how to produce a continuous site-level metric reflecting modkit's confidence in m6A modification, to use as the continuous threshold for generating something like a precision-recall curve. I suppose most of these other tools treat lack of modification as the null hypothesis by comparing against an unmodified dataset, or against a model previously trained on a similarly unmodified dataset.

Any ideas would be greatly appreciated.

Best,
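One minimal way to get the kind of precision-recall comparison described above, assuming the standard modkit bedMethyl column layout (column 4 = modified-base code, column 11 = percent of valid calls modified), is to use the per-site percent-modified value directly as the continuous score. The sketch below does this in Python with scikit-learn; the file names modified_sample.bed and unmodified_sample.bed are placeholders, and the truth labels come from the IVT design (every A in the modified oligo is truly m6A, none in the unmodified oligo are).

# Minimal sketch, not from the original thread: site-level precision-recall using the
# percent-modified column of modkit pileup bedMethyl files as the continuous score.
# Assumed: column 4 is the modified-base code, column 11 is percent modified;
# file names are placeholders.
from sklearn.metrics import auc, precision_recall_curve

def load_scores(bedmethyl_path, truth_label):
    # Yield (percent_modified, truth_label) for every m6A ('a') record in the file.
    with open(bedmethyl_path) as fh:
        for line in fh:
            fields = line.split()
            if fields[3] != "a":
                continue
            yield float(fields[10]), truth_label

# Truth labels come from the IVT design: modified oligo -> 1, unmodified oligo -> 0.
pairs = list(load_scores("modified_sample.bed", 1)) + list(load_scores("unmodified_sample.bed", 0))
scores = [score for score, _ in pairs]
labels = [label for _, label in pairs]

precision, recall, _ = precision_recall_curve(labels, scores)
print(f"site-level PR-AUC: {auc(recall, precision):.3f}")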
Also, looking further at the initial modkit pileup output for my modified pod5 directory, I see that the sites that should be modified (i.e. the A's) have an average valid coverage almost two orders of magnitude higher than the non-A/unmodified sites, with basically equal % modified predictions regardless of the base in question.
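A quick way to quantify the coverage imbalance described above, again assuming the standard modkit bedMethyl columns (column 4 = code, column 10 = valid coverage, column 11 = percent modified) and a placeholder file name, is to average those columns per modified-base code:

# Sketch: mean valid coverage and mean percent modified per modified-base code,
# to make the A vs non-A coverage imbalance explicit. Assumes standard modkit
# bedMethyl columns (4 = code, 10 = valid coverage, 11 = percent modified);
# the file name is a placeholder.
from collections import defaultdict

totals = defaultdict(lambda: [0, 0, 0.0])  # code -> [n_sites, valid_cov_sum, pct_mod_sum]

with open("modified_sample.bed") as fh:
    for line in fh:
        fields = line.split()
        acc = totals[fields[3]]
        acc[0] += 1
        acc[1] += int(fields[9])
        acc[2] += float(fields[10])

for code, (n, cov_sum, pct_sum) in sorted(totals.items()):
    print(f"{code}: sites={n} mean_valid_cov={cov_sum / n:.1f} mean_pct_mod={pct_sum / n:.2f}")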
Hello @BrendanBeahan,
I can see what you're trying to do here, but let me see if I can convince you that you don't need to calculate a site-level confidence score. Basically, tl;dr: if your per-read, per-base accuracy is highest, you'll also minimize the difference between the estimated %-modified and the actual %-modified. In the experimental setup you've constructed, you expect a given position to have an m6A frequency of 100% or 0%. You can use the difference between the percent methylation in the bedMethyl and the expected modification level as a metric of accuracy. You could also aggregate every call for every read and produce a precision-recall plot, a confusion matrix, or any other summary statistic. This is what modkit validate does.
This is a large warning signal. Could you run the modified sample separately from the unmodified one to test that the demultiplexing is working? From the BED file you sent:
There is a mistake here where you have [...]. One thing I might try, instead of using 100% m6A and 0% m6A, is a known stoichiometric fraction of m6A. The polymerase will likely prefer to incorporate one nucleotide over the other, but if you add 50% m6A and 50% A to the reaction, I would expect the modification calls at a given site to be close to 50%. We've also compared whole-transcriptome dRNA-seq to the GLORI method and seen good concordance. Ideally, fully synthetic oligos like the ones in the blog post are the best way to compare methods. We don't have dRNA data publicly released yet, but I hear it's coming.
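As a rough sketch of the aggregation suggested above, under the assumption that the fully modified IVT is 100% m6A at every A and the unmodified IVT is 0%, the per-site modified and canonical call counts from the pileup bedMethyl (columns 12 and 13 in the standard layout) can be summed into a 2x2 confusion matrix; the file names below are placeholders:

# Sketch of the aggregation described above: sum per-site modified/canonical call
# counts from modkit pileup bedMethyl into a 2x2 confusion matrix, using the known
# truth of this experiment (all A's modified in one sample, none in the other).
# Assumes columns 12 = N_mod and 13 = N_canonical; file names are placeholders.

def count_calls(bedmethyl_path):
    # Return total (n_modified_calls, n_canonical_calls) over all m6A ('a') records.
    n_mod = n_canon = 0
    with open(bedmethyl_path) as fh:
        for line in fh:
            fields = line.split()
            if fields[3] != "a":
                continue
            n_mod += int(fields[11])
            n_canon += int(fields[12])
    return n_mod, n_canon

tp, fn = count_calls("modified_sample.bed")    # truth: every A is m6A
fp, tn = count_calls("unmodified_sample.bed")  # truth: every A is canonical

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
if tp + fp and tp + fn:
    print(f"precision={tp / (tp + fp):.3f} recall={tp / (tp + fn):.3f}")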
Hello,
Sorry for the flurry of recent questions approaching the new year! I have recently run modkit dmr pair to compare an in-vitro transcribed dataset in which every A is m6A-modified against another oligo that is fully unmodified.
I am, however, not seeing performance in keeping with existing modkit/Dorado benchmarks, and I was wondering whether I am perhaps doing something wrong, or simply misinterpreting the results.
For dmr pair I perform the following:
modkit dmr pair \
  --ref ${params.reference} \
  -a ${bed_unmod_gz} \
  -b ${bed_mod_gz} \
  --base A \
  --header \
  --out-path single_site_dmr.tsv
And the output looks as follows:
chrom start end name score a_counts a_total b_counts b_total a_mod_percentages b_mod_percentages a_pct_modified b_pct_modified map_pvalue effect_size
MYGENE 1 2 . 133.0527343210424 a:14 7809 a:59 508 a:0.18 a:11.61 0.0017928032 0.11614173 0 -0.11333333333333334
MYGENE 2 3 . 225.94086564894315 a:48 7696 a:128 720 a:0.62 a:17.78 0.0062370063 0.17777778 0 -0.17
MYGENE 3 4 . 330.8802245030597 a:48 7542 a:180 788 a:0.64 a:22.84 0.0063643595 0.2284264 0.0000000000000000001116233249574274 -0.22333333333333336
MYGENE 4 5 . 493.87030782775764 a:43 7669 a:251 859 a:0.56 a:29.22 0.0056069894 0.29220024 0.000000000000000000000000007378606107449063 -0.2866666666666667
MYGENE 5 6 . 54.474088063362615 a:73 7695 a:59 844 a:0.95 a:6.99 0.00948668 0.069905214 0 -0.060000000000000005
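One possible way to pull a continuous per-site statistic out of this output, using only the column names shown in the header above and assuming the file is tab-separated with that header row, would be to rank sites by the score (or effect_size) column:

# Sketch: load the single-site dmr pair output (column names as in the header above,
# assumed tab-separated with a header row) and rank sites by the score column as one
# possible continuous per-site statistic for thresholding.
import csv

with open("single_site_dmr.tsv") as fh:
    rows = list(csv.DictReader(fh, delimiter="\t"))

rows.sort(key=lambda r: float(r["score"]), reverse=True)
for r in rows[:10]:
    print(r["chrom"], r["start"], r["score"], r["effect_size"], r["map_pvalue"])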
My concern is that nearly every map_pvalue is effectively 0, with a small number being exactly 1. These values of 1 occasionally fall on the expected modification sites as well, which I would assume is the opposite of the expected behavior.
Also, when I try to use modkit validate I see the accuracy is only about 50%; the log output for the modified oligo dataset is as follows:
My bed file input for validate looks as follows:
MYGENE 1 2 a . +
MYGENE 2 3 a . +
MYGENE 3 4 a . +
MYGENE 4 5 a . +
MYGENE 5 6 a . +
MYGENE 6 7 - . +
MYGENE 7 8 - . +
MYGENE 8 9 - . +
MYGENE 9 10 - . +
MYGENE 10 11 - . +
MYGENE 11 12 - . +
MYGENE 12 13 - . +
MYGENE 13 14 - . +
MYGENE 14 15 - . +
MYGENE 15 16 a . +
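For reference, here is a small sketch of how a ground-truth BED in this format could be generated from the reference FASTA, marking every A as 'a' and every other base as '-'; the FASTA path and the assumption of a single plus-strand reference sequence are placeholders:

# Sketch, not from the original post: build a ground-truth BED in the format shown
# above from the reference FASTA, marking every A as 'a' (m6A) and every other base
# as '-' (unmodified). The FASTA path and the single plus-strand sequence are
# assumptions for illustration.

def read_single_fasta(path):
    name, seq = None, []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                name = line[1:].split()[0]
            else:
                seq.append(line.strip())
    return name, "".join(seq).upper()

chrom, seq = read_single_fasta("mygene.fa")
with open("truth_sites.bed", "w") as out:
    for i, base in enumerate(seq):
        code = "a" if base == "A" else "-"
        out.write(f"{chrom}\t{i}\t{i + 1}\t{code}\t.\t+\n")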
I would greatly appreciate any insights into what might be affecting the performance or if there’s anything I should adjust in my approach. Happy to provide any additional details that would help clarify the issue.
Best,
Brendan