As I understand it, if you are interested only in 5mC vs. C (and not 5hmC), collapsing is performed by splitting the 5hmC probability equally between both states: when collapsing 'h' with 'm' and canonical options, half of the probability of 'h' is added to both 'm' and 'C'.
This only works if 5hmC calls are equally likely to come from 5mC or from C. However, 5hmC calls are known to be more likely false positives (FP) arising from 5mC than from canonical C (e.g., see the section "The use of more comprehensive negative controls to account for confounding DNA modifications" in https://www.biorxiv.org/content/10.1101/2024.11.19.624260v1.full.pdf).
Do I understand correctly that if Pcanonical = 0.45, P5mC = 0.35, and P5hmC = 0.2, the base would be called canonical (provided the threshold is 0.4)?
Is there a smarter way to first merge P5mC + P5hmC and then call based on the resulting probability?
Thanks!
> As I understand it, if you are interested only in 5mC vs. C (and not 5hmC), collapsing is performed by splitting the 5hmC probability equally between both states: when collapsing 'h' with 'm' and canonical options, half of the probability of 'h' is added to both 'm' and 'C'.
That's correct.
> This only works if 5hmC calls are equally likely to come from 5mC or from C. However, 5hmC calls are known to be more likely false positives (FP) arising from 5mC than from canonical C (e.g., see the section "The use of more comprehensive negative controls to account for confounding DNA modifications" in https://www.biorxiv.org/content/10.1101/2024.11.19.624260v1.full.pdf).

The method that Modkit uses doesn't have any information about the relative frequency of the various modifications in the sample. Nor does it have any information about the model's error rate, which it would need to form any kind of likelihood of observing $p_{\text{5hmC}}$ given that $\text{x} = \text{5mC}$, or anything like that.
TL;DR: the method in Modkit is a fairly unbiased way to turn a 3-way classification into a 2-way classification.
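A minimal Python sketch of the equal-split collapse described above, followed by a threshold call. The function names `collapse_ignore_h` and `call_base` are hypothetical and not part of Modkit. The sketch assumes the call is the more probable of the two remaining states and is filtered out when that probability falls below the threshold; the tie-breaking rule and the `>=` comparison are assumptions as well.

```python
from typing import Optional, Tuple


def collapse_ignore_h(p_c: float, p_m: float, p_h: float) -> Tuple[float, float]:
    """Turn a 3-way (C, 5mC, 5hmC) call into a 2-way (C, 5mC) call
    by adding half of p_h to each remaining state."""
    half = p_h / 2.0
    return p_c + half, p_m + half


def call_base(p_c: float, p_m: float, threshold: float) -> Optional[str]:
    """Pick the more probable state ('C' wins ties in this sketch);
    return None (call filtered) if its probability is below the threshold."""
    label, prob = ("C", p_c) if p_c >= p_m else ("m", p_m)
    return label if prob >= threshold else None


# Hypothetical read-level probabilities for one cytosine.
p_c, p_m = collapse_ignore_h(0.2, 0.5, 0.3)
print(round(p_c, 2), round(p_m, 2), call_base(p_c, p_m, threshold=0.4))
```

Here $p_{\text{5hmC}} = 0.3$ is split 0.15/0.15, giving 0.35 for C and 0.65 for 5mC, so the call is `m`.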
> Do I understand correctly that if Pcanonical = 0.45, P5mC = 0.35, and P5hmC = 0.2, the base would be called canonical (provided the threshold is 0.4)?
That's correct.
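Working the example through with the equal split (writing $p_{\text{C}}$ for the canonical probability, and assuming the call is the most probable collapsed state that passes the threshold):

$$
\begin{aligned}
p'_{\text{C}} &= p_{\text{C}} + \tfrac{1}{2}\,p_{\text{5hmC}} = 0.45 + 0.10 = 0.55 \\
p'_{\text{5mC}} &= p_{\text{5mC}} + \tfrac{1}{2}\,p_{\text{5hmC}} = 0.35 + 0.10 = 0.45
\end{aligned}
$$

Canonical C has the higher collapsed probability (0.55, above the 0.4 threshold), so the base is called canonical even though 5mC and 5hmC together held $0.35 + 0.20 = 0.55$ of the original probability mass.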
> Is there a smarter way to first merge P5mC + P5hmC and then call based on the resulting probability?

Modkit actually has a couple of other ways to do this:
- In subcommands like `pileup`, you can use `--combine-mods`, which sums together all modification calls.
- You can use `modkit adjust-mods ${inbam} ${outbam} --convert h m`, which adds $p_{\text{5hmC}}$ to $p_{\text{5mC}}$. I think this is the most similar to what you're looking for.
You can play around with this using the test data in the repo, e.g.:

```bash
$ bam=./tests/resources/bc_anchored_10_reads.sorted.bam
$ modkit extract full ${bam} /tmp/scratch/orig.tsv
$ modkit adjust-mods ${bam} stdout --convert h m | modkit extract full stdin /tmp/scratch/converted.tsv
```
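For comparison, a small Python sketch contrasting the equal split with the behavior described for `--convert h m` (adding $p_{\text{5hmC}}$ to $p_{\text{5mC}}$), applied to the example probabilities from the question. The helper names are hypothetical, and the sketch models only the arithmetic on a single call, not how `adjust-mods` reads or writes BAM tags.

```python
from typing import Dict


def ignore_h(p: Dict[str, float]) -> Dict[str, float]:
    """Equal split: half of p['h'] goes to 'C' and half to 'm'."""
    half = p["h"] / 2.0
    return {"C": p["C"] + half, "m": p["m"] + half}


def convert_h_to_m(p: Dict[str, float]) -> Dict[str, float]:
    """Modelled on `--convert h m`: all of p['h'] is added to p['m']."""
    return {"C": p["C"], "m": p["m"] + p["h"]}


def top_call(p: Dict[str, float]) -> str:
    """Return the most probable state (first key wins on ties)."""
    return max(p, key=p.get)


# Probabilities from the question: P(C)=0.45, P(5mC)=0.35, P(5hmC)=0.2.
probs = {"C": 0.45, "m": 0.35, "h": 0.2}
for name, collapse in (("ignore h", ignore_h), ("convert h m", convert_h_to_m)):
    collapsed = collapse(probs)
    rounded = {k: round(v, 2) for k, v in collapsed.items()}
    print(name, rounded, top_call(collapsed))
```

Under the equal split the example is called canonical; after folding 5hmC into 5mC the same call becomes 5mC, which corresponds to the "merge first, then call" approach asked about.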
In general, we've found that the default method for `--ignore` gives the best results when evaluated against ground-truth synthetic oligos. We have a blog post and public data available if you want to try the various other methods. There is also an older blog post comparing against WGBS.
Thanks for the question, happy to elaborate on anything that isn't clear.