Do you use `modkit dmr`? #361

ArtRand · 2025-01-31T22:37:23Z

Hello everyone.

I'd like to know if you're having issues with modkit dmr either in the pair or multi variety.

If you're not using it (but you are doing some kind of differential methylation analysis), why not?

Are the outputs hard to interpret, not helpful, or not compatible with other methods?
Is it too slow or (worse) are there bugs?

One thing that's on my immediate roadmap is to compare an open dataset to a published tool such as DSS. I'm also experimenting with a method to get p-values for regions so you could find significantly differentially methylated regions.

If you're using it and liking it, throw a 👍 on here for fun. But don't hold back if there are things that could be better. Of course I'm not promising I can get to all of them.

kylepalos · 2025-02-06T18:59:41Z

I've been using DMR quite a bit and it has been fast and intuitive! Thanks to the devs for making Modkit a very user-friendly tool!

I do have two very minor questions that I couldn't find answers to elsewhere.
In both cases, I usually perform paired, site-specific analyses, such as:

modkit dmr pair \
-a sample1_rep1.bed.gz -a sample1_rep2.bed.gz \
-b sample2_rep1.bed.gz -b sample2_rep2.bed.gz \
-o DMR.bed \
--ref reference.fasta \
--base A --base T \
--min-valid-coverage 10

1. When analyzing the outputs with balanced replicates, would you recommend always analyzing the balanced effect sizes and p-values (rather than the unbalanced/raw values)? The effect sizes agree well between raw and balanced, but the p-values agree less; see the attached scatter plots below. I'm not sure if this is expected behavior or if something about my analysis may be off.
2. This one is even more minor. I often analyze modification mutants where the effects are quite strong, and a substantial fraction of my p-values (balanced or raw) are reported as exactly 0. I realize the exact p-value past a certain point isn't very interesting or informative, but I was wondering whether the range of reporting could/should be expanded beyond ~1e-50? This would just keep me from having a massive clump of points at a very similar -log10(p-value) on volcano plots and similar graphics. Again, extremely minor and not actually a Modkit issue.
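
For the second question, one plotting-side workaround (independent of anything Modkit does internally) is to replace exact zeros with a floor before taking -log10 and to keep track of which points were capped, so they can be drawn with a different marker. This is only a minimal sketch: the function name `neg_log10_with_floor` and the `floor` parameter are hypothetical, and it assumes the p-values have already been read out of the DMR output into a plain list of floats.

```python
import math


def neg_log10_with_floor(pvalues, floor=None):
    """Return (-log10(p) values, capped flags) for a list of p-values.

    Hypothetical helper, not part of Modkit. P-values below `floor`
    (including exact zeros) are replaced by `floor` before the log
    transform. If `floor` is None, the smallest positive p-value in the
    input is used, so capped points sit at the edge of the observed range.
    """
    if floor is None:
        positive = [p for p in pvalues if p > 0.0]
        # If every p-value is zero, fall back to the smallest positive double.
        floor = min(positive) if positive else 5e-324

    values = []
    capped = []
    for p in pvalues:
        is_capped = p < floor
        values.append(-math.log10(floor if is_capped else p))
        capped.append(is_capped)
    return values, capped


if __name__ == "__main__":
    # Toy p-values for illustration only, not real Modkit output.
    scores, flags = neg_log10_with_floor([0.0, 1e-10, 0.01], floor=1e-50)
    for score, flag in zip(scores, flags):
        print(f"{score:.1f}", "capped" if flag else "")
```

The capped flags make it possible to plot those sites as, say, open triangles at the top of a volcano plot, which signals "at least this significant" rather than implying a measured value.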

Thanks a lot!

ArtRand · 2025-02-08T01:21:50Z

@kylepalos Thanks for this!

> When analyzing the outputs with balanced replicates, would you recommend always analyzing the balanced effect sizes and p-values (rather than the un-balanced/raw values)? The effect sizes seem to be agreeable b/w raw and balanced, but p-values agree less, see attached scatter plots below. I'm not sure if this is expected behavior or if something about my analysis may be off.

Let me take a look into this.

Ge0rges · 2025-02-23T23:17:14Z

I wanted to chime in here in support of this command. I have explored many different methods for quantifying how differently methylated a nucleotide is in different samples. I've typically looked for:

- Significance metric (p-value)
- Effect size metric
- Whether methylation type is taken into account
- On regions, whether position is taken into account
- Whether methylation fraction is taken into account
- Whether the test can be corrected for differences in coverage
- Whether the test can take advantage of replicates
- Whether the test can handle different numbers of replicates AND different coverage per replicate

This is a tall order. I started with `modkit dmr`, went around the block a few (many) times, and have finally settled on `modkit dmr pair`, which satisfies all these criteria. I think the tool you've developed is excellent. Exploring a contribution that integrates it into anvi'o is on my list of things to do in the future (perhaps at a workshop in September).

For me, the only thing lacking is a robust study testing the command's output on a controlled dataset, perhaps including a benchmark against some other relevant statistical tests. Perhaps that will one day be conducted by a member of the scientific community. That's on your TODO! Great!

The output is perfectly suitable for me. I've built a small software suite that reads that data in, along with other modkit outputs, genetic annotations, etc., to produce relevant plots that allow for a nice analysis.

Thanks for developing it.

ArtRand added the "good first issue" label Jan 31, 2025

ArtRand mentioned this issue Feb 7, 2025

Filtering bedmethyl file and DMR analysis #364

Open
