modkit bedmethyl merge bug #350

paulinor42 · 2025-01-17T13:36:45Z

Hello,
I am trying to merge one pileup.bed containing 5mC and 5hmC mods with another pileup.bed containing 5mC and 5hmC mods. Both files have been compressed and indexed according to the online documentation.

Below is the command I am using:

dist_modkit_v0.4.2_10d99bc/modkit bedmethyl merge \
PROM0146_Frost_179130_06152023_5mCG_5hmCG_hs1_sort_pileup.bed.gz PROM0146_Frost_179130_FC2_10042023_5mCG_5hmCG_5khz_hs1_sort_pileup.bed.gz \
-o 179130_5mCG_5hmCG_hs1_pileup.bed \
-g genome_references/chm13/hs1_genome_sizes.tsv \
--threads 20 \
--force \
--log-filepath merge.log \
--interval-size 50000

However, the program keeps stalling and not exiting, and it does not write anything to the output pileup bed.

This is the stdout that the command freezes on. I notice it keeps on processing contigs even though there are only 25 in the genome sizes file.

calculated chunk size: 30, interval size 50000, processing 1500000 positions concurrently
[00:03:01] ######################################## 1001/25 contigs processed
0 merging contigs
0 batch errors

Here is the log output:

[src/command_utils.rs::295][2025-01-17 08:25:10][INFO] calculated chunk size: 30, interval size 50000, processing 1500000 positions concurrently
[src/interval_chunks.rs::512][2025-01-17 08:25:10][DEBUG] there are 25 contig(s) to work on (25 parts)

Thank you for your help. modkit is a very helpful and useful tool!

ArtRand · 2025-01-17T21:25:52Z

Hello @paulinor42

Sorry about the stall. So if I'm understanding correctly, hs1_genome_sizes.tsv has a subset of the contigs in the bedMethyls. This should be fine and a supported use case.

> I notice it keeps on processing contigs even though there are only 25 in the genome sizes file.

This is a bug, but it should be a harmless one that shouldn't cause a stall like the one you're seeing. It does, however, provide a clue to what's going on.

To me it looks like Modkit is failing to write to the file, or the filesystem is holding it up. The program loads sections of each bedMethyl in parallel, merges them, and puts the merged records on a queue to be written. This queue is set to a maximum capacity of 1000 items. So the fact that it stalls at 1001 makes me think that the queue is full.
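
To illustrate the reasoning above, here is a minimal Python sketch of a bounded queue whose consumer never drains it. This is not modkit's implementation; the function name `count_until_stall`, the timeout used to detect the stall, and the choice to tick the progress counter just before an item is enqueued are all assumptions made for the example.

```python
import queue


def count_until_stall(capacity: int, total_items: int, timeout: float = 0.05) -> int:
    """Return how many items were counted as processed before the producer blocked.

    Hypothetical model: nothing ever reads from the queue, standing in for a
    writer that is stuck on the output file.
    """
    merged = queue.Queue(maxsize=capacity)  # bounded, like the 1000-item queue described above
    processed = 0
    for item in range(total_items):
        processed += 1  # assumption: the counter ticks before the item is enqueued
        try:
            merged.put(item, timeout=timeout)  # waits once the queue is full
        except queue.Full:
            return processed  # nothing drained the queue: the producer is stalled
    return processed
```

Under these assumptions, `count_until_stall(1000, 5000)` returns 1001: the first 1000 items fill the queue, and the next one is counted but can never be enqueued. That would be consistent with a progress display frozen at 1001 while no output is written.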

Would you be willing to share the data that produces this problem with me so I can try to reproduce it? If so, please send me an email at art.rand[at]nanoporetech.com and we can sort it out. Thanks!

paulinor42 · 2025-01-18T21:37:35Z

Hi @ArtRand,

Thanks for the response.
The hs1_genome_sizes.tsv file has all the contigs in the bedMethyl files. So the bedMethyl files and the hs1_genome_sizes.tsv file should have only 25 contigs. The contigs are the chromosomes in the CHM13 genome; originally, I used the .fai file of the reference sequences I aligned to, but I get the same error.

And yes! I can send you the files.
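
One way to double-check that the sizes file and a bedMethyl really list the same contigs is a small script like the sketch below. It is only an illustration: the function names are hypothetical, it assumes the contig name is the first whitespace-delimited column in both files, and it reads the compressed bedMethyl line by line, which can be slow on whole-genome files.

```python
import gzip


def contigs_in_sizes(path: str) -> set[str]:
    """Collect contig names from the first column of a genome sizes file."""
    with open(path) as handle:
        return {line.split(None, 1)[0] for line in handle if line.strip()}


def contigs_in_bedmethyl(path: str) -> set[str]:
    """Collect contig names from the first column of a gzip/bgzip-compressed bedMethyl."""
    names = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and any comment/header lines.
            if line.strip() and not line.startswith("#"):
                names.add(line.split(None, 1)[0])
    return names


def compare_contigs(sizes_path: str, bed_path: str) -> tuple[set[str], set[str]]:
    """Return (contigs only in the bedMethyl, contigs only in the sizes file)."""
    sizes = contigs_in_sizes(sizes_path)
    bed = contigs_in_bedmethyl(bed_path)
    return bed - sizes, sizes - bed
```

If both returned sets are empty for each input file, the sizes file and the bedMethyls agree on contig names, which would rule out a naming mismatch as the cause.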

ArtRand · 2025-01-23T14:59:47Z

Hello @paulinor42,

For some reason your email got triaged and didn't make it to my inbox. I found it and I have your files. I'll try to reproduce asap and get back to you about the solution. Thanks!

stevenb21 · 2025-01-31T19:04:30Z

Hello @ArtRand, big fan of what you are doing. I just wanted to add that I am hitting the same bug; however, my magic number is 699 instead of 1001.

Below is the command I ran:
modkit bedmethyl merge sample1_h1.bedmethyl.gz sample1_h2.bedmethyl.gz sample1_ug.bedmethyl.gz -o "$out_path" -g "$sizes_path" --force --log-filepath "$log_path"

I created the sizes.tsv file with the following awk code from another ticket:
awk '/^>/ {if (seq) print chr"\t"seq; chr=substr($1,2); seq=0; next} {seq+=length($0)} END {if (seq) print chr"\t"seq}' input.fasta > sizes.tsv
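
For anyone who prefers to check the awk output independently, a rough Python equivalent is sketched below. The function names are hypothetical and the input is assumed to be an uncompressed FASTA. Unlike the awk one-liner, this version also reports records with zero-length sequences and ignores trailing carriage returns.

```python
def fasta_contig_sizes(fasta_path: str) -> list[tuple[str, int]]:
    """Return (contig name, sequence length) pairs in file order."""
    sizes = []
    name = None
    length = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if name is not None:
                    sizes.append((name, length))  # close the previous record
                fields = line[1:].split()
                name = fields[0] if fields else ""  # keep only the first header token
                length = 0
            else:
                length += len(line)
    if name is not None:
        sizes.append((name, length))  # close the final record
    return sizes


def write_sizes(fasta_path: str, out_path: str) -> None:
    """Write a two-column, tab-separated sizes file."""
    with open(out_path, "w") as out:
        for name, length in fasta_contig_sizes(fasta_path):
            out.write(f"{name}\t{length}\n")
```

Calling `write_sizes("input.fasta", "sizes.tsv")` should produce the same two columns as the awk command for a well-formed FASTA.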

I created a tabix index for each .bedmethyl.gz file with:

tabix -p bed sample1_h1.bedmethyl.gz

And the log output:
[src/command_utils.rs::295][2025-01-31 13:42:41][INFO] calculated chunk size: 6, interval size 100000, processing 600000 positions concurrently
[src/interval_chunks.rs::512][2025-01-31 13:42:41][DEBUG] there are 25 contig(s) to work on (25 parts)

This could be user error on my part; please let me know if I am missing anything workflow-wise.

ArtRand added the bug (Something isn't working) and troubleshooting (workflow and data preparation questions) labels Jan 17, 2025

ArtRand added the needs-attention (Requires follow up from developers) label Jan 23, 2025
