Chunk process for summary and sample-prob #389

KunFang93 · 2025-02-27T19:41:51Z

I was wondering whether chunk-based processing (similar to the pileup method) could be added for the --no-sampling option in summary and sample-prob in the future. Currently, the --no-sampling option is very resource-intensive: in my case, processing 150,000 reads requires around 60 GB of RAM. Because my modifications are sparse, --no-sampling seems to be the only viable option I have. I can work around this by splitting my BAM file into smaller segments and then aggregating the results, but it would be ideal if --no-sampling could adopt a chunked processing strategy like pileup.

Thanks for your help!

Best,
Kun

ArtRand · 2025-02-28T23:28:11Z

Hello @KunFang93,

That's a good idea; both of those commands are due for a little refresh. One caution about splitting the BAM: depending on how you're doing it, reads that span a split boundary can get counted in two splits. Another option is to use modkit extract calls and pipe the table through another filter that calculates the statistics per read. All of the rows for a read will come out together, so you can operate on one read at a time, calculate the %-modified, etc.
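A minimal sketch of such a per-read filter, assuming the table is tab-separated with a header row and that all rows for a read are adjacent, as described above. The column names `read_id` and `call_code`, and the use of `-` to mark a canonical (unmodified) call, are assumptions here; check them against the header of your own output before relying on this.

```python
import csv
from itertools import groupby


def per_read_fraction(lines, id_col="read_id", code_col="call_code", canonical="-"):
    """Yield (read_id, n_calls, n_modified, fraction_modified) for each read.

    `lines` is any iterable of text lines (for example sys.stdin).
    Column names and the canonical marker are assumptions, not confirmed
    modkit output; pass different values if your table differs.
    Only one read's rows are held at a time, so memory use stays flat.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # groupby only merges *consecutive* rows with the same key, which is
    # exactly what we want when each read's rows come out together.
    for read_id, rows in groupby(reader, key=lambda row: row[id_col]):
        total = 0
        modified = 0
        for row in rows:
            total += 1
            if row[code_col] != canonical:
                modified += 1
        yield read_id, total, modified, modified / total
```

To use it on a stream, pass `sys.stdin` as `lines` and print each tuple; because the generator never materializes the whole table, the read count does not affect peak memory.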

Calculating the pass thresholds is a little more complicated. Right now the percentiles are calculated naively, but exactly. I can already think of a few ways to be more clever about calculating the percentiles without using as much memory. Thanks for the use case and the pressure, I'll see what I can do.
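As an illustration only (this is not modkit's implementation, and not necessarily one of the approaches meant above), one common way to trade exactness for memory is a fixed-size histogram over the probability range [0, 1]. The class name `ProbHistogram` and its parameters are hypothetical. Memory is bounded by the number of bins regardless of how many reads are processed, and the reported percentile is off by at most one bin width.

```python
class ProbHistogram:
    """Approximate percentiles of values in [0, 1] using fixed-width bins.

    Hypothetical helper for illustration; memory use is O(n_bins),
    independent of how many values are added.
    """

    def __init__(self, n_bins=1000):
        self.n_bins = n_bins
        self.counts = [0] * n_bins
        self.total = 0

    def add(self, p):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * self.n_bins), self.n_bins - 1)
        self.counts[idx] += 1
        self.total += 1

    def percentile(self, q):
        """Return the upper edge of the bin holding the q-quantile (0 < q <= 1)."""
        if self.total == 0:
            raise ValueError("no values added")
        target = q * self.total
        running = 0
        for idx, count in enumerate(self.counts):
            running += count
            if running >= target:
                return (idx + 1) / self.n_bins
        return 1.0
```

Histograms built on separate chunks can be combined by summing their `counts` element-wise, which is what makes this kind of structure a natural fit for chunked processing.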

KunFang93 · 2025-03-01T21:14:15Z

Thanks for your suggestion! I will try it. Looking forward to seeing the new tricks in old functions :)

ArtRand · 2025-03-05T14:29:21Z

Reopening this to track the work.

ArtRand added the "question" label (Looking for clarification on inputs and/or outputs) Feb 28, 2025

KunFang93 closed this as completed Mar 1, 2025

ArtRand added the "enhancement" label (New feature or request) and removed the "question" label (Looking for clarification on inputs and/or outputs) Mar 5, 2025

ArtRand reopened this Mar 5, 2025
