Hi @ArtRand,

I was wondering if it might be possible to add chunk-based processing (similar to what `pileup` does) for the `--no-sampling` option in `summary` and `sample-probs` in the future. Currently, the `--no-sampling` option is very resource-intensive: in my case, processing 150,000 reads requires around 60 GB of RAM. Because my modifications are sparse, `--no-sampling` seems to be the only viable option I have. While I can work around this by splitting my BAM file into smaller segments and then aggregating the results, it would be ideal if `--no-sampling` could adopt a chunked processing strategy like `pileup`'s.
Thanks for your help!
Best,
Kun
That's a good idea; both of those commands are due for a little refresh. One caution about splitting the BAM: depending on how you do it, a read that spans the boundary between two splits can get counted in both. Another option is to use `modkit extract calls` and pipe the table through a filter that calculates the statistics per read. All of the rows for a read come out together, so you can operate on each read at once, calculate the %-modified, and so on.
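The per-read filter could be sketched as a small script. This is a hypothetical sketch, not modkit's actual output schema: it assumes a TSV with a `read_id` column and a `call_code` column in which `-` means a canonical (unmodified) call; check the header that `modkit extract calls` actually emits and adjust the column names accordingly.

```python
import csv
import sys
from itertools import groupby


def percent_modified_per_read(rows):
    """Yield (read_id, fraction_modified) for rows grouped by read.

    `rows` is an iterable of dicts with 'read_id' and 'call_code' keys;
    a call_code of '-' is treated as canonical (unmodified). These column
    names are an assumption -- adjust to the real extract-calls header.
    Relies on all rows for a read arriving consecutively, which is why
    groupby (no global sort, no full table in memory) is enough.
    """
    for read_id, group in groupby(rows, key=lambda r: r["read_id"]):
        calls = list(group)  # only one read's rows held at a time
        modified = sum(1 for r in calls if r["call_code"] != "-")
        yield read_id, modified / len(calls)


if __name__ == "__main__":
    # e.g.  modkit extract calls ... | python this_script.py
    reader = csv.DictReader(sys.stdin, delimiter="\t")
    for read_id, frac in percent_modified_per_read(reader):
        print(f"{read_id}\t{frac:.4f}")
```

Because only one read's rows are materialized at a time, memory stays proportional to the longest read, not the whole BAM.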
Calculating the pass thresholds is a little more complicated. Right now the percentiles are computed naively, but exactly. I can already think of a few ways to be more clever about calculating the percentiles without using as much memory. Thanks for the use case and the pressure; I'll see what I can do.
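One low-memory approach, since base-modification probabilities are bounded in [0, 1]: bucket them into a fixed-size histogram and read percentiles off the cumulative counts. Memory is constant regardless of how many calls are processed, and the answer is accurate to within one bin width. A sketch of the idea (the class name and bin count are illustrative, not anything in modkit):

```python
class HistogramPercentile:
    """Approximate percentiles for values in [0, 1] using fixed bins.

    Memory is O(n_bins) no matter how many values are added; the
    reported percentile is accurate to within one bin width.
    """

    def __init__(self, n_bins=1000):
        self.n_bins = n_bins
        self.counts = [0] * n_bins
        self.total = 0

    def add(self, p):
        # Clamp so p == 1.0 lands in the last bin instead of overflowing.
        idx = min(int(p * self.n_bins), self.n_bins - 1)
        self.counts[idx] += 1
        self.total += 1

    def percentile(self, q):
        """Return the approximate q-th quantile (0 <= q <= 1)."""
        target = q * self.total
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                # Report the upper edge of the bin that crossed the target.
                return (idx + 1) / self.n_bins
        return 1.0
```

The trade-off versus the exact computation is the quantization error (here at most 1/1000), which is likely well below the noise in the probability calibration anyway.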