Runtime #15
Hi Jonatan, thank you for pointing this out. We made some changes to the indel candidate site selection and reorganized the code to be more modular, and this might be causing an increase in runtime. I reran some tests and it does seem that there is an increase in runtime. While I fix this issue, you can try using an older release (earlier than v0.4), which has similar indel performance and the same SNP performance. Additionally, I can try to create a branch that provides the older candidate selection method as an option and uses the same API as the v0.4 release.
Another thing: if you used the human reference genome for variant calling, did you use …? In our paper, we used this parameter to report runtime.
Hello! Thanks for all the input and tips. I'll try using a previous version and check how much faster it goes. Eventually we'll be running this on an HPC with access to more CPUs, which will speed things up a lot, but I still found it weird that it is so slow on 8 CPUs; it has taken 5 days just to complete one sample. As for your second reply: yes, I did use the …
Just for context, can you tell me the coverage of your BAM file, and, if you know it, which Guppy version was used to basecall the reads?
Hello! The average coverage of the BAM file is 20x, if I didn't calculate it the wrong way; there are a lot of ways of computing coverage and I always fail to find an easy and straightforward one. As for the Guppy version, it was 3.4.5.
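As an aside for anyone who runs into the same question: one common rough estimate of mean coverage is the total number of aligned bases divided by the total reference length. Below is a minimal sketch using pysam, assuming a coordinate-sorted and indexed BAM; the file name `sample.bam` is a placeholder.

```python
# Rough mean-coverage estimate: total aligned bases / total reference length.
# Assumes a coordinate-sorted, indexed BAM; "sample.bam" is a placeholder name.
import pysam

bam = pysam.AlignmentFile("sample.bam", "rb")

# Sum of all reference contig lengths from the BAM header
genome_length = sum(bam.lengths)

total_aligned_bases = 0
for read in bam.fetch():
    # Skip reads that would double-count or inflate the estimate
    if read.is_unmapped or read.is_secondary or read.is_supplementary:
        continue
    # Count only the aligned portion of each read (clipped bases excluded)
    total_aligned_bases += read.query_alignment_length

print(f"Approximate mean coverage: {total_aligned_bases / genome_length:.1f}x")
```

A similar number can usually be obtained with `samtools coverage` or `samtools depth`; the result will differ slightly depending on how clipped bases and secondary alignments are counted.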
Hi Jonatan, it turns out that the problem was being caused by this commit: 2546959, so I have reverted the changes from that commit in v0.4.1 (both in this repo and in Docker). You should be able to get a ~40% reduction in runtime compared to v0.4.0, and the performance will be similar to the one reported in our paper. During this testing I found several other areas for runtime improvement, for instance replacing Biopython's pairwise alignment algorithm with one that is implemented in C. I will be releasing these improvements over the next few weeks. Also, NanoCaller logs report the coverage that is calculated for SNP calling. If you use NanoCaller_WGS.py, these logs will be in the …
Hello Mian! That's great news! Thanks for looking into it and fixing it so quickly. I reckon the improvements from replacing Biopython will come in a future version and are not yet implemented in v0.4.1? I'm looking forward to it! On another note, I've checked the logs and it seems that on average the coverage was 20x for the sample tested. In a discussion with the author of PEPPER, which is another tool for SNP/indel calling, we agreed that 20x seems low for calling these types of variants, since the caller will call almost everything it finds. Hopefully the samples I asked to be sequenced will have more coverage and the calling will be more precise. Best regards, Jonatan
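For readers curious what replacing a pure-Python aligner with a C-backed one can look like, here is a purely illustrative sketch. It is not the change made in NanoCaller; it simply contrasts Biopython's pairwise2 with edlib, a C/C++ edit-distance aligner with Python bindings, on toy sequences.

```python
# Illustrative only: a pure-Python global alignment (Biopython pairwise2) next to
# a C/C++-backed aligner (edlib). Not the actual NanoCaller change; toy sequences.
from Bio import pairwise2   # pip install biopython
import edlib                # pip install edlib

ref   = "ACGTACGTTAGCTAGCTAGGATCGA"
query = "ACGTACGTAGCTAGCTAGGTTCGA"

# Biopython: pure-Python dynamic programming (globalxx: match=1, no gap penalties)
bp_alignments = pairwise2.align.globalxx(ref, query)
print("Biopython best score:", bp_alignments[0].score)

# edlib: C/C++ Needleman-Wunsch edit-distance alignment with traceback
result = edlib.align(query, ref, mode="NW", task="path")
print("edlib edit distance:", result["editDistance"])
print("edlib CIGAR:", result["cigar"])
```

The speed gap between the two grows quickly with sequence length and number of candidate sites, which is the usual motivation for this kind of substitution.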
Hello!
In the NanoCaller paper you have a table of run times for the different modes and different technologies. I noticed that for ONT, in the mode to call both SNPs and indels, on 16 CPUs the runtime was about 18h. But I've been running my data on 8 CPUs, since that's the maximum I have on this machine, and it's been going for 23h already and hasn't reached chromosome 3 yet. The data is ONT, running in the both mode. What could be the reason it's taking so long?
Best regards,
Jonatan
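For rough context on the expectation here: if runtime scaled linearly with CPU count (an optimistic assumption, since some steps are I/O-bound or single-threaded), the paper's figure would translate to roughly

$$18\ \mathrm{h} \times \frac{16\ \text{CPUs}}{8\ \text{CPUs}} \approx 36\ \mathrm{h}.$$

Chromosomes 1 and 2 together are only about 16% of the human genome by length, so 23h without finishing them projects to well over 100h for the whole genome, which is why the slowdown looked like more than just having fewer CPUs.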