Bwa-mem2 mem parallelisation #35
Comments
For the record, we want to split the FASTQ files rather than the CRAM, as that would let us reuse the workflow for PacBio and ONT, especially as the PacBio BAM file is "weird".
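A minimal sketch of the FASTQ splitting proposed above, assuming plain 4-line FASTQ records (no multi-line sequences). The helper names `read_fastq_records` and `iter_fastq_chunks` are hypothetical. For paired-end data, splitting R1 and R2 with the same `reads_per_chunk` keeps mates in the same chunk index.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO

Record = List[str]  # [header, sequence, "+", quality]


def read_fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield FASTQ records as 4-line lists (hypothetical helper)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) != 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed or truncated FASTQ record: {record!r}")
        yield record


def iter_fastq_chunks(handle: TextIO, reads_per_chunk: int) -> Iterator[List[Record]]:
    """Group consecutive records into chunks of at most reads_per_chunk reads."""
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    chunk: List[Record] = []
    for record in read_fastq_records(handle):
        chunk.append(record)
        if len(chunk) == reads_per_chunk:
            yield chunk
            chunk = []
    if chunk:  # last, possibly smaller, chunk
        yield chunk


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n")
    print([len(c) for c in iter_fastq_chunks(demo, reads_per_chunk=2)])  # [2, 1]
```

In a real run each chunk would be written to its own (compressed) file rather than kept in memory; the generator form keeps memory bounded to one chunk at a time.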
@muffato Based on the latest discussion with Shane, are we still focusing on splitting by FASTQ? He recommended we split by BAM/CRAM instead, to avoid excess I/O.
If you can avoid converting to FASTQ, please do so. It's a valid disk optimisation even without splitting the input file.
We've run the pipeline quite a lot, and Hi-C alignment with BWA doesn't seem to be an issue for us. The largest sample I've tested was Sambucus nigra: 6.3 billion reads (many species have around 1 billion reads) and an 11.8 Gbp genome (the average genome size is < 1 Gbp). It took 2 days and 2 hours to run, which is fine in the After-Party context.
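For scale, assuming the 2 days and 2 hours (50 h) is the wall-clock time of the whole Sambucus nigra run, the implied average throughput is:

$$
\frac{6.3\times10^{9}\ \text{reads}}{50\ \text{h}} = 1.26\times10^{8}\ \text{reads/h} \approx 3.5\times10^{4}\ \text{reads/s}
$$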
Description of feature
For Illumina and Hi-C, I want to split the input FASTQ files and align each chunk against the genome. The split files need to be aligned with the same `@RG` tag.
split `@RG` tag
`-c` tag
`markdup_stats` subworkflow
Maybe possible to combine steps (5) and (6).
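The exact step list above is only partly recoverable, so this is a sketch under assumptions, not the planned implementation. It assumes every chunk from one input file gets the identical `@RG` line via `bwa-mem2 mem -R`, and reads the `-c` above as `samtools merge -c`, which combines `@RG` headers with colliding IDs into one (`-p` does the same for `@PG`). It only builds argument lists; converting each chunk's SAM output to sorted BAM is not shown. File names, IDs and helper names are hypothetical.

```python
import shlex
from typing import List, Sequence


def read_group_line(rg_id: str, sample: str, platform: str) -> str:
    # bwa/bwa-mem2 -R expects literal "\t" escapes between fields.
    return f"@RG\\tID:{rg_id}\\tSM:{sample}\\tPL:{platform}"


def chunk_align_commands(reference: str, chunks: Sequence[Sequence[str]],
                         rg_line: str, threads: int = 8) -> List[List[str]]:
    """One bwa-mem2 command per chunk (1 file, or R1+R2), all sharing rg_line."""
    return [["bwa-mem2", "mem", "-t", str(threads), "-R", rg_line, reference, *fastqs]
            for fastqs in chunks]


def merge_command(output_bam: str, chunk_bams: Sequence[str]) -> List[str]:
    # -c: keep one @RG per colliding ID; -p: same for @PG headers.
    return ["samtools", "merge", "-c", "-p", output_bam, *chunk_bams]


if __name__ == "__main__":
    rg = read_group_line("sample1_hic", "sample1", "ILLUMINA")
    chunks = [("chunk_000_R1.fq.gz", "chunk_000_R2.fq.gz"),
              ("chunk_001_R1.fq.gz", "chunk_001_R2.fq.gz")]
    for cmd in chunk_align_commands("genome.fa", chunks, rg):
        print(shlex.join(cmd))
    print(shlex.join(merge_command("merged.bam", ["chunk_000.bam", "chunk_001.bam"])))
```

Because all chunks carry the same read-group ID, the merged file ends up with a single `@RG` line, so downstream duplicate marking sees one library rather than one per chunk.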