Bwa-mem2 mem parallelisation #35

priyanka-surana · 2022-04-06T20:35:57Z

Description of feature

For Illumina and Hi-C data, I want to split the input FASTQ files and align each chunk against the genome. All chunks from the same input need to be aligned with the same @RG tag.

1. Split the FASTQ files using the split command.
2. Make sure all split files carry the same @RG tag.
3. Run bwa-mem2 mem on each split file individually.
4. Sort each alignment individually with samtools.
5. Merge all split alignments from the same individual with samtools merge -c.
6. Merge at the specimen level.
7. Run the result through the rest of the markdup_stats subworkflow.

It may be possible to combine steps (5) and (6).
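A minimal Python sketch of steps (1) to (5) follows, assuming 4-line FASTQ records, paired R1/R2 files that list mates in the same order, and hypothetical file names (genome.fa, R1_000.fq, chunk_000.bam, sampleA.bam). It only builds the shell command lines instead of running them; bwa-mem2 mem -t/-R, samtools sort -o and samtools merge -c are real options, but the helper names are illustrative and the pipeline's own modules may wire things differently. Splitting with coreutils split -l and a line count that is a multiple of 4 gives the same chunk boundaries. Requires Python 3.8+ for shlex.join.

```python
import shlex
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Step 1 (hypothetical helper): yield chunks of at most reads_per_chunk records.

    Assumes 4-line records (header, sequence, '+', qualities), so cutting every
    4 * reads_per_chunk lines never splits a record. Using the same
    reads_per_chunk for R1 and R2 keeps mates in matching chunks, provided both
    files list the pairs in the same order.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


def read_group(rg_id: str, sample: str) -> str:
    """Step 2: one @RG string reused for every chunk of the same input.

    bwa-mem2 mem -R takes a literal backslash-t between fields and expands it.
    """
    return f"@RG\\tID:{rg_id}\\tSM:{sample}"


def align_and_sort_cmd(reference: str, r1: str, r2: str, rg: str,
                       out_bam: str, threads: int = 8) -> str:
    """Steps 3-4: align one chunk pair, then coordinate-sort the SAM from stdin."""
    align = ["bwa-mem2", "mem", "-t", str(threads), "-R", rg, reference, r1, r2]
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    return shlex.join(align) + " | " + shlex.join(sort)


def merge_cmd(out_bam: str, chunk_bams: List[str]) -> str:
    """Step 5: merge sorted chunks; -c keeps one copy of @RG lines sharing an ID."""
    return shlex.join(["samtools", "merge", "-c", out_bam, *chunk_bams])


if __name__ == "__main__":
    # Toy R1 input: 3 reads in chunks of 2 -> 2 chunks (R2 would be split identically).
    fastq = ["@r1", "ACGT", "+", "IIII",
             "@r2", "GGCC", "+", "IIII",
             "@r3", "TTAA", "+", "IIII"]
    chunks = list(chunk_fastq(fastq, reads_per_chunk=2))
    rg = read_group("run1", "sampleA")
    bams = []
    for i in range(len(chunks)):
        bam = f"chunk_{i:03d}.bam"
        bams.append(bam)
        # Hypothetical chunk file names; each chunk would be written to disk first.
        print(align_and_sort_cmd("genome.fa", f"R1_{i:03d}.fq", f"R2_{i:03d}.fq", rg, bam))
    print(merge_cmd("sampleA.bam", bams))
```

The last line printed is `samtools merge -c sampleA.bam chunk_000.bam chunk_001.bam`. Because -c only collapses @RG lines whose IDs are identical, chunks from several read groups passed to a single merge should keep their distinct IDs, which is one reason steps (5) and (6) may be combinable; this sketch does not test that.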


muffato · 2022-04-13T16:37:04Z

For the record, we want to split the FASTQ file rather than the CRAM because that would allow reusing the workflow for PacBio and ONT, especially as the PacBio BAM file is "weird".

priyanka-surana · 2023-03-16T19:27:42Z

@muffato Based on the latest discussion with Shane, are we still focusing on splitting by FASTQ? He recommended splitting by BAM/CRAM to avoid excess I/O.

muffato · 2023-03-17T08:25:49Z

If you can avoid converting to FASTQ, please do so. It's a valid disk optimisation even without splitting the input file.

muffato · 2023-11-30T10:22:41Z

We've run the pipeline quite a lot and Hi-C alignment with BWA doesn't seem to be an issue for us. The largest sample I've tested was Sambucus nigra: 6.3 billion reads (many species have around 1 billion reads) and an 11.8 Gbp genome (the average genome size is < 1 Gbp). It took 2 days and 2 hours to run, which is fine in the After-Party context.
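For scale, the Sambucus nigra figures in the comment above work out to roughly 35,000 reads aligned per second of wall-clock time. The Python arithmetic below is an illustrative back-of-the-envelope check, not a benchmark: it assumes the whole 2 days 2 hours was spent on this one sample and that runtime scales linearly with read count, which ignores genome size and cluster load. The function name is hypothetical.

```python
def reads_per_second(reads: float, days: int, hours: int) -> float:
    """Average throughput over the whole wall-clock run (hypothetical helper)."""
    return reads / ((days * 24 + hours) * 3600)


# Sambucus nigra run: 6.3 billion reads in 2 days 2 hours (50 h).
rate = reads_per_second(6.3e9, days=2, hours=2)
print(f"{rate:,.0f} reads/s")  # 35,000 reads/s

# Naive linear extrapolation to a ~1 billion-read sample.
# Assumption: ignores genome size (11.8 Gbp here vs < 1 Gbp on average), so treat as rough.
print(f"{1e9 / rate / 3600:.1f} h")  # 7.9 h
```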

priyanka-surana added the enhancement (Improvement of the existing features) label Apr 6, 2022

priyanka-surana added this to the v0.2 milestone Apr 7, 2022

priyanka-surana removed this from the v0.2 milestone Apr 24, 2022

priyanka-surana pinned this issue Dec 3, 2022

priyanka-surana added this to the 1.2.0 milestone Mar 16, 2023

priyanka-surana unpinned this issue Mar 16, 2023

priyanka-surana self-assigned this Mar 16, 2023

priyanka-surana added the feature (Requests for new features) and user request (Requests made by users and public) labels Jun 27, 2023

priyanka-surana added this to Genome After Party Jun 27, 2023

github-project-automation bot moved this to Todo in Genome After Party Jun 27, 2023

muffato mentioned this issue Jun 28, 2023

Implement chunking to speed up alignment #74

Closed

muffato added the backlog label Nov 30, 2023

muffato removed this from the 1.2.0 milestone Dec 8, 2023

muffato unassigned priyanka-surana Feb 12, 2024

muffato removed the enhancement (Improvement of the existing features) and backlog labels Jun 1, 2024

muffato added this to readmapping Jun 5, 2024

muffato moved this to Ideas in readmapping Jun 5, 2024

muffato added the enhancement (Improvement of the existing features) label and removed the feature (Requests for new features) label Jun 17, 2024

tkchafin assigned reichan1998 Aug 6, 2024

reichan1998 mentioned this issue Aug 13, 2024

Implement chunking to speed up alignment #105

Closed

tkchafin linked a pull request Sep 16, 2024 that will close this issue

Implement cram chunking and minimap2-based Hi-C alignments #113

Merged

tkchafin closed this as completed Sep 17, 2024

github-project-automation bot moved this from Ideas to Done in readmapping Sep 17, 2024

github-project-automation bot moved this from Todo to Done in Genome After Party Sep 17, 2024
