From 42fee8e6f38368674b915652f3e37df9ca89b0f6 Mon Sep 17 00:00:00 2001
From: Ksenia hifiasm
on the HiFi reads, purging the primary contigs with purge_dups, and scaffolding them up with YaHS.
+Optionally, if Illumina 10X data is provided, the purged contigs and haplotigs can be polished.
+
+For a diploid genome where the HiFi and HiC data come from the same individual, hifiasm can additionally be run in HiC mode to produce a phased assembly. In that case the produced haplotypes are not purged but scaffolded directly with YaHS.
+
+Optionally, organelle assembly can be triggered. The mitochondrion and (if relevant) plastid sequences are produced using MitoHiFi and OATK.
+
+The directories listed below will be created in the `--outdir` directory after the pipeline has finished. All paths are relative to the top-level `--outdir` directory.
## Subworkflows
@@ -43,13 +50,16 @@ This subworkflow generates a KMER database and coverage model used in [PURGE_DUP
- primary assembly in GFA and FASTA format; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
- .\*hifiasm.\*/.*a_ctg.[g]fa
- haplotigs in GFA and FASTA format; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
+ - .\*hifiasm-hic.\*/.*hap1.p_ctg.[g]fa
+ - fully phased hap1 if hifiasm is run in HiC mode; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
+ - .\*hifiasm-hic.\*/.*hap2.p_ctg.[g]fa
+ - fully phased hap2 if hifiasm is run in HiC mode; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
- .\*hifiasm.\*/.*bin
- internal binary hifiasm files; for more details refer [here](https://hifiasm.readthedocs.io/en/latest/faq.html#id12)
This subworkflow generates the raw assembly (or assemblies). First, hifiasm is run on the input HiFi reads; the raw contigs are then converted from GFA into FASTA format. This assembly is subsequently purged, optionally polished, and scaffolded further down the pipeline.
-In case hifiasm HiC mode is switched on, it is performed as an extra step with results stored in hifiasm-hic folder.
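The GFA-to-FASTA conversion mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline's own implementation; it only assumes the standard GFA layout in which each segment (`S`) record carries the contig name in the second column and the sequence in the third. The function name `gfa_to_fasta` and the example contig names are hypothetical.

```python
import io


def gfa_to_fasta(gfa_lines, line_width=60):
    """Yield FASTA lines for every segment (S) record in a GFA stream.

    Hypothetical helper for illustration only; other record types
    (H, L, A, ...) are skipped.
    """
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            # GFA allows '*' when the sequence is stored elsewhere.
            continue
        yield f">{name}"
        # Wrap the sequence to a fixed width, as is conventional for FASTA.
        for start in range(0, len(seq), line_width):
            yield seq[start:start + line_width]


# Tiny made-up GFA with one header, two segments and one 'A' line.
example_gfa = io.StringIO(
    "H\tVN:Z:1.0\n"
    "S\tptg000001l\tACGTACGTAC\tLN:i:10\n"
    "A\tptg000001l\t0\t+\tread1\t0\t10\t0\n"
    "S\tptg000002l\tGGCC\n"
)
fasta_lines = list(gfa_to_fasta(example_gfa, line_width=8))
print("\n".join(fasta_lines))
```

Only the `S` records contribute to the FASTA output, so link and read-alignment lines in the hifiasm GFA do not need special handling in a sketch like this.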
- - \*.hifiasm..\*/scaffolding/.*_merged_sorted.bed
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/.*_merged_sorted.bed
- bed file obtained from merged mkdup bam
- - \*.hifiasm..\*/scaffolding/.*mkdup.bam
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/.*mkdup.bam
- final read mapping bam with mapped reads
- - \*.hifiasm..\*/scaffolding/.*.stats
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/.*.stats
- output of samtools stats
- - \*.hifiasm..\*/scaffolding/.*.idxstats
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/.*.idxstats
- output of samtools idxstats
- - \*.hifiasm..\*/scaffolding/.*.flagstat
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/.*.flagstat
- output of samtools flagstat
- - \*.hifiasm..\*/scaffolding/yahs/out.break.yahs/out_scaffolds_final.fa
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/yahs/out.break.yahs/out_scaffolds_final.fa
- scaffolds in FASTA format
- - \*.hifiasm..\*/scaffolding/yahs/out.break.yahs/out_scaffolds_final.agp
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/yahs/out.break.yahs/out_scaffolds_final.agp
- coordinates of contigs relative to scaffolds
- - \*.hifiasm..\*/scaffolding/yahs/out.break.yahs/alignments_sorted.txt
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/yahs/out.break.yahs/alignments_sorted.txt
- Alignments for Juicer in text format
- - \*.hifiasm..\*/scaffolding/yahs/out.break.yahs/yahs_scaffolds.hic
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/yahs/out.break.yahs/yahs_scaffolds.hic
- Juicer HiC map
- - \*.hifiasm..\*/scaffolding/yahs/out.break.yahs/*cool
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/yahs/out.break.yahs/*cool
- HiC map for cooler
- - \*.hifiasm..\*/scaffolding/yahs/out.break.yahs/*.FullMap.png
+ - \*.hifiasm.\*/scaffolding[_hap1/_hap2/^$]/yahs/out.break.yahs/*.FullMap.png
- Pretext snapshot
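
The `scaffolding[_hap1/_hap2/^$]` notation in the paths above can be read as three possible directory names: `scaffolding`, `scaffolding_hap1` and `scaffolding_hap2`. The Python sketch below shows one way to collect the final scaffold FASTA files under that reading. It is an illustration, not part of the pipeline: the function name `find_final_scaffolds`, the recursive search for directories whose name contains `hifiasm`, and the demo directory name are all assumptions.

```python
import tempfile
from pathlib import Path

# Assumed expansion of "scaffolding[_hap1/_hap2/^$]" from the listing above.
SCAFFOLDING_DIRS = ("scaffolding", "scaffolding_hap1", "scaffolding_hap2")


def find_final_scaffolds(outdir):
    """Map (assembly dir name, scaffolding dir name) to the final scaffold FASTA.

    Hypothetical helper: looks in every directory below `outdir` whose
    name contains 'hifiasm' and reports only files that actually exist.
    """
    found = {}
    for asm_dir in sorted(Path(outdir).rglob("*hifiasm*")):
        if not asm_dir.is_dir():
            continue
        for name in SCAFFOLDING_DIRS:
            fasta = asm_dir / name / "yahs" / "out.break.yahs" / "out_scaffolds_final.fa"
            if fasta.is_file():
                found[(asm_dir.name, name)] = fasta
    return found


# Demo on a throwaway directory tree with a made-up assembly folder name.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.hifiasm.20240101" / "scaffolding_hap1" / "yahs" / "out.break.yahs"
    target.mkdir(parents=True)
    (target / "out_scaffolds_final.fa").write_text(">scaffold_1\nACGT\n")
    demo = {key: path.name for key, path in find_final_scaffolds(tmp).items()}

print(demo)
```

In a run without HiC phasing only the plain `scaffolding` entry would be expected; with phasing, the `_hap1` and `_hap2` entries would appear instead or in addition, depending on how the pipeline was configured.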