A generic pipeline that can be run routinely on all Illumina sequence runs, regardless of the project or organism of interest.
- Sequence quality information
- Possible contamination
- Parse run-level QC statistics from the 'InterOp' directory and write to `.csv` and `.json` format.
- FastQC: Sample-level sequence quality metrics.
- `mash`: Estimate genome size and depth of coverage.
- `seqtk fqchk`: Measure average sequence quality score, percent of bases above Q30 and GC content.
- `kraken2` + `bracken`: Taxonomic classification of reads. Estimation of relative abundances of taxonomic groups (genus, species) in each sample.
- MultiQC: Collect several QC metrics into a single interactive HTML report.
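
To make the `seqtk fqchk` metrics concrete, here is a minimal Python sketch that computes the same three kinds of numbers (average quality score, percent of bases at or above a quality threshold, GC content) from FASTQ text. It is an illustration only, not how `seqtk` computes them. The function name `fastq_quality_metrics`, its parameters and the returned keys are hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
def fastq_quality_metrics(fastq_text, q_threshold=30, phred_offset=33):
    """Summarise base qualities and GC content for four-line FASTQ records.

    Assumptions (not taken from the pipeline): Phred+33 encoding, exactly
    four lines per record, and "above Q30" interpreted as quality >= 30.
    """
    lines = fastq_text.strip().splitlines()
    total_bases = 0
    quality_sum = 0
    bases_at_or_above = 0
    gc_bases = 0

    # Each record is: header, sequence, '+' separator, quality string.
    # Stopping at len(lines) - 3 skips a trailing incomplete record.
    for i in range(0, len(lines) - 3, 4):
        sequence = lines[i + 1].strip().upper()
        qualities = lines[i + 3].strip()
        for base, quality_char in zip(sequence, qualities):
            quality = ord(quality_char) - phred_offset
            total_bases += 1
            quality_sum += quality
            if quality >= q_threshold:
                bases_at_or_above += 1
            if base in "GC":
                gc_bases += 1

    if total_bases == 0:
        # Avoid division by zero for empty input.
        return {
            "total_bases": 0,
            "mean_quality": 0.0,
            "percent_at_or_above_threshold": 0.0,
            "percent_gc": 0.0,
        }

    return {
        "total_bases": total_bases,
        "mean_quality": quality_sum / total_bases,
        "percent_at_or_above_threshold": 100.0 * bases_at_or_above / total_bases,
        "percent_gc": 100.0 * gc_bases / total_bases,
    }
```

Calling `fastq_quality_metrics(open("reads.fastq").read())` on an uncompressed file (the filename is a placeholder) returns a dictionary with the four summary values. For real runs, use the pipeline's `seqtk_fqchk_summary` output rather than this sketch.
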
nextflow run BCCDC-PHL/routine-sequence-qc \
[--instrument_type nextseq] \
[--kraken2_db /path/to/kraken2_db] \
[--bracken_db /path/to/bracken_db] \
[--seqtk_fqchk_threshold <threshold>] \
[--mash_sketch_kmer_size <kmer_size>] \
[--mash_sketch_minimum_copies <copies>] \
--run_dir <your illumina run directory> \
--outdir <output directory>
<outdir>
├── abundance_top_n
│ ├── top_3_abundances_genus.csv
│ └── top_5_abundances_species.csv
├── basic_qc_stats
│ └── basic_qc_stats.csv
├── bracken
│ ├── <sample_id>_Genus_bracken_abundances.tsv
│ ├── <sample_id>_Genus_bracken_abundances_adjusted.tsv
│ ├── <sample_id>_Genus_bracken.txt
│ ├── <sample_id>_Genus_bracken_adjusted.txt
│ ├── <sample_id>_Species_bracken_abundances.tsv
│ ├── <sample_id>_Species_bracken_abundances_adjusted.tsv
│ ├── <sample_id>_Species_bracken.txt
│ ├── <sample_id>_Species_bracken_adjusted.txt
│ ├── ...
├── fastqc
│ ├── <sample_id>_R1_fastqc
│ ├── <sample_id>_R2_fastqc
│ ├── ...
├── interop_summary
│ ├── interop_index-summary.csv
│ ├── interop_summary.csv
│ └── interop_summary.json
├── kraken2
│ ├── <sample_id>_kraken2.txt
│ ├── ...
├── mash_sketch
│ ├── <sample_id>_R1_mash_sketch.txt
├── mash_sketch_summary
│ └── mash_sketch_summary.csv
├── multiqc
│ ├── multiqc_data
│ └── multiqc_report.html
├── parse_sample_sheet
│ └── sample_sheet.json
├── pipeline_complete.json
├── seqtk_fqchk
│ ├── <sample_id>_seqtk_fqchk_all_positions.csv
│ ├── <sample_id>_seqtk_fqchk_by_position.csv
└── seqtk_fqchk_summary
    └── seqtk_fqchk_summary.csv
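
Because the per-sample files in the listing above follow a `<sample_id>_...` naming pattern, a short script can recover the sample IDs from an output directory and flag samples with missing files. The following Python sketch shows one way to do that. The helper names `list_sample_ids` and `missing_outputs` are hypothetical and not part of the pipeline, and the list of expected files is only a subset of the listing, chosen for illustration.

```python
from pathlib import Path

KRAKEN2_SUFFIX = "_kraken2.txt"

# A subset of the per-sample files shown in the directory listing.
# "{sample_id}" is filled in for each sample.
EXPECTED_PER_SAMPLE = [
    "kraken2/{sample_id}_kraken2.txt",
    "bracken/{sample_id}_Species_bracken_abundances.tsv",
    "mash_sketch/{sample_id}_R1_mash_sketch.txt",
    "seqtk_fqchk/{sample_id}_seqtk_fqchk_by_position.csv",
]


def list_sample_ids(outdir):
    """Return sorted sample IDs inferred from <outdir>/kraken2/*_kraken2.txt."""
    kraken2_dir = Path(outdir) / "kraken2"
    return sorted(
        path.name[: -len(KRAKEN2_SUFFIX)]
        for path in kraken2_dir.glob("*" + KRAKEN2_SUFFIX)
    )


def missing_outputs(outdir, sample_id):
    """Return the expected per-sample paths (relative to outdir) that do not exist."""
    outdir = Path(outdir)
    relative_paths = [
        template.format(sample_id=sample_id) for template in EXPECTED_PER_SAMPLE
    ]
    return [rel for rel in relative_paths if not (outdir / rel).exists()]
```

For example, looping over `list_sample_ids(outdir)` and printing `missing_outputs(outdir, sample_id)` for each ID gives a quick completeness check after a run. Inferring IDs from the `kraken2` directory is a design choice of this sketch; any other per-sample directory in the listing would work the same way.
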