This Nextflow pipeline is designed to process metagenomic sequencing data, characterize overall taxonomic composition, and identify and quantify reads mapping to viruses infecting certain host taxa of interest. It was developed as part of the Nucleic Acid Observatory project.
The pipeline currently consists of four workflows:
- INDEX: Creates indices and reference files used by the RUN and RUN_VALIDATION workflows [1].
- RUN: Performs the main analysis, including QC, viral identification, taxonomic profiling, and optional BLAST validation.
- RUN_VALIDATION: Performs the part of the RUN workflow dedicated to validating taxonomic classification with BLAST [2].
- DOWNSTREAM: Performs downstream analysis of the results from the RUN workflow, currently limited to marking duplicate reads [3].
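
The ordering constraints between the workflows (INDEX first, RUN next, then RUN_VALIDATION or DOWNSTREAM on RUN's results, as described in the footnotes) can be sketched as a small dependency graph. This is an illustrative Python sketch only, not part of the pipeline: the `DEPENDS_ON` map and the `execution_order` helper are hypothetical names, and the dependencies are inferred from the descriptions in this README.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map inferred from the workflow descriptions:
# each workflow maps to the workflows whose outputs it relies on.
DEPENDS_ON = {
    "INDEX": set(),
    "RUN": {"INDEX"},
    "RUN_VALIDATION": {"INDEX", "RUN"},
    "DOWNSTREAM": {"RUN"},
}


def execution_order(requested):
    """Return the requested workflows plus their prerequisites, in a valid run order."""
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in DEPENDS_ON:
            raise ValueError(f"unknown workflow: {name}")
        if name not in needed:
            needed.add(name)
            stack.extend(DEPENDS_ON[name])
    # Restrict the graph to what is needed, then sort prerequisites first.
    graph = {name: DEPENDS_ON[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(["DOWNSTREAM"]))
```

The sketch models ordering only. In practice the INDEX output is built once and then reused by many RUN invocations, so INDEX does not need to be rerun each time.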
- Installation and usage:
- Workflow details:
- Configuration and output:
- Other:
Footnotes
1. The INDEX workflow is intended to be run first, after which many instantiations of the RUN workflow can use the same index output files.
2. The RUN_VALIDATION workflow is intended to be run after the RUN workflow if the optional BLAST validation was not selected during the RUN workflow. Typically, this workflow is run on a subset of the host viral reads identified in the RUN workflow, to evaluate the sensitivity and specificity of the viral identification process.
3. The DOWNSTREAM workflow is designed to handle tasks that require cross-read comparisons, including potentially across multiple runs.