RNAseq.vsh

A version of the nf-core/rnaseq pipeline (version 3.14.0) in the Viash framework.

Rationale

We stay as close to the original nf-core pipeline as possible. This also means that we create a sub-workflow for each of the 5 main stages of the pipeline as depicted in the nf-core README.

Getting started

As test data, we can use the small dataset nf-core provided with their test profile: https://github.com/nf-core/test-datasets/tree/rnaseq3/testdata/GSE110004.

The script bin/get_minimal_test_data.sh fetches those files from the GitHub repository and stores them under testData/minimal_test (the subdirectory exists so that full_test can be supported later as well).

Additionally, the script bin/get_unit_test_data.sh fetches some additional resources for unit testing the components. These are stored under testData/unit_test_resources.

To get started, we need to:

Install nextflow system-wide
Fetch the test data:

bin/minimal_test.sh
bin/get_minimal_test_data.sh
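
Before fetching data or building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, assuming that nextflow, viash, and docker (for the -profile docker runs in this README) are the tools needed. The function name missing_tools is hypothetical and not part of this repository.

```shell
# Hypothetical helper: print each named tool that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Assumed tool set for this README: nextflow, viash, docker.
missing=$(missing_tools nextflow viash docker)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing >&2
else
  echo "All required tools found."
fi
```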

Running the pipeline

To actually run the pipeline, we first need to build the components and pipeline:

viash ns build --setup cb --parallel

Now we can run the pipeline using the command:

nextflow run target/nextflow/workflows/pre_processing/main.nf \
  -profile docker \
  --id test \
  --input testData/minimal_test/SRR6357070_1.fastq.gz \
  --publish_dir testData/test_output/

Alternatively, we can run the pipeline with a sample sheet using the built-in --param_list functionality. Read file paths must be specified relative to the sample sheet's path.

cat > testData/minimal_test/input_fastq/sample_sheet.csv << HERE
id,fastq_1,fastq_2,strandedness
WT_REP1,SRR6357070_1.fastq.gz;SRR6357071_1.fastq.gz,SRR6357070_2.fastq.gz;SRR6357071_2.fastq.gz,reverse
WT_REP2,SRR6357072_1.fastq.gz,SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,SRR6357073_1.fastq.gz,,reverse
HERE

nextflow run target/nextflow/workflows/rnaseq/main.nf \
  --param_list testData/minimal_test/input_fastq/sample_sheet.csv \
  --publish_dir "test_results/full_pipeline_test" \
  --fasta testData/minimal_test/reference/genome.fasta \
  --gtf testData/minimal_test/reference/genes.gtf.gz \
  --transcript_fasta testData/minimal_test/reference/transcriptome.fasta \
  -profile docker
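
Because read file paths are resolved relative to the sample sheet, a quick check before launching can catch misplaced files. The following is a minimal sketch, assuming the column layout shown above (id,fastq_1,fastq_2,strandedness), ';' as the separator between multiple files in one column, and file names without spaces. The function name check_sample_sheet is hypothetical and not part of this repository.

```shell
# Hypothetical helper: print every read file listed in a sample sheet that
# does not exist relative to the sample sheet's own directory.
check_sample_sheet() {
  sheet=$1
  sheet_dir=$(dirname "$sheet")
  # Skip the header line, then read one sample per row.
  tail -n +2 "$sheet" |
    while IFS=, read -r id fastq_1 fastq_2 strandedness || [ -n "$id" ]; do
      # Join both read columns, split them on ';', and check each file.
      # An empty fastq_2 (single-end sample) contributes no file names.
      for f in $(printf '%s;%s' "$fastq_1" "$fastq_2" | tr ';' ' '); do
        if [ ! -f "$sheet_dir/$f" ]; then
          echo "$id: missing $f"
        fi
      done
    done
}
```

Calling check_sample_sheet testData/minimal_test/input_fastq/sample_sheet.csv prints nothing when every listed file is present next to the sample sheet.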

Pipeline sub-workflows and components

The pipeline consists of the following sub-workflows, which can be run separately.

Prepare genome: This is a workflow for preparing all the reference data required for downstream analysis, i.e., uncompressing the provided reference data or generating the required index files (for STAR, Salmon, Kallisto, RSEM, BBSplit).
Pre-processing: This is a workflow for performing quality control on the input reads. It performs FastQC, extracts UMIs, trims adapters, and removes ribosomal RNA reads. Adapters can be trimmed using either Trim Galore! or fastp (work in progress).
Genome alignment and quantification: This is a workflow for performing genome alignment using STAR and transcript quantification using Salmon or RSEM (via RSEM's built-in support for STAR; work in progress). Alignment sorting and indexing, as well as computation of statistics from the BAM files, are performed using Samtools. UMI-based deduplication is also performed.
Post-processing: This is a workflow for duplicate read marking (Picard MarkDuplicates), transcript assembly and quantification (StringTie), and creation of bigWig coverage files.
Pseudo alignment and quantification: This is a workflow for performing pseudo alignment and transcript quantification using Salmon or Kallisto.
Final QC: This is a workflow for performing extensive quality control (RSeQC, dupRadar, Qualimap, Preseq, DESeq2, featureCounts). It presents QC for raw reads, alignments, gene biotype, sample similarity, and strand specificity (MultiQC).
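
To run a sub-workflow on its own, we first need to know where its entry point was built. The following is a minimal sketch, assuming the target/nextflow/workflows/<name>/main.nf layout implied by the commands under "Running the pipeline". The directory names of the individual sub-workflows are not listed in this README, so the sketch discovers them instead of hard-coding them. The function name list_workflows is hypothetical.

```shell
# Hypothetical helper: list the workflow names found under a build directory,
# assuming each workflow is built to <dir>/<name>/main.nf.
list_workflows() {
  dir=${1:-target/nextflow/workflows}
  for entry in "$dir"/*/main.nf; do
    # Skip the unexpanded glob when nothing has been built yet.
    [ -f "$entry" ] || continue
    basename "$(dirname "$entry")"
  done
}
```

Running list_workflows after the build step prints one name per line; each name can then be substituted into a nextflow run target/nextflow/workflows/<name>/main.nf command.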

Reusing components from biobox

At the moment, this pipeline makes use of the following components from biobox:

gffread
star/star_genome_generate
star/star_align_reads
salmon/salmon_index
salmon/salmon_quant
featurecounts
samtools/samtools_sort
samtools/samtools_index
samtools/samtools_stats
samtools/samtools_flagstat
samtools/samtools_idxstats
multiqc (work in progress - updating assets/multiqc_config.yaml)
fastp (work in progress)
rsem/rsem_prepare_reference (work in progress)
rsem/rsem_calculate_expression (work in progress)
