Oncoanalyser (links: GitHub, nf-core) is a Nextflow implementation of the Hartwig Medical Foundation DNA and RNA sequencing analysis pipeline.

Supported sequencing and sample setups

Data type  Sequencing method                       Paired tumor/normal  Tumor-only
DNA        Whole genome sequencing (WGS)           ✓                    ✓
DNA        Targeted sequencing:                    ✓                    ✓
           - Whole exome sequencing (WES)
           - Panel sequencing
RNA        Whole transcriptome sequencing (WTS)    -                    ✓

Pipeline overview

The pipeline uses tools from hmftools (except for bwa-mem2, STAR and Picard MarkDuplicates):

Task                                 Tool
Read alignment                       bwa-mem2 (DNA)
                                     STAR (RNA)
Read post-processing                 REDUX (DNA): duplicate marking and unmapping
                                     Picard MarkDuplicates (RNA): duplicate marking
SNV, MNV, INDEL calling              SAGE: variant calling
                                     PAVE: transcript/coding effect annotation
SV calling                           ESVEE
CNV calling                          AMBER: B-allele frequencies
                                     COBALT: read depth ratios
                                     PURPLE: purity/ploidy estimation, variant annotation
SV and driver event interpretation   LINX
RNA transcript analysis              ISOFOX
Oncoviral detection                  VIRUSbreakend: viral content and integration calling
                                     VirusInterpreter: post-processing
Immune analysis                      LILAC: HLA typing
                                     NEO: neo-epitope prediction
Mutational signature fitting         SIGS
HRD prediction                       CHORD
Tissue of origin prediction          CUPPA
Summary report                       ORANGE

Getting started

This section assumes that:

  • The analysis starts from paired tumor/normal BAMs
  • Reads are aligned to the GRCh37 reference genome
  • BAMs contain whole genome sequencing data
  • Docker images are used to run each tool

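Other setups are covered in later sections. These include FASTQ inputs, tumor-only samples, the GRCh38 reference genome and Singularity images.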

1. Install Nextflow

See: https://www.nextflow.io/docs/latest/install.html

2. Install Docker

See: https://docs.docker.com/engine/install/

3. Set up resource files

Download and extract the reference genome and hmftools resources using the links in the Resource files section.

Create a file called resources.config which points to the resource file paths:

params {
   genomes {
      'GRCh37_hmf' {
         fasta         = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta"
         fai           = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai"
         dict          = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict"
         img           = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta.img"
         bwamem2_index = "/path/to/bwa-mem2_index/"
         gridss_index  = "/path/to/gridss_index/"
      }
   }

   ref_data_hmf_data_path = "/path/to/hmf_pipeline_resources/"
}

4. Set up sample sheet

Create a file called sample_sheet.csv which points to the sample inputs:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath
COLO829,COLO829,COLO829T,tumor,dna,bam,/path/to/COLO829T.dna.bam
COLO829,COLO829,COLO829R,normal,dna,bam,/path/to/COLO829R.dna.bam

BAM and BAI files for the above COLO829 test sample can be downloaded from here.

Tip

Jump to: Sample sheet

5. Run Oncoanalyser with Nextflow

nextflow run nf-core/oncoanalyser \
-profile docker \
-revision pipeline_v6.0 \
-config resources.config \
--mode wgts \
--genome GRCh37_hmf \
--input sample_sheet.csv \
--outdir output/ \
-work-dir output/work

Command line interface

Running Oncoanalyser

We use the nextflow run command to run Oncoanalyser:

nextflow run nf-core/oncoanalyser \
-profile docker \
-revision 1.0.0 \
-config hmf_pipeline_resources.config \
--mode wgts \
--genome GRCh37_hmf \
--input sample_sheet.csv \
--outdir output/ \
--max_cpus 32 \
--max_memory 128.GB \
-resume

The above command automatically pulls the Oncoanalyser git repo. We can also point nextflow run to a local Oncoanalyser repo instead, such as one we have cloned manually. This can be useful for debugging. The local repo is run at its currently checked-out commit, and this approach is incompatible with the -revision argument.

nextflow run /path/to/oncoanalyser_repo \
# other arguments

Please see the Outputs and Work directory sections for details on the outputs of Oncoanalyser.

Note

Nextflow-specific arguments start with a single hyphen (-). Oncoanalyser-specific arguments start with two hyphens (--).

Nextflow arguments

All arguments for nextflow run are documented in the CLI reference. The below table lists some relevant ones:

Argument   Description
-config Path to a configuration file. Multiple config files can be provided.
-profile Pre-defined config profile. For Oncoanalyser, can be docker, singularity, test_stub
-latest Pull latest changes before run
-revision A specific Oncoanalyser branch/tag to run. See the Oncoanalyser GitHub for available branches/tags
-resume Resume from cached results (by default the previous run). Useful if you've cancelled a run with CTRL+C, or a run has crashed and you've fixed the issue.
-stub Dry run. Under the hood, Oncoanalyser runs touch <outputfile> rather than actually running the tools. Useful for testing if the arguments and configuration files provided are correct.
-work-dir Path to a directory where Nextflow will put temporary files for each step in the pipeline. If this is not specified, Nextflow will create the work/ directory in the current directory
-help Show all Nextflow command line arguments and their descriptions

Oncoanalyser arguments

The below table lists all arguments that can be passed to Oncoanalyser:

Argument           Description
--input Path to a sample sheet
--outdir [1] Path to the output directory. While a process/tool is running, files are temporarily stored in the work directory (see: -work-dir argument). Only when the process completes are the files copied to the output directory.
--mode Can be:
- wgts: Whole genome sequencing and/or whole transcriptome sequencing analysis
- targeted: Targeted sequencing analysis (e.g. for panel or whole exome sequencing)
--genome Reference genome version. Can be GRCh37_hmf or GRCh38_hmf
--panel Panel name, e.g. tso500
--force_panel Required flag when --panel is not tso500 (i.e. force run in targeted mode for non-supported panels)
--max_cpus Enforce an upper limit of CPUs each process can use, e.g. 16
--max_memory Enforce an upper limit of memory available to each process, e.g. 32.GB
--max_time Enforce an upper limit on the time a process can take, e.g. 240.h
--max_fastq_records When positive, will use fastp to split fastq files so that each resultant fastq file has no more than max_fastq_records records. When nonpositive, fastp is not used and the provided fastq files are passed as-is to the aligner.
--processes_exclude [2] A comma-separated list specifying which processes to skip (e.g. --processes_exclude lilac,virusinterpreter). Note: Downstream processes depending on the output of a skipped tool will also be skipped.
--processes_include [2] When also specifying --processes_manual, a comma-separated list specifying which processes to include (e.g. --processes_include lilac,virusinterpreter). See Running specific tools for details on how to set up input files in the sample sheet
--processes_manual Only run processes provided in --processes_include
--prepare_reference_only Only stage reference genome and resource files
--isofox_read_length User defined RNA read length used for ISOFOX
--isofox_gc_ratios User defined ISOFOX expected GC ratios file
--isofox_counts User defined ISOFOX expected counts files (read length dependent)
--isofox_tpm_norm User defined ISOFOX TPM normalisation file for panel data
--isofox_gene_ids User defined ISOFOX gene list file for panel data.
--isofox_functions Semicolon-separated list of ISOFOX functions to run. Default: TRANSCRIPT_COUNTS;ALT_SPLICE_JUNCTIONS;FUSIONS;RETAINED_INTRONS
--fastp_umi Enable UMI processing by fastp
--fastp_umi_location Passed to fastp arg --umi_loc. Can be per_index or per_read
--fastp_umi_length Passed to fastp arg --umi_len. Expected length (number of bases) of the UMI
--fastp_umi_skip Passed to fastp arg --umi_skip. Number of bases to skip following UMI
--redux_umi Enable UMI processing by REDUX
--redux_umi_duplex_delim UMI duplex delimiter as used by REDUX. Default: _
--ref_data_hmf_data_path Path to hmftools resource files
--ref_data_panel_data_path Path to panel resource files
--ref_data_hla_slice_bed Path to HLA slice BED file
--create_stub_placeholders Create placeholders for resource files during stub run
--email Email address for completion summary
--monochrome_logs Do not use coloured log outputs

Notes:

  1. WARNING: Cannot be set in a config file
  2. Valid process names are: alignment, amber, bamtools, chord, cobalt, cuppa, esvee, isofox, lilac, linx, neo, orange, pave, purple, redux, sage, sigs, virusinterpreter

Sample sheet

The sample sheet is a comma separated table with the following columns:

  • subject_id: Top level grouping
  • group_id: Groups sample_id entries (e.g. group tumor DNA, normal DNA, tumor RNA) into the same analysis
  • sample_id: Sample identifier
  • sample_type: tumor or normal
  • sequence_type: dna or rna
  • filetype: bam, bai, fastq, or see Running specific tools for other valid values
  • filepath: Absolute filepath to input file. Can be local filepath, URL, or S3 URI
  • info: Sequencing library and lane info for FASTQ inputs
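Before launching a long run, a quick pre-flight check of the sample sheet can catch typos early. The bash/awk sketch below is hypothetical and not part of Oncoanalyser; the function name validate_sample_sheet is made up for illustration. It looks up columns by header name and reports any of the seven required columns that are missing (info is only needed for FASTQ inputs). It also flags sample_type values other than tumor/normal and sequence_type values other than dna/rna.

```shell
# Hypothetical pre-flight check (bash + awk), not part of Oncoanalyser.
# Columns are looked up by header name, so column order does not matter.
validate_sample_sheet() {
  local sheet="$1"
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("subject_id group_id sample_id sample_type sequence_type filetype filepath", req, " ")
      for (j = 1; j <= n; j++) {
        if (!(req[j] in col)) { print "missing column: " req[j] > "/dev/stderr"; bad = 1 }
      }
      if (bad) exit 1
      next
    }
    NF == 0 { next }    # ignore blank lines
    {
      st = $(col["sample_type"]); sq = $(col["sequence_type"])
      if (st != "tumor" && st != "normal") { print "line " NR ": unexpected sample_type " st > "/dev/stderr"; bad = 1 }
      if (sq != "dna" && sq != "rna")      { print "line " NR ": unexpected sequence_type " sq > "/dev/stderr"; bad = 1 }
    }
    END { exit bad }
  ' "$sheet"
}

# Usage: validate_sample_sheet sample_sheet.csv && echo "sample sheet looks OK"
```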

BAM inputs

Below is an example sample sheet with BAM files for a tumor/normal WGS run:

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bam,/path/to/PATIENT1-T.dna.bam
PATIENT1,PATIENT1,PATIENT1-R,normal,dna,bam,/path/to/PATIENT1-R.dna.bam

BAM indexes (.bai files) are expected to be in the same directory as the BAM files. Alternatively, provide the BAM index path explicitly by adding a row with bai in the filetype column:

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bam,/path/to/PATIENT1-T.dna.bam
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bai,/path/to/PATIENT1-T.dna.bam.bai
PATIENT1,PATIENT1,PATIENT1-R,normal,dna,bam,/path/to/PATIENT1-R.dna.bam
PATIENT1,PATIENT1,PATIENT1-R,normal,dna,bai,/path/to/PATIENT1-R.dna.bam.bai
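When preparing many sample sheets, the rows can be generated with a small helper. The bash sketch below uses a hypothetical bam_rows function that writes one bam row per sample. It adds an explicit bai row only when an index path is given that differs from <bam>.bai. Treating <bam>.bai as the file looked up next to the BAM is an assumption.

```shell
# Hypothetical helper (bash): prints the sample sheet rows for one DNA BAM.
# An explicit bai row is added only when an index path is given that differs
# from <bam>.bai (assumed here to be the file looked up next to the BAM).
bam_rows() {
  local subject="$1" group="$2" sample="$3" type="$4" bam="$5" bai="${6:-}"
  printf '%s,%s,%s,%s,dna,bam,%s\n' "$subject" "$group" "$sample" "$type" "$bam"
  if [[ -n "$bai" && "$bai" != "$bam.bai" ]]; then
    printf '%s,%s,%s,%s,dna,bai,%s\n' "$subject" "$group" "$sample" "$type" "$bai"
  fi
}

{
  echo 'subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath'
  bam_rows PATIENT1 PATIENT1 PATIENT1-T tumor  /path/to/PATIENT1-T.dna.bam
  bam_rows PATIENT1 PATIENT1 PATIENT1-R normal /path/to/PATIENT1-R.dna.bam /path/to/indexes/PATIENT1-R.dna.bam.bai
} > sample_sheet.csv
```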

FASTQ inputs

Below is an example sample sheet with FASTQ files for a tumor/normal WGS run:

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath,info
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,fastq,/path/to/PATIENT1-T_S1_L001_R1_001.fastq.gz;/path/to/PATIENT1-T_S1_L001_R2_001.fastq.gz,library_id:S1;lane:001
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,fastq,/path/to/PATIENT1-T_S1_L002_R1_001.fastq.gz;/path/to/PATIENT1-T_S1_L002_R2_001.fastq.gz,library_id:S1;lane:002
PATIENT1,PATIENT1,PATIENT1-R,normal,dna,fastq,/path/to/PATIENT1-R_S2_L001_R1_001.fastq.gz;/path/to/PATIENT1-R_S2_L001_R2_001.fastq.gz,library_id:S2;lane:001
PATIENT1,PATIENT1,PATIENT1-R,normal,dna,fastq,/path/to/PATIENT1-R_S2_L002_R1_001.fastq.gz;/path/to/PATIENT1-R_S2_L002_R2_001.fastq.gz,library_id:S2;lane:002

Comments:

  • Under info, provide the sequencing library and lane, separated by ; (e.g. library_id:S1;lane:001)
  • Under filepath, provide the forward ('R1') and reverse ('R2') FASTQ files separated by ;

Note

Only gzip-compressed, non-interleaved paired-end FASTQ files are currently supported
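The FASTQ rows above follow a regular pattern, so they can be generated from the file names. The bash sketch below builds one row per lane from Illumina-style names such as PATIENT1-T_S1_L001_R1_001.fastq.gz. It derives library_id and lane from the name and pairs each R1 file with its R2 file. Both the naming scheme and the fastq_rows helper are assumptions for illustration, so adjust the parsing to your sequencer's file names.

```shell
# Hypothetical helper (bash): prints FASTQ sample sheet rows for one sample,
# one row per lane, assuming Illumina-style names such as
#   <sample>_S1_L001_R1_001.fastq.gz / <sample>_S1_L001_R2_001.fastq.gz
fastq_rows() {
  local subject="$1" group="$2" sample="$3" type="$4" seq="$5" dir="$6"
  local r1 r2 base library lane
  for r1 in "$dir/$sample"_S*_L*_R1_001.fastq.gz; do
    [[ -e "$r1" ]] || continue                      # glob matched nothing
    r2="${r1/_R1_001.fastq.gz/_R2_001.fastq.gz}"    # mate file
    if [[ ! -e "$r2" ]]; then
      echo "missing R2 for $r1" >&2
      return 1
    fi
    base="${r1##*/}"                        # PATIENT1-T_S1_L001_R1_001.fastq.gz
    base="${base#"${sample}_"}"             # S1_L001_R1_001.fastq.gz
    library="${base%%_*}"                   # S1
    lane="${base#*_L}"; lane="${lane%%_*}"  # 001
    printf '%s,%s,%s,%s,%s,fastq,%s;%s,library_id:%s;lane:%s\n' \
      "$subject" "$group" "$sample" "$type" "$seq" "$r1" "$r2" "$library" "$lane"
  done
}

# Usage:
# echo 'subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath,info' > sample_sheet.csv
# fastq_rows PATIENT1 PATIENT1 PATIENT1-T tumor dna /path/to/fastq >> sample_sheet.csv
```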

Sample modes

Providing sample_type and sequence_type in different combinations allows Oncoanalyser to run in different sample modes. The below sample sheets use BAM files, but different sample modes can also be specified for FASTQ files.

Tumor-only DNA

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bam,/path/to/PATIENT1-T.dna.bam

Tumor-only RNA

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T-RNA,tumor,rna,bam,/path/to/PATIENT1-T.rna.bam

Tumor/normal DNA, tumor-only RNA

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bam,/path/to/PATIENT1-T.dna.bam
PATIENT1,PATIENT1,PATIENT1-R,normal,dna,bam,/path/to/PATIENT1-R.dna.bam
PATIENT1,PATIENT1,PATIENT1-T-RNA,tumor,rna,bam,/path/to/PATIENT1-T.rna.bam

Multiple patients and/or samples

Suppose you have multiple patients, each with one or more biopsies taken from different years.

You could then set:

  • subject_id to the patient ID
  • group_id to the set of samples for a particular year (e.g. PATIENT1-YEAR1)
  • sample_id to the actual sample IDs in the sample set for that year

For example:

subject_id,group_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1-YEAR1,PATIENT1-YEAR1-T,tumor,dna,bam,/path/to/PATIENT1-YEAR1-T.dna.bam
PATIENT1,PATIENT1-YEAR1,PATIENT1-YEAR1-R,normal,dna,bam,/path/to/PATIENT1-YEAR1-R.dna.bam
PATIENT1,PATIENT1-YEAR2,PATIENT1-YEAR2-T,tumor,dna,bam,/path/to/PATIENT1-YEAR2-T.dna.bam
PATIENT1,PATIENT1-YEAR2,PATIENT1-YEAR2-R,normal,dna,bam,/path/to/PATIENT1-YEAR2-R.dna.bam
PATIENT2,PATIENT2-YEAR1,PATIENT2-YEAR1-T,tumor,dna,bam,/path/to/PATIENT2-YEAR1-T.dna.bam
PATIENT2,PATIENT2-YEAR1,PATIENT2-YEAR1-R,normal,dna,bam,/path/to/PATIENT2-YEAR1-R.dna.bam
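To check that samples were grouped as intended, we can list the sample types within each group_id. The bash/awk sketch below uses a hypothetical summarise_groups helper that prints one line per group. For example, a group containing both tumor/dna and normal/dna corresponds to the paired tumor/normal DNA sample mode described above.

```shell
# Hypothetical helper (bash + awk): one line per group_id listing the distinct
# sample_type/sequence_type combinations it contains.
summarise_groups() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    NF == 0 { next }
    {
      g = $(col["group_id"])
      s = $(col["sample_type"]) "/" $(col["sequence_type"])
      if (!((g, s) in seen)) {          # e.g. bam + bai rows of one sample count once
        seen[g, s] = 1
        kinds[g] = kinds[g] (kinds[g] == "" ? "" : " ") s
      }
    }
    END { for (g in kinds) print g ": " kinds[g] }
  ' "$1" | sort
}

# Usage: summarise_groups sample_sheet.csv
```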

Running from REDUX BAM

For DNA sequencing analyses, read alignment with bwa-mem2 and read post-processing with REDUX are the pipeline steps that take the most time and compute resources. Thus, we can re-run Oncoanalyser from a REDUX BAM if one already exists, e.g. after updates to downstream hmftools.

Simply provide the REDUX BAM path, specifying bam_redux under filetype:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bam_redux,/path/to/PATIENT1-T.dna.redux.bam

The *.jitter_params.tsv and *.ms_table.tsv.gz REDUX output files are expected to be in the same directory as the REDUX BAM. If these files are located elsewhere, their paths can also be explicitly provided by specifying redux_jitter_tsv and redux_ms_tsv under filetype:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,bam_redux,/path/to/PATIENT1-T.dna.redux.bam
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,redux_jitter_tsv,/path/to/PATIENT1-T.dna.jitter_params.tsv
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,redux_ms_tsv,/path/to/PATIENT1-T.dna.ms_table.tsv.gz
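The bash sketch below uses a hypothetical redux_rows helper. It emits the bam_redux row and adds redux_jitter_tsv/redux_ms_tsv rows only when those files are not next to the BAM. It fails if a companion file cannot be found at all. The sketch assumes the REDUX naming shown in this section: <prefix>.redux.bam alongside <prefix>.jitter_params.tsv and <prefix>.ms_table.tsv.gz.

```shell
# Hypothetical helper (bash): prints the bam_redux row for a REDUX BAM, plus
# redux_jitter_tsv / redux_ms_tsv rows only for companion files that are not
# next to the BAM. Optional 6th argument: directory holding those files.
redux_rows() {
  local group="$1" subject="$2" sample="$3" type="$4" bam="$5" aux_dir="${6:-}"
  local bam_dir prefix ft f
  bam_dir="$(dirname "$bam")"
  prefix="$(basename "$bam" .redux.bam)"       # e.g. PATIENT1-T.dna
  printf '%s,%s,%s,%s,dna,bam_redux,%s\n' "$group" "$subject" "$sample" "$type" "$bam"
  for ft in redux_jitter_tsv redux_ms_tsv; do
    case "$ft" in
      redux_jitter_tsv) f="$prefix.jitter_params.tsv" ;;
      redux_ms_tsv)     f="$prefix.ms_table.tsv.gz" ;;
    esac
    if [[ -e "$bam_dir/$f" ]]; then
      continue                                  # found next to the BAM
    elif [[ -n "$aux_dir" && -e "$aux_dir/$f" ]]; then
      printf '%s,%s,%s,%s,dna,%s,%s\n' "$group" "$subject" "$sample" "$type" "$ft" "$aux_dir/$f"
    else
      echo "cannot find $f for $sample" >&2
      return 1
    fi
  done
}

# Usage (column order as in the examples above):
# echo 'group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath' > sample_sheet.csv
# redux_rows PATIENT1 PATIENT1 PATIENT1-T tumor /path/to/PATIENT1-T.dna.redux.bam >> sample_sheet.csv
```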

Running specific tools

It is possible to start Oncoanalyser from the outputs of any hmftools tool. For example, you may want to run CUPPA and already have the outputs from PURPLE, LINX, and VirusInterpreter. In this case, add the outputs from those tools to the sample sheet as entries whose filetype is purple_dir, linx_anno_dir, and virusinterpreter_dir:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,purple_dir,/path/to/purple/dir/
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,linx_anno_dir,/path/to/linx/dir/
PATIENT1,PATIENT1,PATIENT1-T,tumor,dna,virusinterpreter_dir,/path/to/virus/dir/

Please see the respective tool READMEs for details on which input data is required.

Below are all valid values for filetype:

Type Values
Raw inputs bam, bai, fastq
REDUX output bam_redux, redux_jitter_tsv, redux_ms_tsv
Other tool outputs amber_dir, bamtools, bamtools_dir, cobalt_dir, esvee_vcf, esvee_vcf_tbi, isofox_dir, lilac_dir, linx_anno_dir, pave_vcf, purple_dir, sage_vcf, sage_vcf_tbi, sage_append_vcf, virusinterpreter_dir
ORANGE inputs chord_dir, sigs_dir, cuppa_dir, linx_plot_dir, sage_dir
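To run only selected tools against such a sample sheet, the --processes_manual and --processes_include arguments (see Oncoanalyser arguments) can be combined. The bash sketch below covers the CUPPA example. It keeps the command in an array so it can be reviewed before launching. This sketch does not establish which inputs CUPPA needs in your setup, so check its README.

```shell
# Sketch (bash): run only CUPPA from existing PURPLE, LINX and VirusInterpreter
# outputs listed in sample_sheet.csv. Flags are taken from the argument tables.
cmd=(
  nextflow run nf-core/oncoanalyser
  -profile docker
  -config resources.config
  --mode wgts
  --genome GRCh37_hmf
  --input sample_sheet.csv
  --outdir output/
  --processes_manual
  --processes_include cuppa
)
printf '%q ' "${cmd[@]}"; echo    # print the command for review
# "${cmd[@]}"                     # uncomment to launch the run
```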

Configuration files

Nextflow configuration files can be used to configure Oncoanalyser. This section summarizes concepts of Nextflow configuration files that are relevant for using Oncoanalyser.

For details on specific configurations, see the Resource files, Configuring processes and Container images sections.

Note

Configuration is fully detailed in the Nextflow and nf-core documentation

Basic config example

Config items can be declared using blocks, where curly brackets define the scope of the enclosed config items. The example below uses the params scope, while workDir is unscoped:

params {
   ref_data_hmf_data_path = '/path/to/hmf_pipeline_resources/'
   redux_umi = true
}

workDir = '/path/to/work/'

The above config items can also be compactly re-written with dot syntax like so:

params.ref_data_hmf_data_path = '/path/to/hmf_pipeline_resources/'
params.redux_umi = true

workDir = '/path/to/work/'

Oncoanalyser arguments as config

The params scope can be used to define Oncoanalyser arguments. Running Oncoanalyser with the above example config:

nextflow run nf-core/oncoanalyser \
-config above_example.config \
# other arguments

...is equivalent to running:

nextflow run nf-core/oncoanalyser \
--ref_data_hmf_data_path /path/to/hmf_pipeline_resources/ \
--redux_umi \
# other arguments

The params scope is also used to define reference data paths (e.g. reference genome, hmftools resources) as detailed in Resource files.

Multiple config files

You may want to keep certain configuration items in separate files. For example:

resource_files.config may contain:

params {
   ref_data_hmf_data_path = '/path/to/hmf_pipeline_resources/'
}

...and processes.config may contain:

process {
   withName: 'REDUX.*' {
      cpus = 32
   }
}

You can provide both when running Oncoanalyser like so:

nextflow run nf-core/oncoanalyser \
-config resource_files.config \
-config processes.config \
# other arguments

Resource files

Links

GRCh37

Type Description Name
hmftools hmftools resources hmf_pipeline_resources.37_v6.0--2.tar.gz
Genome FASTA Homo_sapiens.GRCh37.GATK.illumina.fasta
Genome FASTA index Homo_sapiens.GRCh37.GATK.illumina.fasta.fai
Genome FASTA seq dictionary Homo_sapiens.GRCh37.GATK.illumina.fasta.dict
Genome bwa-mem2 index image Homo_sapiens.GRCh37.GATK.illumina.fasta.img
Genome bwa-mem2 index bwa-mem2_index/2.2.1.tar.gz
Genome GRIDSS index gridss_index/2.13.2.tar.gz
Genome (RNA) STAR index star_index/gencode_19/2.7.3a.tar.gz
Panel TSO500 data panels/tso500_5.34_37--1.tar.gz

GRCh38

Type Description Name
hmftools hmftools resources hmf_pipeline_resources.38_v6.0--2.tar.gz
Genome FASTA GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
Genome FASTA index GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai
Genome FASTA seq dictionary GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.dict
Genome bwa-mem2 index image GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img
Genome bwa-mem2 index bwa-mem2_index/2.2.1.tar.gz
Genome GRIDSS index gridss_index/2.13.2.tar.gz
Genome (RNA) STAR index star_index/gencode_38/2.7.3a.tar.gz
Panel TSO500 data panels/tso500_5.34_38--1.tar.gz
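Each downloaded .tar.gz archive needs to be extracted before its path is used in the config. The bash sketch below uses a hypothetical extract_resource helper and assumes you choose the destination directories yourself. It lists the extracted top level because this section does not document whether each archive contains its own top-level directory.

```shell
# Hypothetical helper (bash): extract a resource archive into a directory and
# list its top level, to see which path to put in the config.
extract_resource() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
  ls -1 "$dest"
}

# Usage (archive names from the tables above):
# extract_resource hmf_pipeline_resources.37_v6.0--2.tar.gz /path/to/hmf_pipeline_resources/
# extract_resource 2.2.1.tar.gz /path/to/bwa-mem2_index/
```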

Configuring general resource files

The example below shows the essential config items for resource files. Which items are required depends on the experimental setup. Please see the inline comments for details.

Note

Single line comments start with //. Multi-line comments start with /* and end with */

params {
   genomes {
   
      'GRCh37_hmf' { // Can be 'GRCh37_hmf' or 'GRCh38_hmf'
      
         fasta         = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta"
         fai           = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai"
         dict          = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict"
         img           = "/path/to/Homo_sapiens.GRCh37.GATK.illumina.fasta.img"
         
         // Required if aligning reads from FASTQ files (can be skipped when running from BAM files)
         bwamem2_index = "/path/to/bwa-mem2_index/"
         
         // Required if running VIRUSbreakend
         gridss_index  = "/path/to/gridss_index/"
         
         // Required only for RNA sequencing data
         star_index    = "/path/to/star_index/"
      }
      
      // Both GRCh37_hmf and GRCh38_hmf entries can be provided!
      'GRCh38_hmf' {
         fasta         = "/path/to/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
         // Provide remaining options in a similar manner as for 'GRCh37_hmf' above
      }
      
   }
   
   // Always required
   ref_data_hmf_data_path = "/path/to/hmf_pipeline_resources/"
}

Configuring panel resource files

Running in --mode targeted requires some additional resource files to be configured:

params {   
   ref_data_panel_data_path = "/path/to/panel_resources/"
   
   // These are relative paths within the dir provided by `ref_data_panel_data_path`
   panel_data_paths {
      
      custom_panel { // This is the name that should be passed to the `--panel` argument
         
         // Can be '37' or '38'
         '37' {
              
            driver_gene_panel           = 'common/custom_panel.driver_gene_panel.tsv'
            sage_actionable_panel       = 'variants/custom_panel.coding_panel.v37.bed.gz'
            sage_coverage_panel         = 'variants/custom_panel.coverage_panel.v37.bed.gz'
            pon_artefacts               = 'variants/custom_panel.sage_pon_artefacts.tsv.gz'
            target_region_bed           = 'custom_panel.panel_regions.v37.bed.gz'
            target_region_normalisation = 'copy_number/custom_panel.cobalt_normalisation.37.tsv'
            target_region_ratios        = 'copy_number/custom_panel.target_regions_ratios.37.tsv'
            target_region_msi_indels    = 'copy_number/custom_panel.target_regions_msi_indels.37.tsv'
            
            // Optional. These can be omitted, or provided a falsy value such as '' or []
            isofox_tpm_norm             = ''
            isofox_gene_ids             = ''
            isofox_counts               = ''
            isofox_gc_ratios            = ''
         }
      }
   }
}

When running Oncoanalyser:

  • Provide both the general and panel resources config files to -config
  • Pass the panel name to --panel. This should match the name defined in the panel resources config file
  • Provide the --force_panel argument if --panel is not tso500 (currently the only supported panel type)

nextflow run nf-core/oncoanalyser \
--panel custom_panel \
--force_panel \
-config general_resources.config \
-config panel_resources.config \
--mode targeted \
# other arguments
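Because the panel_data_paths entries are relative to ref_data_panel_data_path, a wrong relative path may only be noticed once the run starts. The bash sketch below uses a hypothetical check_panel_paths helper as a rough pre-flight check. It greps non-empty single-quoted values assigned with = and checks that each one resolves under the panel directory. It is not a Nextflow config parser, and it assumes every such value in the file is a panel-relative path.

```shell
# Rough pre-flight check (bash), not a Nextflow config parser: every non-empty,
# single-quoted value assigned with '=' is assumed to be a path relative to
# ref_data_panel_data_path and is checked for existence.
check_panel_paths() {
  local config="$1" base="$2" rel missing=0
  while IFS= read -r rel; do
    if [[ ! -e "$base/$rel" ]]; then
      echo "missing: $base/$rel" >&2
      missing=1
    fi
  done < <(grep -oE "= *'[^']+'" "$config" | sed -E "s/^= *'//; s/'\$//")
  return "$missing"
}

# Usage: check_panel_paths panel_resources.config /path/to/panel_resources/
```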

Configuring processes

There are many options for configuring processes. This section covers some common use cases.

Note

Configuration of processes is fully detailed in the nf-core documentation. All options (a.k.a. 'directives') for configuring processes are detailed in the Nextflow process reference docs.

Compute resources

Each hmftools tool runs within a Nextflow process. We can use the process scope and withName to select tools by process name and set compute resource options (as well as other config options):

process {
   withName: 'SAGE_SOMATIC' {
      cpus = 32
      memory = 128.GB
      disk = 1024.GB
      time = 48.h
   }
}

Values with units are provided either in quotes with a space or without quotes using a dot, e.g. '128 GB' or 128.GB.

Please see the Nextflow process reference docs for all possible options. The directives used above are documented under cpus, memory, time and disk.

We can also use a regular expression to select multiple processes. SAGE for example has the processes SAGE_SOMATIC, SAGE_GERMLINE and SAGE_APPEND. We can select all 3 like so:

process {
   withName: 'SAGE.*' {
      cpus = 32
   }
}
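Before relying on a pattern, we can preview which process names it selects. The bash sketch below uses grep -Ex, which requires the whole name to match, as an approximation of Nextflow's regex selector. The SAGE names come from this section. REDUX and PURPLE are illustrative names assumed to follow the {TOOL} pattern, and the select_processes helper is hypothetical. In this preview, 'SAGE' without .* selects nothing, because every SAGE process name has a suffix.

```shell
# Preview (bash) which process names a withName pattern would select.
# grep -Ex requires the whole name to match, approximating Nextflow's
# regex selector. REDUX and PURPLE are illustrative names.
processes='SAGE_SOMATIC
SAGE_GERMLINE
SAGE_APPEND
REDUX
PURPLE'

select_processes() {
  printf '%s\n' "$processes" | grep -Ex "$1"
}

select_processes 'SAGE.*'                        # the three SAGE processes
select_processes 'SAGE' || echo "(no match)"     # no suffix, no match
```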

Processes are also grouped by compute resource labels, with the main ones being (in order of increasing compute load) process_single, process_low, process_medium and process_high. The labels process_medium_memory and process_high_memory are only used for creating genome indexes. We can use withLabel to set options for all tools with an associated label:

process {
   withLabel: 'process_low' {
      cpus = 2
   }
}

Maximum resources

The maximum resources for any process can also be set using resourceLimits. If a process requests more resources than allowed (e.g. a process requests 64 cores but the largest node in a cluster has 32), the process would normally fail or cause the pipeline to hang forever as it will never be scheduled. Setting resourceLimits will automatically reduce the process resources to comply with the provided limits before the job is submitted.

process {
   resourceLimits = [
      cpus: 32,
      memory: 128.GB,
      time: 48.h
   ]
}

Error handling

We can use errorStrategy and maxRetries to determine how Oncoanalyser proceeds when encountering an error. For example, to retry 3 times on any error for any process, we can provide this config:

process {
   errorStrategy = 'retry'
   maxRetries = 3
}

Valid values for errorStrategy are (details in the Nextflow documentation):

  • retry: Retry the process
  • terminate: Fail the pipeline immediately
  • finish: Terminate after submitted and running processes are done
  • ignore: Allow the pipeline to continue upon error

Process selectors can also be used to target specific processes for error handling:

process {
   withName: 'SAGE_SOMATIC' {
      errorStrategy = 'retry'
      maxRetries = 3
   }
}

Container images

Oncoanalyser by default uses Docker and Singularity images built by the bioconda-recipes Azure CI/CD infrastructure.

Use -profile docker or -profile singularity to tell Oncoanalyser whether to run with Docker or Singularity respectively. For example:

nextflow run nf-core/oncoanalyser \
-profile docker \
# other arguments

Docker images built by Hartwig's Google Cloud CI/CD infrastructure are also available (though not used by default by Oncoanalyser).

Note

Configuration of container images is fully detailed in the nf-core and Nextflow documentation

Docker and Singularity image URIs/URLs follow consistent patterns:

Source Platform Host URI or URL
Bioconda Docker quay.io Pattern: quay.io/biocontainers/hmftools-{TOOL}:{TAG}
Example: quay.io/biocontainers/hmftools-sage:4.0_beta--hdfd78af_4
Bioconda Singularity Galaxy Project Pattern: https://depot.galaxyproject.org/singularity/hmftools-{tool}:{tag}
Example: https://depot.galaxyproject.org/singularity/hmftools-sage:4.0_beta--hdfd78af_4
Hartwig Docker Dockerhub Pattern: docker.io/hartwigmedicalfoundation/{TOOL}:{TAG}
Example: docker.io/hartwigmedicalfoundation/sage:4.0-rc.2

Bioconda recipes also have a consistent URL pattern:

These patterns are useful to know as the bioconda-recipes, quay.io, and Galaxy Project repos especially have thousands of entries but poor search functionality.

Oncoanalyser/Nextflow automatically pulls Bioconda images at runtime. However, images can also be manually pulled from URIs/URLs. For example:

## Docker: Downloads into your local Docker repository 
docker pull quay.io/biocontainers/hmftools-sage:4.0_beta--hdfd78af_4

## Singularity: Downloads image to a file called 'hmftools-sage:4.0_beta--hdfd78af_4' 
singularity pull https://depot.galaxyproject.org/singularity/hmftools-sage:4.0_beta--hdfd78af_4
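The patterns in the table differ only by host prefix, so the URIs can be assembled from a tool name and tag. The bash sketch below shows this. The source labels bioconda-docker, bioconda-singularity and hartwig-docker, and the image_uri helper, are hypothetical. The function does not check that the image actually exists.

```shell
# Hypothetical helper (bash): build an image URI from the patterns in the table.
image_uri() {
  local source="$1" tool="$2" tag="$3"
  case "$source" in
    bioconda-docker)      echo "quay.io/biocontainers/hmftools-${tool}:${tag}" ;;
    bioconda-singularity) echo "https://depot.galaxyproject.org/singularity/hmftools-${tool}:${tag}" ;;
    hartwig-docker)       echo "docker.io/hartwigmedicalfoundation/${tool}:${tag}" ;;
    *) echo "unknown source: $source" >&2; return 1 ;;
  esac
}

# Usage: docker pull "$(image_uri bioconda-docker sage 4.0_beta--hdfd78af_4)"
```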

Caching Singularity images

Some compute environments, especially HPC (high-performance computing) clusters, have limited network access. This prevents Oncoanalyser/Nextflow from automatically pulling images at runtime. To get around this, we can manually download the Singularity images to a directory:

cd /path/to/cache_dir/

## For the image name provided to the `--name` argument, remove 'https://' and replace '/' with '-'
singularity pull \
--name depot.galaxyproject.org-singularity-hmftools-sage:4.0_beta--hdfd78af_4.img \
https://depot.galaxyproject.org/singularity/hmftools-sage:4.0_beta--hdfd78af_4

## Repeat for all singularity images that Oncoanalyser uses
## singularity pull ...

Then set the NXF_SINGULARITY_CACHEDIR environment variable (Nextflow documentation) to tell Oncoanalyser/Nextflow where to look for local images at runtime:

export NXF_SINGULARITY_CACHEDIR='/path/to/cache_dir/'

nextflow run nf-core/oncoanalyser \
-profile singularity \
# ...other arguments
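The naming rule from the comment in the pull example can be scripted, so that cached file names always match the image URLs. The bash sketch below assumes HTTPS Galaxy Project URLs like the SAGE example. The cache_name helper is hypothetical.

```shell
# Hypothetical helper (bash): drop 'https://', replace every '/' with '-',
# and append '.img', following the naming rule used for the cache directory.
cache_name() {
  local url="${1#https://}"
  printf '%s.img\n' "${url//\//-}"
}

url='https://depot.galaxyproject.org/singularity/hmftools-sage:4.0_beta--hdfd78af_4'
cache_name "$url"
# prints: depot.galaxyproject.org-singularity-hmftools-sage:4.0_beta--hdfd78af_4.img
# cd "$NXF_SINGULARITY_CACHEDIR" && singularity pull --name "$(cache_name "$url")" "$url"
```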

Alternatively, the path to the Singularity cache dir can also be provided to a config file:

singularity {
    cacheDir = '/path/to/cache_dir/'
}

This config file is then passed to nextflow run:

nextflow run nf-core/oncoanalyser \
-profile singularity \
-config singularity.config \
# ...other arguments

Note

All singularity options are detailed in the Singularity Nextflow documentation

Configuring container images

We can override the default container image used by Oncoanalyser like so:

process {
   withName: 'SAGE.*' {
      container = 'docker.io/hartwigmedicalfoundation/sage:4.0-rc.2'
   }

   withName: 'ESVEE.*' {
      container = 'docker.io/hartwigmedicalfoundation/esvee:1.0-rc.4'
   }
}

This is useful for example when you want to use updated container images that are not yet officially supported (e.g. betas or release candidates).

In general, the process names for all hmftools are {TOOL} or {TOOL}_{SUBPROCESS}. For example, SAGE has the processes: SAGE_SOMATIC, SAGE_GERMLINE, SAGE_APPEND. Therefore, use regex suffix .* (e.g. SAGE.*) to capture the subprocesses for each tool.

Outputs

Oncoanalyser writes output files to the below directory tree structure at the path provided by the --outdir argument. Files are grouped by the group_id provided in the sample sheet, then by tool:

output/
├── pipeline_info/
├── group_id_1/
│   ├── alignments/
│   ├── amber/
│   ├── bamtools/
│   ├── chord/
│   ├── cobalt/
│   ├── cuppa/
│   ├── esvee/
│   ├── isofox/
│   ├── lilac/
│   ├── linx/
│   ├── orange/
│   ├── pave/
│   ├── purple/
│   ├── sage/
│   ├── sigs/
│   ├── virusbreakend/
│   └── virusinterpreter/
│   
├── group_id_2/
│   └── ...
│   
...

All intermediate files used by each process are kept in the Nextflow work directory. Once an analysis has completed, this directory can be removed. Note that -resume relies on the cached results stored there.
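After a run, we can quickly see which tools produced output for each group by checking for the per-tool directories in the tree above. The bash sketch below uses a hypothetical report_outputs helper. Which directories to expect depends on the mode and inputs, so a missing entry is a prompt to check the logs rather than a definite error.

```shell
# Hypothetical helper (bash): for each group directory under --outdir, list the
# per-tool output directories that exist.
report_outputs() {
  local outdir="$1" dir group tool
  for dir in "$outdir"/*/; do
    [[ -d "$dir" ]] || continue                  # no subdirectories at all
    group="$(basename "$dir")"
    [[ "$group" == pipeline_info ]] && continue  # run info, not a group
    printf '%s:' "$group"
    for tool in alignments amber bamtools chord cobalt cuppa esvee isofox lilac \
                linx orange pave purple sage sigs virusbreakend virusinterpreter; do
      if [[ -d "$dir$tool" ]]; then
        printf ' %s' "$tool"
      fi
    done
    printf '\n'
  done
}

# Usage: report_outputs output/
```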

Pipeline information

Created by Nextflow:

pipeline_info/
├── execution_report_<date_time>.html   # HTML report of execution metrics and details
├── execution_timeline_<date_time>.html # Timeline diagram showing process start/duration/finish
├── execution_trace_<date_time>.txt     # Resource usage
├── pipeline_dag_<date_time>.html       # Pipeline diagram showing how each process is connected

Created by Oncoanalyser:

├── params_<date_time>.json             # Parameters used by the pipeline run
└── software_versions.yml               # Tool versions

Read alignment

No outputs from bwa-mem2 and STAR are published.

Read post-processing

REDUX: Duplicate marking and unmapping

<group_id>/alignments/
├── dna
│   ├── <tumor_dna_id>.jitter_params.tsv         # Microsatellite jitter model parameters
│   ├── <tumor_dna_id>.ms_table.tsv.gz           # Aggregated repeat units and repeat counts
│   ├── <tumor_dna_id>.redux.bam                 # Read alignments
│   ├── <tumor_dna_id>.redux.bam.bai             # Read alignments index
│   ├── <tumor_dna_id>.redux.duplicate_freq.tsv  # Duplicate read frequencies
│   ├── <tumor_dna_id>.repeat.tsv.gz             # Repeat units and repeat counts per site
│   ├── <normal_dna_id>.jitter_params.tsv        # See above
│   ├── <normal_dna_id>.ms_table.tsv.gz          # See above
│   ├── <normal_dna_id>.redux.bam                # See above
│   ├── <normal_dna_id>.redux.bam.bai            # See above
│   ├── <normal_dna_id>.redux.duplicate_freq.tsv # See above
│   └── <normal_dna_id>.repeat.tsv.gz            # See above

Picard MarkDuplicates: Duplicate marking

└── rna
    ├── <tumor_rna_id>.md.bam      # Read alignments
    ├── <tumor_rna_id>.md.bam.bai  # Read alignments index
    └── <tumor_rna_id>.md.metrics  # Duplicate read marking metrics

SNV, MNV, INDEL calling

SAGE: Variant calling

<group_id>/sage
├── somatic
│   ├── <normal_dna_id>.sage.bqr.png            # Normal DNA sample base quality recalibration metrics plot
│   ├── <normal_dna_id>.sage.bqr.tsv            # Normal DNA sample base quality recalibration metrics
│   ├── <tumor_dna_id>.sage.bqr.png             # Tumor DNA sample base quality recalibration metrics plot
│   ├── <tumor_dna_id>.sage.bqr.tsv             # Tumor DNA sample base quality recalibration metrics
│   ├── <tumor_dna_id>.sage.exon.medians.tsv    # Tumor DNA sample exon median depths
│   ├── <tumor_dna_id>.sage.gene.coverage.tsv   # Tumor DNA sample gene coverages
│   ├── <tumor_dna_id>.sage.somatic.vcf.gz      # Tumor DNA sample small variant calls
│   └── <tumor_dna_id>.sage.somatic.vcf.gz.tbi  # Tumor DNA sample small variant calls index
├── germline
│   ├── <normal_dna_id>.sage.bqr.png            # Normal DNA sample base quality recalibration metrics plot
│   ├── <normal_dna_id>.sage.bqr.tsv            # Normal DNA sample base quality recalibration metrics
│   ├── <normal_dna_id>.sage.exon.medians.tsv   # Normal DNA sample exon median depths
│   ├── <normal_dna_id>.sage.gene.coverage.tsv  # Normal DNA sample gene coverages
│   ├── <tumor_dna_id>.sage.bqr.png             # Tumor DNA sample base quality recalibration metrics plot
│   ├── <tumor_dna_id>.sage.bqr.tsv             # Tumor DNA sample base quality recalibration metrics
│   ├── <tumor_dna_id>.sage.germline.vcf.gz     # Normal DNA sample filtered small variant calls
│   └── <tumor_dna_id>.sage.germline.vcf.gz.tbi # Normal DNA sample filtered small variant calls index
└── append
    ├── <normal_dna_id>.sage.append.vcf.gz      # Normal VCF with SMNVs and RNA data appended
    └── <tumor_dna_id>.sage.append.vcf.gz       # Tumor VCF with SMNVs and RNA data appended

PAVE: Transcript/coding effect annotation

<group_id>/pave/
├── <tumor_dna_id>.sage.germline.pave.vcf.gz     # VCF with annotated germline SAGE SMNVs
├── <tumor_dna_id>.sage.germline.pave.vcf.gz.tbi # VCF index
├── <tumor_dna_id>.sage.somatic.pave.vcf.gz      # VCF with annotated somatic SAGE SMNVs
└── <tumor_dna_id>.sage.somatic.pave.vcf.gz.tbi  # VCF index

SV calling

ESVEE: Variant calling

<group_id>/esvee/
├── prep
│   ├── <tumor_dna_id>.esvee.prep.bam                 # BAM with candidate SV reads
│   ├── <tumor_dna_id>.esvee.prep.bam.bai             # BAM index
│   ├── <tumor_dna_id>.esvee.prep.disc_stats.tsv      # Discordant read statistics
│   ├── <tumor_dna_id>.esvee.prep.fragment_length.tsv # Fragment length statistics
│   ├── <tumor_dna_id>.esvee.prep.junction.tsv        # Candidate junctions
│   ├── <normal_dna_id>.esvee.prep.bam                # BAM with candidate SV reads
│   └── <normal_dna_id>.esvee.prep.bam.bai            # BAM index
├── assemble
│   ├── <tumor_dna_id>.esvee.assembly.tsv             # Breakend assemblies
│   ├── <tumor_dna_id>.esvee.alignment.tsv            # Assemblies realigned to the reference genome
│   ├── <tumor_dna_id>.esvee.breakend.tsv             # Breakend data
│   ├── <tumor_dna_id>.esvee.phased_assembly.tsv      # Phased assemblies
│   ├── <tumor_dna_id>.esvee.raw.vcf.gz               # VCF with candidate breakends
│   └── <tumor_dna_id>.esvee.raw.vcf.gz.tbi           # VCF index
├── depth_annotation
│   ├── <tumor_dna_id>.esvee.ref_depth.vcf.gz         # VCF annotated with normal sample read depths
│   └── <tumor_dna_id>.esvee.ref_depth.vcf.gz.tbi     # VCF index
└── caller
    ├── <tumor_dna_id>.esvee.germline.vcf.gz          # VCF with germline breakends
    ├── <tumor_dna_id>.esvee.germline.vcf.gz.tbi      # VCF index
    ├── <tumor_dna_id>.esvee.somatic.vcf.gz           # VCF with somatic breakends
    ├── <tumor_dna_id>.esvee.somatic.vcf.gz.tbi       # VCF index
    ├── <tumor_dna_id>.esvee.unfiltered.vcf.gz        # VCF with unfiltered breakends
    └── <tumor_dna_id>.esvee.unfiltered.vcf.gz.tbi    # VCF index

CNV calling

AMBER: B-allele frequencies

<group_id>/amber/
├── <tumor_dna_id>.amber.baf.pcf                  # Piecewise constant fit on B-allele frequencies
├── <tumor_dna_id>.amber.baf.tsv.gz               # B-allele frequencies
├── <tumor_dna_id>.amber.contamination.tsv        # Contamination data
├── <tumor_dna_id>.amber.contamination.vcf.gz     # Contamination sites
├── <tumor_dna_id>.amber.contamination.vcf.gz.tbi # VCF index
├── <tumor_dna_id>.amber.qc                       # QC file
├── <normal_dna_id>.amber.homozygousregion.tsv    # Regions of homozygosity
├── <normal_dna_id>.amber.snp.vcf.gz              # SNP sites VCF
├── <normal_dna_id>.amber.snp.vcf.gz.tbi          # VCF index
└── amber.version                                 # Tool version

COBALT: Read depth ratios

<group_id>/cobalt/
├── <tumor_dna_id>.cobalt.gc.median.tsv     # GC median read depths
├── <tumor_dna_id>.cobalt.ratio.pcf         # Piecewise constant fit
├── <tumor_dna_id>.cobalt.ratio.tsv.gz      # Read counts and ratios (against the normal sample, or a supposed diploid reference in tumor-only mode)
├── <normal_dna_id>.cobalt.gc.median.tsv    # GC median read depths
├── <normal_dna_id>.cobalt.ratio.median.tsv # Chromosome median ratios
├── <normal_dna_id>.cobalt.ratio.pcf        # Piecewise constant fit
└── cobalt.version                          # Tool version

PURPLE: Purity/ploidy estimation, variant annotation

<group_id>/purple/
├── <tumor_dna_id>.purple.cnv.gene.tsv                # Somatic gene copy number
├── <tumor_dna_id>.purple.cnv.somatic.tsv             # Copy number variant segments
├── <tumor_dna_id>.purple.driver.catalog.germline.tsv # Germline DNA sample driver events
├── <tumor_dna_id>.purple.driver.catalog.somatic.tsv  # Somatic DNA sample driver events
├── <tumor_dna_id>.purple.germline.deletion.tsv       # Germline DNA deletions
├── <tumor_dna_id>.purple.germline.vcf.gz             # Germline SAGE SMNVs with PURPLE annotations
├── <tumor_dna_id>.purple.germline.vcf.gz.tbi         # VCF index
├── <tumor_dna_id>.purple.purity.range.tsv            # Purity/ploidy model fit scores across a range of purity values
├── <tumor_dna_id>.purple.purity.tsv                  # Purity/ploidy summary
├── <tumor_dna_id>.purple.qc                          # QC file
├── <tumor_dna_id>.purple.segment.tsv                 # Genomic copy number segments
├── <tumor_dna_id>.purple.somatic.clonality.tsv       # Clonality peak model data
├── <tumor_dna_id>.purple.somatic.hist.tsv            # Somatic variants histogram data
├── <tumor_dna_id>.purple.somatic.vcf.gz              # Tumor SAGE SMNVs with PURPLE annotations
├── <tumor_dna_id>.purple.somatic.vcf.gz.tbi          # VCF index
├── <tumor_dna_id>.purple.sv.germline.vcf.gz          # Germline ESVEE SVs with PURPLE annotations
├── <tumor_dna_id>.purple.sv.germline.vcf.gz.tbi      # VCF index
├── <tumor_dna_id>.purple.sv.vcf.gz                   # Somatic ESVEE SVs with PURPLE annotations
├── <tumor_dna_id>.purple.sv.vcf.gz.tbi               # VCF index
├── circos/         # Circos plot data
├── plot/           # PURPLE plots
└── purple.version  # Tool version
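Many downstream summaries start from the fitted purity and ploidy. A minimal Python sketch for reading them from the purity summary is shown here. It assumes the file is tab-separated with a single header row that includes `purity` and `ploidy` columns (an assumption worth checking against the PURPLE documentation for your version); the helper names are hypothetical.

```python
import csv
from pathlib import Path


def read_purity_summary(purple_dir: Path, tumor_id: str) -> dict[str, str]:
    """Return the first data row of <tumor_id>.purple.purity.tsv as a dict.

    Assumes a tab-separated file with one header row (an assumption; check the
    PURPLE documentation for the exact columns in your version).
    """
    path = purple_dir / f"{tumor_id}.purple.purity.tsv"
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        row = next(reader, None)  # the summary describes a single fitted solution
    if row is None:
        raise ValueError(f"{path} has no data rows")
    return row


def purity_and_ploidy(purple_dir: Path, tumor_id: str) -> tuple[float, float]:
    """Extract fitted purity and ploidy, assuming 'purity' and 'ploidy' columns."""
    row = read_purity_summary(purple_dir, tumor_id)
    return float(row["purity"]), float(row["ploidy"])
```

For example, `purity_and_ploidy(Path("output/GROUP01/purple"), "TUMOR01")`, where the output directory, group ID and tumor ID are hypothetical placeholders for your own `--outdir`, `<group_id>` and `<tumor_dna_id>`.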

SV and driver event interpretation

LINX: SV and driver event interpretation

<group_id>/linx/
├── germline_annotations
│   ├── <tumor_dna_id>.linx.germline.breakend.tsv       # Normal sample breakend data
│   ├── <tumor_dna_id>.linx.germline.clusters.tsv       # Normal sample clustered events
│   ├── <tumor_dna_id>.linx.germline.disruption.tsv     # Normal sample gene disruptions
│   ├── <tumor_dna_id>.linx.germline.driver.catalog.tsv # Normal sample driver events
│   ├── <tumor_dna_id>.linx.germline.links.tsv          # Normal sample SV breakend links within clusters
│   ├── <tumor_dna_id>.linx.germline.svs.tsv            # Normal sample SV annotations
│   └── linx.version                                    # Tool version
├── somatic_annotations
│   ├── <tumor_dna_id>.linx.breakend.tsv                # Tumor sample breakend data
│   ├── <tumor_dna_id>.linx.clusters.tsv                # Tumor sample clustered events
│   ├── <tumor_dna_id>.linx.driver.catalog.tsv          # Tumor sample driver events
│   ├── <tumor_dna_id>.linx.drivers.tsv                 # Tumor sample driver events linked to SV clusters
│   ├── <tumor_dna_id>.linx.fusion.tsv                  # Tumor sample fusions
│   ├── <tumor_dna_id>.linx.links.tsv                   # Tumor sample SV breakend links within clusters
│   ├── <tumor_dna_id>.linx.neoepitope.tsv              # Tumor sample fusion neo-epitope candidates
│   ├── <tumor_dna_id>.linx.svs.tsv                     # Tumor sample SV annotations
│   ├── <tumor_dna_id>.linx.vis_copy_number.tsv         # Visualisation data: copy number
│   ├── <tumor_dna_id>.linx.vis_fusion.tsv              # Visualisation data: fusions
│   ├── <tumor_dna_id>.linx.vis_gene_exon.tsv           # Visualisation data: gene exons
│   ├── <tumor_dna_id>.linx.vis_protein_domain.tsv      # Visualisation data: protein domains
│   ├── <tumor_dna_id>.linx.vis_segments.tsv            # Visualisation data: segments
│   ├── <tumor_dna_id>.linx.vis_sv_data.tsv             # Visualisation data: SVs
│   └── linx.version                                    # Tool version
└── somatic_plots
    ├── all
    │   └── <tumor_dna_id>.*.png # All cluster plots
    └── reportable
        └── <tumor_dna_id>.*.png # Driver cluster plots

RNA transcript analysis

ISOFOX

<group_id>/isofox/
├── <tumor_rna_id>.isf.alt_splice_junc.csv # Alternative splice junctions
├── <tumor_rna_id>.isf.fusions.csv         # Fusions, unfiltered
├── <tumor_rna_id>.isf.gene_collection.csv # Gene-collection fragment counts
├── <tumor_rna_id>.isf.gene_data.csv       # Gene fragment counts
├── <tumor_rna_id>.isf.pass_fusions.csv    # Fusions, filtered
├── <tumor_rna_id>.isf.retained_intron.csv # Retained introns
├── <tumor_rna_id>.isf.summary.csv         # Analysis summary
└── <tumor_rna_id>.isf.transcript_data.csv # Transcript fragment counts

Oncoviral detection

VIRUSBreakend: Viral content and integration calling

<group_id>/virusbreakend/
├── <tumor_dna_id>.virusbreakend.vcf             # VCF with viral integration sites
└── <tumor_dna_id>.virusbreakend.vcf.summary.tsv # Analysis summary

VirusInterpreter: Post-processing

<group_id>/virusinterpreter/
└── <tumor_dna_id>.virus.annotated.tsv # Processed oncoviral call/annotation data

Immune analysis

LILAC: HLA typing

<group_id>/lilac/
├── <tumor_dna_id>.lilac.candidates.coverage.tsv # Coverage of high scoring candidates
├── <tumor_dna_id>.lilac.qc.tsv                  # QC file
└── <tumor_dna_id>.lilac.tsv                     # Analysis summary

NEO: Neo-epitope prediction

<group_id>/neo/
├── <tumor_dna_id>.lilac.candidates.coverage.tsv # Coverage of high scoring candidates
├── <tumor_dna_id>.lilac.qc.tsv                  # QC file
└── <tumor_dna_id>.lilac.tsv                     # Analysis summary

Mutational signature fitting

SIGS

<group_id>/sigs/
├── <tumor_dna_id>.sig.allocation.tsv # Signature allocations
└── <tumor_dna_id>.sig.snv_counts.csv # SNV counts

HRD prediction

CHORD

<group_id>/chord/
├── <tumor_dna_id>.chord.mutation_contexts.tsv # Counts of mutation types
└── <tumor_dna_id>.chord.prediction.tsv        # HRD predictions

Tissue of origin prediction

CUPPA

<group_id>/cuppa/
├── <tumor_dna_id>.cuppa.pred_summ.tsv # Prediction summary               
├── <tumor_dna_id>.cuppa.vis.png       # Prediction visualisation         
├── <tumor_dna_id>.cuppa.vis_data.tsv  # Prediction visualisation raw data
└── <tumor_dna_id>.cuppa_data.tsv.gz   # Input features                   

Summary report

ORANGE

<group_id>/orange/
├── <tumor_dna_id>.orange.pdf  # Results of all tools as a PDF
└── <tumor_dna_id>.orange.json # Result raw data
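The listings above can double as a checklist for a finished run. The Python sketch below assumes the paired tumor/normal WGS setup from Getting started and checks only a small, hand-picked subset of key outputs; `KEY_OUTPUTS` and `missing_outputs` are hypothetical names, and other setups may produce a different set of files, so adjust the mapping to suit.

```python
from pathlib import Path

# A few key per-tool outputs, taken from the listings above. The selection is
# illustrative, not exhaustive; {tumor} is replaced with the tumor DNA sample ID.
KEY_OUTPUTS = {
    "purple": ["{tumor}.purple.purity.tsv", "{tumor}.purple.somatic.vcf.gz"],
    "linx/somatic_annotations": ["{tumor}.linx.driver.catalog.tsv"],
    "chord": ["{tumor}.chord.prediction.tsv"],
    "cuppa": ["{tumor}.cuppa.pred_summ.tsv"],
    "orange": ["{tumor}.orange.pdf", "{tumor}.orange.json"],
}


def missing_outputs(group_dir: Path, tumor_id: str) -> list[Path]:
    """Return the expected output paths that do not exist under group_dir."""
    missing = []
    for subdir, patterns in KEY_OUTPUTS.items():
        for pattern in patterns:
            path = group_dir / subdir / pattern.format(tumor=tumor_id)
            if not path.is_file():
                missing.append(path)
    return missing
```

An empty list means every file in the subset was published; any returned path points at the tool section to inspect.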

Work directory

When Oncoanalyser runs, Nextflow creates a work directory (default: <current_directory>/work/) with one subdirectory per tool run (task), each containing that task's input files, output files, and run logs. Once a task finishes, its output files are 'published' (copied) to the final output directory.

The work directory has the following structure:

work/
├── 06
│   └── e6f7613f50bdca27662f3d256c09e1
├── 0a
│   └── 9acb05051afef00264593f36058180
├── 1a
│   └── 9997df2e2e9978ec24b5f8e8a7bb3c
...

The subdirectory names are hashes and correspond to those shown in the console when running Oncoanalyser. For example, 06/e6f761 in the console output below is shorthand for work/06/e6f7613f50bdca27662f3d256c09e1 shown above, and corresponds to the COBALT_PROFILING:COBALT process.

Tip

Use Tab to auto-complete directory names when navigating the work directory

...
executor >  local (28)
[-        ] process > NFCORE_ONCOANALYSER:WGTS:READ_ALIGNMENT_DNA:FASTP                            -
[-        ] process > NFCORE_ONCOANALYSER:WGTS:READ_ALIGNMENT_DNA:BWAMEM2_ALIGN                    -
[-        ] process > NFCORE_ONCOANALYSER:WGTS:READ_ALIGNMENT_RNA:STAR_ALIGN                       -
[-        ] process > NFCORE_ONCOANALYSER:WGTS:READ_ALIGNMENT_RNA:SAMTOOLS_SORT                    -
[-        ] process > NFCORE_ONCOANALYSER:WGTS:READ_ALIGNMENT_RNA:SAMBAMBA_MERGE                   -
[-        ] process > NFCORE_ONCOANALYSER:WGTS:READ_ALIGNMENT_RNA:GATK4_MARKDUPLICATES             -
[48/aa4d5c] process > NFCORE_ONCOANALYSER:WGTS:REDUX_PROCESSING:REDUX (<group_id>_<sample_id>)     [100%] 2 of 2 ✔
[2c/2acf23] process > NFCORE_ONCOANALYSER:WGTS:ISOFOX_QUANTIFICATION:ISOFOX (<group_id>)           [100%] 1 of 1 ✔
[0a/9acb05] process > NFCORE_ONCOANALYSER:WGTS:AMBER_PROFILING:AMBER (<group_id>)                  [100%] 1 of 1 ✔
[06/e6f761] process > NFCORE_ONCOANALYSER:WGTS:COBALT_PROFILING:COBALT (<group_id>)                [100%] 1 of 1 ✔
[7c/828af1] process > NFCORE_ONCOANALYSER:WGTS:ESVEE_CALLING:ESVEE_PREP (<group_id>)               [100%] 1 of 1 ✔
[e1/182433] process > NFCORE_ONCOANALYSER:WGTS:ESVEE_CALLING:ESVEE_ASSEMBLE (<group_id>)           [100%] 1 of 1 ✔
[76/0da3ee] process > NFCORE_ONCOANALYSER:WGTS:ESVEE_CALLING:ESVEE_DEPTH_ANNOTATOR (<group_id>)    [100%] 1 of 1 ✔
[41/49f1f8] process > NFCORE_ONCOANALYSER:WGTS:ESVEE_CALLING:ESVEE_CALL (<group_id>)               [100%] 1 of 1 ✔
[ce/0f6b20] process > NFCORE_ONCOANALYSER:WGTS:SAGE_CALLING:GERMLINE (<group_id>)                  [100%] 1 of 1 ✔
[5e/be6aab] process > NFCORE_ONCOANALYSER:WGTS:SAGE_CALLING:SOMATIC (<group_id>)                   [100%] 1 of 1 ✔
[45/88540d] process > NFCORE_ONCOANALYSER:WGTS:PAVE_ANNOTATION:GERMLINE (<group_id>)               [100%] 1 of 1 ✔
[e2/279465] process > NFCORE_ONCOANALYSER:WGTS:PAVE_ANNOTATION:SOMATIC (<group_id>)                [100%] 1 of 1 ✔
[ff/37883b] process > NFCORE_ONCOANALYSER:WGTS:PURPLE_CALLING:PURPLE (<group_id>)                  [100%] 1 of 1 ✔
[d0/7ebc71] process > NFCORE_ONCOANALYSER:WGTS:SAGE_APPEND:GERMLINE (<group_id>)                   [100%] 1 of 1 ✔
[1c/0b3f55] process > NFCORE_ONCOANALYSER:WGTS:SAGE_APPEND:SOMATIC (<group_id>)                    [100%] 1 of 1 ✔
[87/0118e3] process > NFCORE_ONCOANALYSER:WGTS:LINX_ANNOTATION:GERMLINE (<group_id>)               [100%] 1 of 1 ✔
[1a/9997df] process > NFCORE_ONCOANALYSER:WGTS:LINX_ANNOTATION:SOMATIC (<group_id>)                [100%] 1 of 1 ✔
[a8/22db2b] process > NFCORE_ONCOANALYSER:WGTS:LINX_PLOTTING:VISUALISER (<group_id>)               [100%] 1 of 1 ✔
[dc/da6010] process > NFCORE_ONCOANALYSER:WGTS:BAMTOOLS_METRICS:BAMTOOLS (<group_id>_<sample_id>)  [100%] 2 of 2 ✔
[b5/5c54f6] process > NFCORE_ONCOANALYSER:WGTS:SIGS_FITTING:SIGS (<group_id>)                      [100%] 1 of 1 ✔
[71/701751] process > NFCORE_ONCOANALYSER:WGTS:CHORD_PREDICTION:CHORD (<group_id>)                 [100%] 1 of 1 ✔
[bc/6191b2] process > NFCORE_ONCOANALYSER:WGTS:LILAC_CALLING:LILAC (<group_id>)                    [100%] 1 of 1 ✔
[51/153ee1] process > NFCORE_ONCOANALYSER:WGTS:VIRUSBREAKEND_CALLING:VIRUSBREAKEND (<group_id>)    [100%] 1 of 1 ✔
[88/fee470] process > NFCORE_ONCOANALYSER:WGTS:VIRUSBREAKEND_CALLING:VIRUSINTERPRETER (<group_id>) [100%] 1 of 1 ✔
[28/6e9733] process > NFCORE_ONCOANALYSER:WGTS:CUPPA_PREDICTION:CUPPA (<group_id>)                 [100%] 1 of 1 ✔
[e0/2e5797] process > NFCORE_ONCOANALYSER:WGTS:ORANGE_REPORTING:ORANGE (<group_id>)                [100%] 1 of 1 ✔
...
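Because each console hash is only a prefix of the full subdirectory name, finding a task directory means matching that prefix against the directories on disk. A minimal Python sketch of the lookup, assuming the two-level `work/<2 characters>/<rest of hash>` layout shown above; `resolve_task_dir` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_task_dir(work_dir: Path, short_hash: str) -> Path:
    """Expand a console hash such as '06/e6f761' to its full task directory.

    The part before the slash names the two-character subdirectory of the
    work directory; the part after it is a prefix of the task directory name.
    """
    prefix_dir, sep, name_prefix = short_hash.partition("/")
    if not (prefix_dir and sep and name_prefix):
        raise ValueError(f"expected a hash like '06/e6f761', got {short_hash!r}")
    base = work_dir / prefix_dir
    matches = []
    if base.is_dir():
        matches = sorted(
            p for p in base.iterdir() if p.is_dir() and p.name.startswith(name_prefix)
        )
    # Require exactly one match so an ambiguous or stale hash is reported, not guessed
    if len(matches) != 1:
        raise ValueError(f"{short_hash!r} matched {len(matches)} task directories")
    return matches[0]
```

For example, `resolve_task_dir(Path("work"), "06/e6f761")` would return the COBALT task directory `work/06/e6f7613f50bdca27662f3d256c09e1` from the listing above.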

Below is an example of the contents of the COBALT_PROFILING:COBALT process work directory.

work/06/
└── e6f7613f50bdca27662f3d256c09e1
    ├── .command.begin
    ├── .command.err
    ├── .command.log
    ├── .command.out
    ├── .command.run
    ├── .command.sh
    ├── .command.trace
    ├── .exitcode
    ├── <normal_dna_id>.redux.bam ->  /path/to/work/32/6d0191b876479d1a0c3c4a4c39733d/<normal_dna_id>.redux.bam
    ├── <normal_dna_id>.redux.bam.bai ->  /path/to/work/32/6d0191b876479d1a0c3c4a4c39733d/<normal_dna_id>.redux.bam.bai
    ├── <tumor_dna_id>.redux.bam ->  /path/to/work/48/aa4d5cecc431bfe3fef5e85d922272/<tumor_dna_id>.redux.bam
    ├── <tumor_dna_id>.redux.bam.bai ->  /path/to/work/48/aa4d5cecc431bfe3fef5e85d922272/<tumor_dna_id>.redux.bam.bai
    ├── GC_profile.1000bp.37.cnp -> /path/to/hmftools/dna/copy_number/GC_profile.1000bp.37.cnp
    ├── cobalt
    │   ├── <normal_dna_id>.cobalt.gc.median.tsv
    │   ├── <normal_dna_id>.cobalt.ratio.median.tsv
    │   ├── <normal_dna_id>.cobalt.ratio.pcf
    │   ├── <tumor_dna_id>.cobalt.gc.median.tsv
    │   ├── <tumor_dna_id>.cobalt.ratio.pcf
    │   ├── <tumor_dna_id>.cobalt.ratio.tsv.gz
    │   └── cobalt.version
    └── versions.yml

Tool work directories have a consistent structure:

  • .command.sh: Bash command used to run the tool within the Docker/Singularity container
  • .command.log, .command.err, .command.out: Run logs
  • versions.yml: Tool version
  • Tool outputs are generally written to a subdirectory named after the tool (e.g. cobalt/)
  • Input files are symlinked into the tool work directory (e.g. <tumor_dna_id>.redux.bam -> ...). This is done so that under the hood the tool work directory can simply be mounted within the container.
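When a run fails, these per-task files can be used to find the culprit. The Python sketch below walks the work directory, reports tasks whose `.exitcode` file holds a non-zero value (Nextflow writes the task's exit status there; this sketch assumes it is a plain integer), and returns the tail of that task's `.command.err`. The function names are hypothetical.

```python
from pathlib import Path


def failed_tasks(work_dir: Path) -> list[tuple[Path, int]]:
    """Return (task_dir, exit_code) pairs for tasks with a non-zero .exitcode.

    Walks the two-level work/<xx>/<hash> layout. Task directories without an
    .exitcode file (e.g. tasks that have not finished) or with a non-integer
    value are skipped.
    """
    failures = []
    for prefix_dir in sorted(p for p in work_dir.iterdir() if p.is_dir()):
        for task_dir in sorted(p for p in prefix_dir.iterdir() if p.is_dir()):
            exitcode_file = task_dir / ".exitcode"
            if not exitcode_file.is_file():
                continue
            text = exitcode_file.read_text().strip()
            if not text.isdigit():
                continue
            code = int(text)
            if code != 0:
                failures.append((task_dir, code))
    return failures


def tail_stderr(task_dir: Path, n_lines: int = 20) -> str:
    """Return the last n_lines lines of a task's .command.err log ('' if absent)."""
    err_file = task_dir / ".command.err"
    if not err_file.is_file():
        return ""
    return "\n".join(err_file.read_text().splitlines()[-n_lines:])
```

For example, `for task_dir, code in failed_tasks(Path("work")): print(task_dir, code, tail_stderr(task_dir))` lists each failed task with the end of its error log; `.command.sh` in the same directory shows the exact command that was run.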

Acknowledgements

Oncoanalyser was written by Stephen Watts at the University of Melbourne Centre for Cancer Research with the support of Oliver Hofmann and the Hartwig Medical Foundation Australia.