pipelines_immuno.cwl

APipe Tester edited this page Dec 12, 2022 · 40 revisions

Documentation for immuno.cwl

This page is auto-generated. Do not edit.

Overview

Immunotherapy Workflow

Inputs

Name Label Description Type Secondary Files
reference_annotation Annotated transcripts in GTF format File
rna_sequence Raw data from RNA sequencing; this custom type holds both the data file(s) and readgroup information. Data file(s) may be either a BAM file or paired FASTQs. Readgroup information should be given as a series of key:value pairs, each separated by a space, so a value that contains spaces must be double quoted. The first key must be ID; consult the read group description in the header section of the SAM file specification for other, optional keys. An example element of the input array, in YAML flow style: {readgroup: "ID:xxx PU:xxx SM:xxx LB:xxx PL:ILLUMINA CN:WUGSC", sequence: {fastq1: {class: File, path: /path/to/reads1.fastq}, fastq2: {class: File, path: /path/to/reads2.fastq}}} OR, with a BAM in place of the FASTQ pair, sequence: {bam: {class: File, path: /path/to/reads.bam}} ../types/sequence_data.yml#sequence_data[]
rna_sample_name string
trimming_adapters File
trimming_adapter_trim_end string
trimming_adapter_min_overlap int
trimming_max_uncalled int
trimming_min_readlength int
kallisto_index File
gene_transcript_lookup_table File
strand ['null', {'type': 'enum', 'symbols': ['first', 'second', 'unstranded']}]
refFlat File
ribosomal_intervals File
star_aligner_genome_dir Path to the directory where STAR aligner genome indices are stored. Consult the STAR manual to generate new indices. ['string', 'Directory']
star_fusion_genome_dir Path to the directory where STAR fusion resources are stored. This includes a reference genome and corresponding protein coding gene annotation set. These may be downloaded or generated by a helper script; see the star-fusion manual for more info. ['string', 'Directory']
examine_coding_effect Describe the effect of predicted fusions on coding regions boolean?
fusioninspector_mode If/how to validate fusion transcripts; see STAR-Fusion manual for more info ['null', {'type': 'enum', 'symbols': ['inspect', 'validate']}]
cdna_fasta Fasta with transcripts, used to verify strand File
agfusion_database agfusion reference database File
agfusion_annotate_noncanonical Annotate all gene transcripts, not just canonical isoforms boolean?
reference reference: Reference fasta file for a desired assembly reference contains the nucleotide sequence for a given assembly (hg37, hg38, etc.) in fasta format for the entire genome. This is what reads will be aligned to. Appropriate files can be found on Ensembl at https://ensembl.org/info/data/ftp/index.html. When providing the reference, secondary files corresponding to reference indices must be located in the same directory as the reference itself. These files can be created with samtools faidx, bwa index, and picard CreateSequenceDictionary. ['string', 'File'] ['.fai', '^.dict', '.amb', '.ann', '.bwt', '.pac', '.sa']
tumor_sequence tumor_sequence: MT sequencing data and readgroup information tumor_sequence represents the sequencing data for the MT sample as either FASTQs or BAMs with accompanying readgroup information. The readgroup field should contain an entire read group header line, as described in the SAM file specification. The line begins with @RG, followed by key:value pairs, and its elements are separated by tabs (\t). Keys ID and SM are required. A formatting example, in YAML flow style: {readgroup: "@RG\tID:xxx\tSM:xx", sequence: {fastq1: {class: File, path: /path/to/reads1.fastq}, fastq2: {class: File, path: /path/to/reads2.fastq}}} OR, with a BAM in place of the FASTQ pair, sequence: {bam: {class: File, path: /path/to/reads.bam}} ../types/sequence_data.yml#sequence_data[]
tumor_filename tumor/MT aligned bam filename the filename to be used for bam files produced by the pipeline containing aligned tumor/mutant reads string?
normal_sequence normal_sequence: WT sequencing data and readgroup information normal_sequence represents the sequencing data for the WT sample as either FASTQs or BAMs with accompanying readgroup information. The readgroup field should contain an entire read group header line, as described in the SAM file specification. The line begins with @RG, followed by key:value pairs, and its elements are separated by tabs (\t). Keys ID and SM are required. A formatting example, in YAML flow style: {readgroup: "@RG\tID:xxx\tSM:xx", sequence: {fastq1: {class: File, path: /path/to/reads1.fastq}, fastq2: {class: File, path: /path/to/reads2.fastq}}} OR, with a BAM in place of the FASTQ pair, sequence: {bam: {class: File, path: /path/to/reads.bam}} ../types/sequence_data.yml#sequence_data[]
normal_filename normal/WT aligned bam filename the filename to be used for bam files produced by the pipeline containing aligned normal/wild-type reads string?
bqsr_known_sites bqsr_known_sites: One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis. These are the known polymorphic indels recommended by GATK for a variety of tools, including the BaseRecalibrator. They are part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213. Files should be in VCF format and tabix indexed. File[] ['.tbi']
bqsr_intervals bqsr_intervals: Array of strings specifying regions for base quality score recalibration bqsr_intervals provides an array of genomic intervals for which to apply GATK base quality score recalibrations. Typically intervals are given for entire chromosomes (chr1, chr2, etc.); these names should match the format in the reference file. string[]
bait_intervals bait_intervals: interval_list file of baits used in the sequencing experiment bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially coordinates for regions you were able to design probes for in the reagent. Typically the reagent provider has this information available in BED format, and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents, available at https://github.com/AstraZeneca-NGS/reference_data File
target_intervals target_intervals: interval_list file of targets used in the sequencing experiment target_intervals is an interval_list corresponding to the targets for the capture reagent. BED files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for a WES reagent, bait_intervals and target_intervals are the same. File
target_interval_padding target_interval_padding: number of bp flanking each target region in which to allow variant calls The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions. int
per_base_intervals ../types/labelled_file.yml#labelled_file[]
per_target_intervals ../types/labelled_file.yml#labelled_file[]
summary_intervals ../types/labelled_file.yml#labelled_file[]
omni_vcf File ['.tbi']
picard_metric_accumulation_level string
qc_minimum_mapping_quality int?
qc_minimum_base_quality int?
strelka_cpu_reserved int?
scatter_count scatters each supported variant detector (varscan, pindel, mutect) into this many parallel jobs int
mutect_artifact_detection_mode boolean
mutect_max_alt_allele_in_normal_fraction float?
mutect_max_alt_alleles_in_normal_count int?
varscan_strand_filter int?
varscan_min_coverage int?
varscan_min_var_freq float?
varscan_p_value float?
varscan_max_normal_freq float?
pindel_insert_size int
docm_vcf Common mutations in cancer that will be genotyped and passed through into the merged VCF if they have even low-level evidence of a mutation (by default, marked with filter DOCM_ONLY) File ['.tbi']
filter_docm_variants Determines whether variants found only via genotyping of DOCM sites will be filtered (as DOCM_ONLY) or passed through as variant calls boolean?
vep_cache_dir ['string', 'Directory']
vep_ensembl_assembly genome assembly to use in vep. Examples: GRCh38 or GRCm38 string
vep_ensembl_version ensembl version - Must be present in the cache directory. Example: 95 string
vep_ensembl_species ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus string
synonyms_file File?
annotate_coding_only boolean?
vep_pick ['null', {'type': 'enum', 'symbols': ['pick', 'flag_pick', 'pick_allele', 'per_gene', 'pick_allele_gene', 'flag_pick_allele', 'flag_pick_allele_gene']}]
cle_vcf_filter boolean
variants_to_table_fields string[]
variants_to_table_genotype_fields string[]
vep_to_table_fields string[]
vep_custom_annotations custom type, check types directory for input format ../types/vep_custom_annotation.yml#vep_custom_annotation[]
manta_call_regions File? ['.tbi']
manta_non_wgs boolean?
manta_output_contigs boolean?
somalier_vcf File
validated_variants An optional VCF with variants that will be flagged as 'VALIDATED' if found in this pipeline's main output VCF File? ['.tbi']
gvcf_gq_bands string[]
gatk_haplotypecaller_intervals {'type': 'array', 'items': {'type': 'array', 'items': 'string'}}
ploidy int?
optitype_name string?
reference_dict File
clinical_mhc_classI_alleles Clinical HLA typing results, limited to MHC Class I alleles; element format: HLA-X*01:02[/HLA-X...] used to provide clinical HLA typing results in the format HLA-X*01:02[/HLA-X...] when available. string[]?
clinical_mhc_classII_alleles Clinical HLA typing results, limited to MHC Class II alleles used to provide clinical HLA typing results; separated from class I due to nomenclature inconsistencies string[]?
hla_source_mode Source for HLA types used for epitope prediction: in silico and clinical, or just clinical Control whether HLA types passed to pvacseq should be a consensus of optitype predictions and clinical calls, if provided, or if only clinical calls should be used. In this case, optitype predictions and mismatches between optitype and clinical calls will still be reported. Selecting clinical_only without providing clinical calls will result in an error. {'type': 'enum', 'symbols': ['consensus', 'clinical_only']}
readcount_minimum_base_quality int?
readcount_minimum_mapping_quality int?
prediction_algorithms string[]
epitope_lengths_class_i int[]?
epitope_lengths_class_ii int[]?
binding_threshold int?
percentile_threshold int?
allele_specific_binding_thresholds boolean?
minimum_fold_change float?
top_score_metric ['null', {'type': 'enum', 'symbols': ['lowest', 'median']}]
additional_report_columns ['null', {'type': 'enum', 'symbols': ['sample_name']}]
expression_tool string?
fasta_size int?
downstream_sequence_length string?
exclude_nas boolean?
phased_proximal_variants_vcf File? ['.tbi']
maximum_transcript_support_level ['null', {'type': 'enum', 'symbols': ['1', '2', '3', '4', '5']}]
normal_cov int?
tdna_cov int?
trna_cov int?
normal_vaf float?
tdna_vaf float?
trna_vaf float?
expn_val float?
net_chop_method net_chop_method: NetChop prediction method to use ('cterm' for C term 3.0, '20s' for 20S 3.0) net_chop_method is used to specify which NetChop prediction method to use ("cterm" for C term 3.0, "20s" for 20S 3.0). C-term 3.0 is trained with publicly available MHC class I ligands and the authors believe that it performs best in predicting the boundaries of CTL epitopes. 20S is trained with in vitro degradation data. ['null', {'type': 'enum', 'symbols': ['cterm', '20s']}]
net_chop_threshold net_chop_threshold: NetChop prediction threshold net_chop_threshold specifies the threshold to use for NetChop prediction; increasing the threshold results in better specificity, but worse sensitivity. float?
netmhc_stab netmhc_stab: sets an option whether to run NetMHCStabPan or not netmhc_stab controls whether NetMHCStabPan is run after all filtering to add stability predictions to the predicted epitopes. boolean?
run_reference_proteome_similarity run_reference_proteome_similarity: sets an option whether to run reference proteome similarity or not run_reference_proteome_similarity controls whether, after all filtering, peptide sequences are searched with BLAST against the reference proteome to see if they appear elsewhere in the proteome. boolean?
blastp_db blastp_db: sets the reference proteome database to use with BLASTp blastp_db sets the reference proteome database to use with BLASTp when enabling run_reference_proteome_similarity ['null', {'type': 'enum', 'symbols': ['refseq_select_prot', 'refseq_protein']}]
pvacseq_threads pvacseq_threads: Number of threads to use for parallelizing pvacseq prediction pvacseq_threads specifies the number of threads to use for parallelizing peptide-MHC binding prediction calls. int?
tumor_sample_name tumor_sample_name: Name of the tumor sample tumor_sample_name is the name of the tumor sample being processed. When processing a multi-sample VCF the sample name must be a sample ID in the input VCF #CHROM header line. string
normal_sample_name normal_sample_name: Name of the normal sample normal_sample_name is the name of the normal sample to use for phasing of germline variants. string
tumor_purity float?
iedb_retries int?
pvacfuse_keep_tmp_files boolean?
reference_genome_name string?
dna_sequencing_platform string?
dna_sequencing_instrument string?
dna_sequencing_kit string?
dna_sequencing_type string?
dna_single_or_paired_end string?
normal_dna_spike_in_error_rate string?
tumor_dna_spike_in_error_rate string?
normal_dna_total_DNA string?
tumor_dna_total_DNA string?
rna_sequencing_platform string?
rna_sequencing_instrument string?
rna_sequencing_kit string?
rna_sequencing_type string?
rna_single_or_paired_end string?
rna_spike_in_error_rate string?
rna_total_RNA string?
rna_RIN_score string?
rna_freq_normalization_method string?
rna_annotation_file string?
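
The two readgroup conventions in the table above are easy to mix up: rna_sequence takes space-separated key:value pairs with ID first, while tumor_sequence and normal_sequence take a full @RG header line with tab-separated elements and both ID and SM present. The following is a minimal sketch of building such entries and writing them as a JSON job file (CWL runners accept JSON as well as YAML input objects). The helper names rna_readgroup, dna_readgroup and fastq_entry are hypothetical and not part of the pipeline, and quoting the whole value when it contains a space is one reading of the rule in the table.

```python
import json


def rna_readgroup(fields: dict) -> str:
    """Space-separated key:value pairs with ID first (rna_sequence convention)."""
    if "ID" not in fields:
        raise ValueError("ID is required and must be the first key")
    ordered = [("ID", fields["ID"])] + [(k, v) for k, v in fields.items() if k != "ID"]
    # Assumption: a value containing spaces is wrapped in double quotes as a whole.
    return " ".join(f'{k}:"{v}"' if " " in v else f"{k}:{v}" for k, v in ordered)


def dna_readgroup(fields: dict) -> str:
    """Full @RG header line, tab-separated (tumor_sequence / normal_sequence convention)."""
    missing = [k for k in ("ID", "SM") if k not in fields]
    if missing:
        raise ValueError(f"missing required read group keys: {missing}")
    return "\t".join(["@RG"] + [f"{k}:{v}" for k, v in fields.items()])


def fastq_entry(readgroup: str, fastq1: str, fastq2: str) -> dict:
    """One element of a sequence_data[] array backed by a FASTQ pair."""
    return {
        "readgroup": readgroup,
        "sequence": {
            "fastq1": {"class": "File", "path": fastq1},
            "fastq2": {"class": "File", "path": fastq2},
        },
    }


rna_rg = rna_readgroup(
    {"ID": "xxx", "PU": "xxx", "SM": "xxx", "LB": "xxx", "PL": "ILLUMINA", "CN": "WUGSC"}
)
dna_rg = dna_readgroup({"ID": "xxx", "SM": "xx"})

job = {
    "rna_sequence": [fastq_entry(rna_rg, "/path/to/reads1.fastq", "/path/to/reads2.fastq")],
    "tumor_sequence": [fastq_entry(dna_rg, "/path/to/reads1.fastq", "/path/to/reads2.fastq")],
}

# json.dumps writes the tab characters as the \t escape, matching the table's example.
print(json.dumps(job, indent=2))
```

Only two of the many required inputs are shown; a real job file needs the remaining entries from the table as well.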

Outputs

Name Label Description Type Secondary Files
final_bigwig File
final_bam Sorted BAM from tumor RNA Sorted BAM file of sequencing read alignments by STAR with duplicate reads tagged File ['.bai']
stringtie_transcript_gtf Transcript GTF assembled from tumor RNA by StringTie GTF file containing the transcripts assembled from the tumor RNA sample, created by StringTie File
stringtie_gene_expression_tsv Gene abundance table from tumor RNA by StringTie Tab-delimited file containing gene abundances in FPKM and TPM, created by StringTie File
transcript_abundance_tsv Transcript-level abundance table by kallisto Tab-delimited file containing transcript-level abundance estimates in TPM, created by kallisto File
transcript_abundance_h5 Transcript-level abundance table in HDF5 format by kallisto HDF5 binary file containing transcript-level abundance estimates, bootstrap estimates, and so on, created by kallisto File
gene_abundance Gene-level abundance output by tximport with kallisto output Tab-delimited file containing the abundance estimates summarized in the gene level with kallisto output by Bioconductor tximport tool File
metrics RNA-seq Diagnosis/quality metrics from tumor RNA RNA-seq Diagnosis/quality metrics showing the distribution of the bases within the transcripts, created by picard CollectRnaSeqMetrics tool File
chart Plot for RNA-seq diagnosis/quality metrics PDF file for the plot of RNA sequencing coverage at the normalized position across transcript as RNA-seq diagnosis/quality metrics, created by picard CollectRnaSeqMetrics tool File?
rnaseq_cram File ['.crai', '^.crai']
star_fusion_out File
star_junction_out File
star_fusion_log File
star_fusion_predict File
star_fusion_abridge File
strand_info File[]
annotated_fusion_predictions Directory
coding_region_effects File?
fusioninspector_evidence File[]?
tumor_cram Sorted CRAM from tumor DNA Sorted CRAM file of sequencing read alignments by bwa-mem from a tumor DNA sample with duplicate reads tagged File
tumor_mark_duplicates_metrics Sequencing duplicate metrics from tumor DNA Duplication metrics on duplicate sequencing reads from a tumor DNA sample, identified by picard MarkDuplicates tool File
tumor_insert_size_metrics Paired-end sequencing diagnosis/quality metrics from tumor DNA Diagnosis/quality metrics including the insert size distribution and read orientation of the paired-end libraries from a tumor DNA sample File
tumor_alignment_summary_metrics Sequencing alignment summary from tumor DNA Diagnosis/quality metrics summarizing the quality of sequencing read alignments from a tumor DNA sample, reported by the picard CollectAlignmentSummaryMetrics tool File
tumor_hs_metrics Sequencing coverage summary of target intervals from tumor DNA Diagnosis/quality metrics specific for sequencing data generated through hybrid-selection (e.g. whole exome) from a tumor DNA sample, for example to assess target coverage of WES File
tumor_per_target_coverage_metrics Sequencing per-target coverage summary of target intervals from tumor DNA Diagnosis/quality metrics showing detailed sequencing coverage per target interval (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a tumor DNA sample File[]
tumor_per_target_hs_metrics Sequencing coverage summary of target intervals from tumor DNA Diagnosis/quality metrics for sequencing coverage for target intervals (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a tumor DNA sample File[]
tumor_per_base_coverage_metrics Sequencing per-base coverage summary at target sites from tumor DNA Diagnosis/quality metrics showing detailed sequencing coverage per target site (optional, known variant sites of clinical significance from ClinVar for example) from a tumor DNA sample File[]
tumor_per_base_hs_metrics Sequencing coverage summary at target sites from tumor DNA Diagnosis/quality metrics for sequencing coverage at target sites (optional, known variant sites of clinical significance from ClinVar for example) from a tumor DNA sample File[]
tumor_summary_hs_metrics File[]
tumor_flagstats Sequencing count metrics based on SAM FLAG field from tumor sample Summary with the count numbers of alignments for each FLAG type from a tumor DNA sample, including 13 categories based on the bit flags in the FLAG field File
tumor_verify_bam_id_metrics Sequencing quality assessment metric for tumor sample contamination verifyBamID output files containing the contamination estimate in a tumor DNA sample, across all readGroups and per readGroup separately File
tumor_verify_bam_id_depth Sequencing quality assessment metric for tumor sample genotyping verifyBamID output files showing the sequencing depth distribution at the marker positions from Omni genotype data with a tumor DNA sample, across all readGroups and per readGroup separately File
normal_cram Sorted CRAM from normal DNA Sorted CRAM file of sequencing read alignments by bwa-mem from a normal DNA sample with duplicate reads tagged File
normal_mark_duplicates_metrics Sequencing duplicate metrics from normal DNA Duplication metrics on duplicate sequencing reads from a normal DNA sample, identified by picard MarkDuplicates tool File
normal_insert_size_metrics Paired-end sequencing diagnosis/quality metrics from normal DNA Diagnosis/quality metrics including the insert size distribution and read orientation of the paired-end libraries from a normal DNA sample File
normal_alignment_summary_metrics Sequencing alignment summary from normal DNA Diagnosis/quality metrics summarizing the quality of sequencing read alignments from a normal DNA sample, reported by the picard CollectAlignmentSummaryMetrics tool File
normal_hs_metrics Sequencing coverage summary of target intervals from normal DNA Diagnosis/quality metrics specific for sequencing data generated through hybrid-selection (e.g. whole exome) from a normal DNA sample, for example to assess target coverage File
normal_per_target_coverage_metrics Sequencing per-target coverage summary of target intervals from normal DNA Diagnosis/quality metrics showing detailed sequencing coverage per target interval (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a normal DNA sample File[]
normal_per_target_hs_metrics Sequencing coverage summary of target intervals from normal DNA Diagnosis/quality metrics for sequencing coverage for target intervals (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a normal DNA sample File[]
normal_per_base_coverage_metrics Sequencing per-base coverage summary at target sites from normal DNA Diagnosis/quality metrics showing detailed sequencing coverage per target site (optional, known variant sites of clinical significance from ClinVar for example) from a normal DNA sample File[]
normal_per_base_hs_metrics Sequencing coverage summary at target sites from normal DNA Diagnosis/quality metrics for sequencing coverage at target sites (optional, known variant sites of clinical significance from ClinVar for example) from a normal DNA sample File[]
normal_summary_hs_metrics File[]
normal_flagstats Sequencing count metrics based on SAM FLAG field from normal sample Summary with the count numbers of alignments for each FLAG type from a normal DNA sample, including 13 categories based on the bit flags in the FLAG field File
normal_verify_bam_id_metrics Sequencing quality assessment metric for normal sample contamination verifyBamID output files containing the contamination estimate in a normal DNA sample, across all readGroups and per readGroup separately File
normal_verify_bam_id_depth Sequencing quality assessment metric for normal sample genotyping verifyBamID output files showing the sequencing depth distribution at the marker positions from Omni genotype data with a normal DNA sample, across all readGroups and per readGroup separately File
mutect_unfiltered_vcf File ['.tbi']
mutect_filtered_vcf File ['.tbi']
strelka_unfiltered_vcf File ['.tbi']
strelka_filtered_vcf File ['.tbi']
varscan_unfiltered_vcf File ['.tbi']
varscan_filtered_vcf File ['.tbi']
pindel_unfiltered_vcf File ['.tbi']
pindel_filtered_vcf File ['.tbi']
docm_filtered_vcf File ['.tbi']
somatic_final_vcf File ['.tbi']
final_filtered_vcf File ['.tbi']
final_tsv File
somatic_vep_summary File
tumor_snv_bam_readcount_tsv File
tumor_indel_bam_readcount_tsv File
normal_snv_bam_readcount_tsv File
normal_indel_bam_readcount_tsv File
intervals_antitarget File?
intervals_target File?
normal_antitarget_coverage File
normal_target_coverage File
reference_coverage File?
cn_diagram File?
cn_scatter_plot File?
tumor_antitarget_coverage File
tumor_target_coverage File
tumor_bin_level_ratios File
tumor_segmented_ratios File
diploid_variants File? ['.tbi']
somatic_variants File? ['.tbi']
all_candidates File ['.tbi']
small_candidates File ['.tbi']
tumor_only_variants File? ['.tbi']
somalier_concordance_metrics File
somalier_concordance_statistics File
cram File
mark_duplicates_metrics File
insert_size_metrics File
insert_size_histogram File
alignment_summary_metrics File
hs_metrics File
per_target_coverage_metrics File[]
per_target_hs_metrics File[]
per_base_coverage_metrics File[]
per_base_hs_metrics File[]
summary_hs_metrics File[]
flagstats File
verify_bam_id_metrics File
verify_bam_id_depth File
germline_raw_vcf File ['.tbi']
germline_final_vcf File ['.tbi']
germline_filtered_vcf File ['.tbi']
germline_vep_summary File
optitype_tsv File
optitype_plot File
phased_vcf File ['.tbi']
allele_string string[]
consensus_alleles string[]
hla_call_files Directory
annotated_vcf File
annotated_tsv File
pvacseq_predictions Directory
pvacfuse_predictions Directory
unaligned_normal_dna_fastqc_data File[]
unaligned_normal_dna_table_metrics File[]
unaligned_normal_dna_md5sums File
unaligned_normal_dna_table1 File
unaligned_tumor_dna_fastqc_data File[]
unaligned_tumor_dna_table_metrics File[]
unaligned_tumor_dna_md5sums File
unaligned_tumor_dna_table1 File
unaligned_tumor_rna_fastqc_data File[]
unaligned_tumor_rna_table_metrics File[]
unaligned_tumor_rna_md5sums File
unaligned_tumor_rna_table1 File
aligned_normal_dna_fastqc_data File[]
aligned_normal_dna_table_metrics File
aligned_normal_dna_md5sums File
aligned_normal_dna_table2 File
aligned_tumor_dna_fastqc_data File[]
aligned_tumor_dna_table_metrics File
aligned_tumor_dna_md5sums File
aligned_tumor_dna_table2 File
aligned_tumor_rna_fastqc_data File[]
aligned_tumor_rna_table_metrics File
aligned_tumor_rna_md5sums File
aligned_tumor_rna_table3 File
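
The Secondary Files columns in the tables above hold CWL secondaryFiles patterns such as '.tbi', '^.dict' or '^.crai'. Under the CWL specification, a plain pattern is appended to the primary file's path, and each leading caret first removes the last extension from that path. The sketch below shows that rule so you can work out which companion files need to sit next to a given input or will appear next to an output. The function name resolve_secondary and the example paths are illustrative only.

```python
def resolve_secondary(primary: str, pattern: str) -> str:
    """Apply one CWL secondaryFiles pattern to a primary file path."""
    path = primary
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = path.rfind(".")
        # Strip the last extension only if the dot belongs to the file name,
        # not to a directory earlier in the path.
        if dot > path.rfind("/"):
            path = path[:dot]
    return path + pattern


# Patterns listed for the reference input in the Inputs table.
reference_patterns = [".fai", "^.dict", ".amb", ".ann", ".bwt", ".pac", ".sa"]
for pattern in reference_patterns:
    print(resolve_secondary("/path/to/reference.fa", pattern))

# Patterns listed for the rnaseq_cram output.
for pattern in [".crai", "^.crai"]:
    print(resolve_secondary("/path/to/rnaseq.cram", pattern))
```

Note the difference between '.fai' (kept alongside the full file name, reference.fa.fai) and '^.dict' (the extension is replaced, reference.dict).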

Steps

Name CWL Run
rnaseq pipelines/rnaseq_star_fusion.cwl
somatic pipelines/somatic_exome.cwl
germline pipelines/germline_exome_hla_typing.cwl
fda_metrics subworkflows/generate_fda_metrics.cwl
phase_vcf subworkflows/phase_vcf.cwl
extract_alleles tools/extract_hla_alleles.cwl
hla_consensus tools/hla_consensus.cwl
intersect_passing_variants tools/intersect_known_variants.cwl
pvacseq subworkflows/pvacseq.cwl
pvacfuse tools/pvacfuse.cwl
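
The HLA types passed to pvacseq depend on the hla_source_mode, clinical_mhc_classI_alleles and clinical_mhc_classII_alleles inputs, and the Inputs table notes that selecting clinical_only without clinical calls results in an error. A pre-flight check along the following lines can catch that before a long run starts. This is a sketch only, not the pipeline's own validation: check_hla_inputs is a hypothetical helper, the regular expression is an assumption based on the documented element format HLA-X*01:02[/HLA-X...], and treating "no clinical calls" as "neither class I nor class II given" is a guess.

```python
import re

# Assumption: one allele looks like HLA-A*02:01, i.e. a gene name followed by
# two colon-separated numeric fields. Class II names are not checked here,
# since the table notes their nomenclature is inconsistent.
_ALLELE = r"HLA-[A-Z]+\*\d{2,3}:\d{2,3}"
_ELEMENT = re.compile(rf"{_ALLELE}(?:/{_ALLELE})*")


def check_hla_inputs(hla_source_mode, class_i=None, class_ii=None):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if hla_source_mode not in ("consensus", "clinical_only"):
        problems.append(f"unknown hla_source_mode: {hla_source_mode!r}")
    if hla_source_mode == "clinical_only" and not (class_i or class_ii):
        problems.append("clinical_only selected but no clinical alleles given")
    for element in class_i or []:
        if not _ELEMENT.fullmatch(element):
            problems.append(f"unexpected class I element: {element!r}")
    return problems


for problem in check_hla_inputs("clinical_only"):
    print(problem)
```

Returning a list rather than raising on the first issue lets every problem be reported in one pass.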