germline joint detect variants workflow #1043

apaul7 · 2021-07-15T15:49:14Z

This PR adds a joint detect variants subworkflow, which is split into 2 separate subworkflows: joint_detect_snps.cwl and joint_detect_svs.cwl. I tried my best to add informative commit messages. I've itemized most of the overall changes below:

- tools/replace_vcf_sample_name.cwl
  - updated naming of the output file to use the full name instead of just the base name with `renamed` in front
- tools/gather_to_subdirectory.cwl
  - added --recursive, --preserve, and --no-clobber to the cp command. This allows timestamps to be preserved and errors to be thrown if files are overwritten
  - also added the valueFrom field, which uses javascript to iterate over the input array and add any secondaryFiles to the destination directory
- tools/gather_to_subdirectory_dirs.cwl (new tool)
  - this follows the same format as tools/gather_to_subdirectory.cwl but is used for directories. I tried using the same tool for both use cases but was unable to make Cromwell happy
- tools/bcftools_view.cwl (new tool)
  - this tool is used to split multi sample vcfs into single sample vcfs
- tools/vt_normalize.cwl (new tool)
  - uses VT to normalize a vcf, as an alternative to the gatk LeftAlignAndTrimVariants step found in tools/normalize_variants.cwl
- subworkflows/joint_genotype.cwl
  - added decompose and normalize steps
- tools/manta_germline.cwl (new tool)
  - follows the tools/manta_somatic.cwl format. Adds the stats directory to outputs and removes somatic and tumor only outputs
- tools/genotype_gvcfs.cwl
  - added input for the minimum confidence threshold for called variants
- tools/custom_merge_sv_records.cwl (new tool)
  - merges copy number called variants that have the same type and are within x bases
- tools/cnvnator.cwl
  - updated output file names, s/CNV/cnv/
- tools/annotsv_filter.cwl
  - adds ability to merge survivor merged vcf, skips last filtering requirement. survivor does not pass INFO fields to the merged vcf
  - renamed all_cds to no_cds. This makes it easier to understand that the input removes the coding sequence filter requirement
- tools/annotsv.cwl
  - updated to version 2.3. This is not the latest version. The latest version no longer retains information for individual sv population databases in output files
  - renamed inputs for the new version
  - added input for the annotation directory instead of having the annotations in the docker image
  - added unannotated tsv output
- subworkflows/merge_svs.cwl
  - added inputs for population allele frequency, no_cds annotsv filtering, and the annotsv annotation directory
  - output file name replacement, s/SURVIVOR/survivor/
  - added step for survivor merged annotsv filtering
- subworkflows/gatk_soft_filter.cwl
  - added subworkflow for gatk soft filter based on hard parameters to add PASS/FAIL
  - https://gatk.broadinstitute.org/hc/en-us/articles/360035531112--How-to-Filter-variants-either-with-VQSR-or-by-hard-filtering#2
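To illustrate what "soft filter based on hard parameters" means, here is a minimal Python sketch. It is not the contents of gatk_soft_filter.cwl: the thresholds are the generic SNP hard-filter recommendations from the GATK article linked above, and the function name, the dictionary layout, and the decision to skip missing annotations are assumptions made for this example. The point is that records are labeled PASS or FAIL rather than dropped.

```python
# Sketch only: thresholds follow GATK's generic SNP hard-filter
# recommendations; the values actually used by the subworkflow may differ.
# Each entry maps an INFO annotation to a test that is True when it fails.
SNP_HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def soft_filter(info):
    """Label one SNP record instead of removing it.

    `info` is a hypothetical dict of INFO annotations for a single record.
    Annotations that are absent are skipped (an assumption of this sketch).
    Returns ("PASS" or "FAIL", list of annotations that failed).
    """
    failed = [
        name
        for name, fails in SNP_HARD_FILTERS.items()
        if name in info and fails(info[name])
    ]
    return ("PASS" if not failed else "FAIL", failed)
```

A record with `QD=1.5` would be kept in the output and labeled FAIL, so downstream steps can still see it.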


Added javascript to pass in any secondary files when staging output files. Added --recursive to copy everything, --preserve to keep timestamps (cromwell does not stage files for this to matter...), and --no-clobber to error out if files are overwritten. Added optional directory input for staging files and a single directory.
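The staging behaviour described here can be sketched in Python. This is only an illustration of the intent, not the tool itself, which uses cp plus a javascript expression. The dict layout borrows the `path` and `secondaryFiles` fields of CWL File objects; the function name and the explicit `FileExistsError` are assumptions of this sketch.

```python
import shutil
from pathlib import Path


def gather_to_subdirectory(files, outdir):
    """Copy each file and its secondary files into outdir.

    `files` is a list of dicts shaped like CWL File objects, e.g.
    {"path": "a.vcf.gz", "secondaryFiles": [{"path": "a.vcf.gz.tbi"}]}.
    Raises FileExistsError instead of overwriting an existing target.
    """
    dest = Path(outdir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in files:
        # Primary file first, then any secondary files attached to it.
        sources = [f["path"]] + [s["path"] for s in f.get("secondaryFiles", [])]
        for src in sources:
            target = dest / Path(src).name
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.copy2(src, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Staging two inputs that share a file name would raise on the second copy, which matches the stated goal of erroring out rather than silently overwriting.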


This runs cnvnator in single sample mode over multiple samples. The sample rename step is required as the sample name in the output vcf can change from the input. Examples:

input name -> output name
sample.1 -> sample
sample.1.2 -> sample
sample_1 -> sample_1

Also stages the output vcf name to follow the $SAMPLE.cnvnator.vcf.gz format.
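A small Python sketch of why the rename step is needed. The truncation rule (everything from the first `.` onward is dropped) is only inferred from the three input/output examples in this commit message, and the function names are hypothetical; cnvnator's real behaviour may be more involved.

```python
def cnvnator_reported_name(sample_name):
    """Guess the sample name that ends up in the cnvnator output vcf.

    Assumption: the name is cut at the first ".", which is consistent
    with the three examples given but not confirmed beyond them.
    """
    return sample_name.split(".", 1)[0]


def needs_rename(sample_name):
    """True when the vcf sample name would no longer match the input name."""
    return cnvnator_reported_name(sample_name) != sample_name


def staged_vcf_name(sample_name):
    """Output file name following the $SAMPLE.cnvnator.vcf.gz format."""
    return f"{sample_name}.cnvnator.vcf.gz"
```

Under this assumption, `sample.1` and `sample.1.2` would both come out as `sample`, so running several samples without the rename step could produce colliding names.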

This runs cnvkit in single sample mode for multiple samples. The sample rename step is required as the output sample name in the vcf is based on the input filename. Currently that is hardcoded to be `adjusted.tumor`. Also stages the output file name to follow the $SAMPLE.cnvkit.vcf.gz format.


This runs the depth filters for events called by cnvkit/cnvnator. Final sample names follow the $SAMPLE-$CALLER format. This allows easy tracking for the source of calls in final merged vcfs. Added custom merge sv records. This allows calls to be merged together if they are of the same type and within a bp window. This does not remove calls; it just adds a new record in the output vcf.
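The merge behaviour can be sketched as follows. This is an illustrative Python version, not the script in custom_merge_sv_records.cwl: the `Call` record, the function name, and the choice to measure "within a bp window" as the gap between the end of one call and the start of the next are all assumptions. What it does take from the description is that merging needs the same type and that original calls are kept, with merged records added alongside them.

```python
from collections import namedtuple

# Hypothetical minimal record; real vcf records carry many more fields.
Call = namedtuple("Call", "chrom start end svtype")


def add_merged_records(calls, window):
    """Return the original calls plus one new record per mergeable cluster.

    Calls on the same chromosome with the same svtype are chained together
    when the next call starts within `window` bases of the cluster's end.
    """
    groups = {}
    for c in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        groups.setdefault((c.chrom, c.svtype), []).append(c)

    merged = []
    for (chrom, svtype), group in groups.items():
        start, end, size = group[0].start, group[0].end, 1
        for c in group[1:]:
            if c.start - end <= window:
                # Close enough: extend the current cluster.
                end = max(end, c.end)
                size += 1
            else:
                # Gap too large: emit the cluster if it merged anything.
                if size > 1:
                    merged.append(Call(chrom, start, end, svtype))
                start, end, size = c.start, c.end, 1
        if size > 1:
            merged.append(Call(chrom, start, end, svtype))

    # Nothing is removed; merged records are appended after the originals.
    return list(calls) + merged
```

With a 100 bp window, two deletions at 100-200 and 250-400 on the same chromosome would yield an extra 100-400 deletion record, while a duplication in between would be left alone because its type differs.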


jasonwalker80

+1, @apaul7 I have not reviewed this in its entirety. I've asked @tmooney to take a look as well. Both of us have reviewed the PR but not necessarily commit-by-commit or line-by-line. I'm going to give the "looks good to me", but I'd like Tom to look, mostly for places where these commits/changes may or may not impact other workflows/pipelines. Then let's merge.

tmooney

I generally think this looks okay. I'll trust that what it does is what you want to have happen.

Part of me sees all the references to SNPs and has flashbacks to past instances where we've been told to go back and change it to SNVs instead, but I'll also trust that this name is the one we want to use 😄


apaul7 added 25 commits July 15, 2021 08:55

add input to control full output filename

5ef76b2

add minimum confidence input for gatk calls

15db8c7

s/all_cds/no_cds/

cc85818

update input name to be clear that the no_cds filter does not run the coding sequences filter

add survivor merged annotsv tsv filtering

9e8876d

allows filtering of survivor merged annotsv tsv. Also allow control over the population allele frequency value, still defaults to 0.05.

update annotsv to version 2.3

47c368a

version 2.3 requires the annotation directory to be passed as an input. Also capture the unannotated event tsv as an output.

s/SURVIVOR/survivor/ and s/CNVnator/cnvnator/

70ab64a

changing output names for consistency

outputbinding change s/merged_sv_vcf/merged_vcf/

43b7c2c

added min confidence input to genotype_gvcf step

68f43a3

add annotated vcf as output

498236b

add decompose and normalize step to joint genotype

3b4329e

add gatk soft filtering

da8d329

gatk soft filtering using hard filtering parameters. Based on https://gatk.broadinstitute.org/hc/en-us/articles/360035531112--How-to-Filter-variants-either-with-VQSR-or-by-hard-filtering#2

add new normalize tool

4ccb8d3

This uses VT to normalize a VCF. This is an alternative to GATK4 LeftAlignTrimVariants

add gather to subdirectory tool for directories

8a98373

add bcftools view tool

fb99760

This allows vcfs to be split by samples

add manta_germline tool

f894746

This allows manta to be run with multiple samples in joint calling fashion

add joint sv read caller filtering

66ac589

This subworkflow runs sv filtering for manta/smoove calls. Final sample names follow the $SAMPLE-$CALLER format. This allows easy tracking for the source of calls in output merged vcfs.

add joint detect svs subworkflows

c8214e3

This runs the sv callers in joint mode, merges, annotates, filters, and stages the results in a directory structure

add joint detect snps subworkflow

7623779

This generates per sample gvcf files, jointly calls variants with gatk, annotates, filters, and stages the outputs.

add joint detect variants

065f8b3

This subworkflow calls the joint detect snps and joint detect svs subworkflows, outputting the staged results

pass annotsv_annotations input to subworkflow

70005be

pass soft filtered annotated vcf as output

0885e03

jasonwalker80 previously approved these changes Sep 15, 2021


tmooney reviewed Sep 16, 2021


apaul7 added 3 commits November 17, 2021 10:30

remove doc line for easy to understand input

e0449dc

ubuntu:xenial -> ubuntu:focal docker image

20f96d8

quote parameters in script

91b0e5f

apaul7 added 3 commits December 3, 2021 15:47

fix quotes

8079ab4

move script inline cwl file

9b6a9cb

add input option for output file basename

6e5cb23

apaul7 dismissed jasonwalker80’s stale review via 6e5cb23 December 3, 2021 21:54

apaul7 and others added 4 commits December 6, 2021 10:07

Update definitions/subworkflows/gatk_soft_filter.cwl

ac09653

Co-authored-by: Thomas B. Mooney <[email protected]>

Update definitions/tools/bcftools_view.cwl

1cb0254

Co-authored-by: Thomas B. Mooney <[email protected]>

add doc for output type

c253e6f

use bash arrays to quote multiple vars

140e6eb
