germline joint detect variants workflow #1043
base: master
update input name to be clear that the no_cds filter does not run the coding sequences filter
allows filtering of survivor merged annotsv tsv. Also allow control over the population allele frequency value, still defaults to 0.05.
version 2.3 requires the annotation directory to be passed as an input. Also capture the unannotated event tsv as an output.
changing output names for consistency
added javascript to pass in any secondary files when staging output files. added --recursive to copy everything, --preserve to keep timestamps (cromwell does not stage files for this to matter...), and --no-clobber to error out if files are overwritten. added optional directory input for staging files and a single directory.
gatk soft filtering using hard filtering parameters. Based on https://gatk.broadinstitute.org/hc/en-us/articles/360035531112--How-to-Filter-variants-either-with-VQSR-or-by-hard-filtering#2
This uses VT to normalize a VCF. This is an alternative to GATK4 LeftAlignAndTrimVariants
This allows vcfs to be split by samples
This allows manta to be run with multiple samples in joint calling fashion
This runs cnvnator in single sample mode over multiple samples. The sample rename step is required as the sample name in the output vcf can change from the input.
examples (input name -> output name):
sample.1 -> sample
sample.1.2 -> sample
sample_1 -> sample_1
Also stages the output vcf name to follow the $SAMPLE.cnvnator.vcf.gz format.
This runs cnvkit in single sample mode for multiple samples. The sample rename step is required as the output sample name in the vcf is based on the input filename. Currently that is hardcoded to be `adjusted.tumor`. Also stages the output file name to follow the $SAMPLE.cnvkit.vcf.gz format.
This subworkflow runs sv filtering for manta/smoove calls. Final sample names follow the $SAMPLE-$CALLER format. This allows easy tracking for the source of calls in output merged vcfs.
This runs the depth filters for events called by cnvkit/cnvnator. Final sample names follow the $SAMPLE-$CALLER format. This allows easy tracking for the source of calls in final merged vcfs. Also added custom merge sv records, which allows calls to be merged together if they are of the same type and within a bp window. This does not remove calls; it just adds a new record in the output vcf.
This runs the sv callers in joint mode, merges, annotates, filters, and stages the results in a directory structure
This generates per sample gvcf files, jointly calls variants with gatk, annotates, filters, and stages the outputs.
This subworkflow calls the joint detect snps and joint detect svs subworkflows, outputting the staged results
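The cnvnator commit message above lists three input/output name pairs. A minimal Python sketch of why a rename step is needed, assuming the rule behind those pairs is that everything from the first `.` onward is dropped. That rule is only inferred from the three examples, and both helper names are hypothetical and not part of the PR:

```python
def expected_cnvnator_name(input_name: str) -> str:
    """Guess the sample name written to the output vcf for a given input name.

    Assumption: the name is cut at the first '.', and names without a '.'
    are left unchanged. Inferred from the commit message examples only.
    """
    return input_name.split(".", 1)[0]


def staged_vcf_name(sample: str, caller: str = "cnvnator") -> str:
    """Build the $SAMPLE.<caller>.vcf.gz file name described in the commit."""
    return f"{sample}.{caller}.vcf.gz"


def needs_rename(input_name: str) -> bool:
    """True when the output sample name would differ from the input name."""
    return expected_cnvnator_name(input_name) != input_name
```

Under this assumption, `sample.1` and `sample.1.2` would both need the rename step, while `sample_1` would not.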
+1, @apaul7 I have not reviewed this in its entirety. I've asked @tmooney to take a look as well. Both of us have reviewed the PR but not necessarily commit-by-commit or line-by-line. I'm going to give the "looks good to me", but if Tom can look mostly for places where these commits/changes may or may not impact other workflows/pipelines. Then let's merge.
I generally think this looks okay. I'll trust that what it does is what you want to have happen.
Part of me sees all the references to SNPs and has flashbacks to past instances where we've been told to go back and change it to SNVs instead, but I'll also trust that this name is the one we want to use 😄
Co-authored-by: Thomas B. Mooney <[email protected]>
This PR adds a joint detect variants subworkflow. This is split into 2 separate subworkflows, joint_detect_snps.cwl and joint_detect_svs.cwl. Tried my best to add informative commit messages. I've itemized most of the overall changes below:
tools/replace_vcf_sample_name.cwl
update naming of output file to full name instead of just the base with renamed in front
tools/gather_to_subdirectory.cwl
added --recursive, --preserve, and --no-clobber to the cp command. allows timestamps to be preserved and errors thrown if files are overwritten
also added the valueFrom field which uses javascript to iterate over the input array and add any secondaryFiles to the destination directory
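The staging behaviour described for tools/gather_to_subdirectory.cwl can be sketched in Python. This is not the tool's code (the tool uses `cp` plus a javascript `valueFrom` expression); it only illustrates the intent stated above: take each input file together with its secondaryFiles, keep timestamps, and fail rather than overwrite. The `path` and `secondaryFiles` keys follow the CWL File object layout; the function names are hypothetical:

```python
import shutil
from pathlib import Path


def collect_paths(file_obj: dict) -> list:
    """Return the path of a CWL-style File object plus all secondaryFiles paths."""
    paths = [file_obj["path"]]
    for secondary in file_obj.get("secondaryFiles", []):
        paths.extend(collect_paths(secondary))
    return paths


def stage(files: list, dest_dir: str) -> list:
    """Copy every file and its secondary files into dest_dir.

    Raises FileExistsError instead of overwriting, which mirrors the
    stated purpose of --no-clobber in the PR description.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for file_obj in files:
        for src in collect_paths(file_obj):
            target = dest / Path(src).name
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.copy2(src, target)  # copy2 also copies timestamps
            staged.append(str(target))
    return staged
```

Directories among the secondary files are not handled here; the PR covers that case with the separate tools/gather_to_subdirectory_dirs.cwl tool.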
tools/gather_to_subdirectory_dirs.cwl
(new tool) this follows the same format as tools/gather_to_subdirectory.cwl but is used for directories. I tried using the same tool for both use cases but was unable to make Cromwell happy
tools/bcftools_view.cwl
(new tool)This tool is used to split multi sample vcfs into single sample vcfs.
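As a rough illustration of splitting a multi sample vcf by sample, the sketch below builds one `bcftools view` command per sample without running it. The `-s`, `-O z`, and `-o` options are standard `bcftools view` options, but whether tools/bcftools_view.cwl uses exactly these arguments is an assumption, and the output file naming is hypothetical:

```python
def split_commands(multi_sample_vcf: str, samples: list) -> list:
    """Build one `bcftools view` command per sample (commands are not executed)."""
    commands = []
    for sample in samples:
        commands.append([
            "bcftools", "view",
            "-s", sample,               # keep only this sample's column
            "-O", "z",                  # write bgzip-compressed VCF
            "-o", f"{sample}.vcf.gz",   # hypothetical output file name
            multi_sample_vcf,
        ])
    return commands
```

Each returned list could be passed to `subprocess.run` on a machine where bcftools is installed.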
tools/vt_normalize.cwl
(new tool) uses VT to normalize a vcf, alternative to gatk LeftAlignAndTrimVariants found in tools/normalize_variants.cwl
subworkflows/joint_genotype.cwl
added decompose and normalize steps
tools/manta_germline.cwl
(new tool) follows the tools/manta_somatic.cwl format. adds the stats directory to outputs and removes somatic and tumor only outputs.
tools/genotype_gvcfs.cwl
added input for minimum confidence threshold for called variants
tools/custom_merge_sv_records.cwl
(new tool)merges copy number called variants that have the same type, and are within x bases
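The merge rule described for tools/custom_merge_sv_records.cwl (same type, within x bases, original calls kept) can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: the `Call` record, the function names, and the choice to measure distance from the furthest end seen so far in a group are all hypothetical:

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def _summarize(group: List[Call]) -> List[Call]:
    """Return one spanning record for a group of two or more calls, else nothing."""
    if len(group) < 2:
        return []
    first = group[0]
    return [Call(first.chrom, min(c.start for c in group),
                 max(c.end for c in group), first.svtype)]


def add_merged_records(calls: List[Call], window: int) -> List[Call]:
    """Return the original calls followed by one new record per merged group."""
    ordered = sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start))
    merged: List[Call] = []
    group: List[Call] = []
    for call in ordered:
        same_group = (
            bool(group)
            and call.chrom == group[0].chrom
            and call.svtype == group[0].svtype
            and call.start - max(c.end for c in group) <= window
        )
        if not same_group:
            merged.extend(_summarize(group))
            group = []
        group.append(call)
    merged.extend(_summarize(group))
    # Originals are kept untouched; merged records are only appended.
    return list(calls) + merged
```

Appending rather than replacing matches the commit message note that merging does not remove calls and only adds a new record to the output vcf.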
tools/cnvnator.cwl
updated output file names, s/CNV/cnv/
tools/annotsv_filter.cwl
adds ability to filter the annotsv tsv from a survivor merged vcf, skipping the last filtering requirement. survivor does not pass INFO fields to the merged vcf
renamed all_cds to no_cds. easier to understand that the input removes the coding sequence filter requirement
tools/annotsv.cwl
updated to version 2.3. This is not the latest version. The latest version no longer retains information for individual sv population databases in output files
renamed inputs for the new version
added input for annotation directory instead of having them in the docker image
added unannotated tsv output
subworkflows/merge_svs.cwl
added inputs for population allele frequency, no_cds annotsv filtering, and annotsv annotation directory
output file name replacement, s/SURVIVOR/survivor/
added step for survivor merged annotsv filtering
subworkflows/gatk_soft_filter.cwl
added subworkflow for gatk soft filter based on hard parameters to add PASS/FAIL
https://gatk.broadinstitute.org/hc/en-us/articles/360035531112--How-to-Filter-variants-either-with-VQSR-or-by-hard-filtering#2
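To illustrate what soft filtering with hard filtering parameters means, here is a small Python sketch. The SNP thresholds and filter names are the ones suggested in the GATK article linked above; whether subworkflows/gatk_soft_filter.cwl uses exactly these values is an assumption, and `soft_filter` is a hypothetical helper. The point is that a failing record is labelled rather than removed:

```python
# (filter name, annotation, comparison, limit); a record fails a filter
# when "annotation <comparison> limit" is true. Values follow the GATK
# hard-filtering article for SNPs; the subworkflow may use different ones.
SNP_THRESHOLDS = [
    ("QD2", "QD", "<", 2.0),
    ("QUAL30", "QUAL", "<", 30.0),
    ("SOR3", "SOR", ">", 3.0),
    ("FS60", "FS", ">", 60.0),
    ("MQ40", "MQ", "<", 40.0),
    ("MQRankSum-12.5", "MQRankSum", "<", -12.5),
    ("ReadPosRankSum-8", "ReadPosRankSum", "<", -8.0),
]


def soft_filter(annotations: dict) -> str:
    """Return the FILTER value for one record: 'PASS' or the failed filter names.

    Missing annotations are skipped (an assumption of this sketch), so they
    never cause a failure.
    """
    failed = []
    for name, key, comparison, limit in SNP_THRESHOLDS:
        value = annotations.get(key)
        if value is None:
            continue
        if (comparison == "<" and value < limit) or (comparison == ">" and value > limit):
            failed.append(name)
    return ";".join(failed) if failed else "PASS"
```

For example, `soft_filter({"QD": 1.5, "FS": 70.0})` returns the two failed filter names joined by `;`, and the record itself would stay in the vcf.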