Filtering with Calling via SNVCurate

Victor Mao edited this page May 8, 2020 · 4 revisions

Post-Processing tutorial (under SNVCurate/postProcessing/):

This tutorial covers post-processing of the calls made by the scripts in SNVCurate. Note that Filter.sh and Annotate.sh submit SLURM batch jobs; wait for those jobs to finish before moving on to the next step. To change the SLURM batch job parameters (i.e. time, queue, node count, etc.), edit the headers of the scripts runRenaming.sh, postProcessing/runAnnotate.sh, and postProcessing/RunFilter.sh.
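Since each step depends on the previous batch jobs having finished, a small polling helper can save some manual checking. The following is only a sketch and not part of SNVCurate: the function names `pending_jobs` and `wait_for_slurm_jobs` are hypothetical, and it assumes `squeue` is on the PATH. It skips the job named in `SLURM_JOB_ID`, so that the interactive session it runs in is not counted.

```shell
# Hypothetical helpers, not shipped with SNVCurate.

# List the IDs of the current user's SLURM jobs, excluding the job this
# shell runs in (SLURM_JOB_ID is set inside an srun/sbatch session).
# squeue flags: -h drops the header, -u selects a user, -o '%i' prints job IDs.
pending_jobs() {
    squeue -h -u "${USER:-$(id -un)}" -o '%i' |
        grep -v -x -- "${SLURM_JOB_ID:-none}"
}

# Block until no other jobs remain, polling every N seconds (default 60).
wait_for_slurm_jobs() {
    poll_seconds="${1:-60}"
    while [ -n "$(pending_jobs)" ]; do
        sleep "$poll_seconds"
    done
}
```

After sourcing these functions, running `wait_for_slurm_jobs 120` once Filter.sh has submitted its jobs would return only when the queue is empty. Note that it waits on all of the user's other jobs, not only those submitted by SNVCurate.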

  1. Start an interactive session with more than 2 GB of memory:
srun --pty -t 0-2:0:0 --mem 5G -p interactive /bin/bash
  2. Run Intersect.sh to intersect the two sets of calls.
sh Intersect.sh /path/to/filtering_output_directory /path/to/Mutect2_output_directory /path/to/MuSE_output_directory
  3. Run Filter.sh to filter the intersection.
sh Filter.sh /path/to/filtering_output_directory /path/to/HaplotypeCaller_output_directory True /path/to/tumor_normal.csv 4 10 0.05 0.01 hg19 /path/to/databases /path/to/bam_files /path/to/Annovar.pl True /path/to/panel
  4. Run Annotate.sh to annotate the results and finish.
sh Annotate.sh /path/to/Annovar.pl /path/to/created/directory/for/databases /path/to/filtering_output_directory /path/to/Mutect2_output_directory hg19 /path/to/tumor_normal.csv /path/to/HaplotypeCaller_output_directory
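Because Filter.sh takes up to 14 positional arguments, it can help to name each path once and reuse it across the steps. The wrapper below is a hypothetical sketch: the variable names, the `run` helper, and the `DRY_RUN` switch are illustrative and not part of SNVCurate. It also assumes that the same Annovar database directory is passed to both Filter.sh and Annotate.sh. With `DRY_RUN=1` (the default here) each command is printed rather than executed, so the argument order can be checked first.

```shell
#!/bin/sh
# Hypothetical wrapper; the paths are the placeholders from the tutorial.
OUT_DIR=/path/to/filtering_output_directory
MUTECT2_DIR=/path/to/Mutect2_output_directory
MUSE_DIR=/path/to/MuSE_output_directory
NORMAL_DIR=/path/to/HaplotypeCaller_output_directory
PAIRS_CSV=/path/to/tumor_normal.csv
ANNOVAR_DB=/path/to/databases
BAM_DIR=/path/to/bam_files
ANNOVAR_PL=/path/to/Annovar.pl
PANEL=/path/to/panel
REFERENCE=hg19

# Print the command when DRY_RUN=1, otherwise execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

intersect_step() {
    run sh Intersect.sh "$OUT_DIR" "$MUTECT2_DIR" "$MUSE_DIR"
}

filter_step() {
    # Cutoffs in order: ALT_CUT=4 TOTAL_CUT=10 VAF_CUT=0.05 MAF_CUT=0.01
    run sh Filter.sh "$OUT_DIR" "$NORMAL_DIR" True "$PAIRS_CSV" \
        4 10 0.05 0.01 "$REFERENCE" "$ANNOVAR_DB" "$BAM_DIR" \
        "$ANNOVAR_PL" True "$PANEL"
}

annotate_step() {
    run sh Annotate.sh "$ANNOVAR_PL" "$ANNOVAR_DB" "$OUT_DIR" \
        "$MUTECT2_DIR" "$REFERENCE" "$PAIRS_CSV" "$NORMAL_DIR"
}
```

Calling `intersect_step`, `filter_step`, and `annotate_step` in turn prints the three tutorial commands. With `DRY_RUN=0` they are executed instead, in which case the SLURM jobs submitted by one step still need to finish before the next step is called.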

Script Information

  1. Intersect.sh: Bash script to organize and intersect the calls made by Mutect2 and MuSE.
usage: sh Intersect.sh [OUTPUT_DIRECTORY] [MUTECT2_PATH] [MUSE_PATH]
  • Both the Mutect2 and MuSE paths should point to the directories containing the files output directly by Mutect2 and MuSE. The script creates and organizes the files it needs on its own.
  • The MuSE path is optional, but recommended.
  2. Filter.sh: Bash script to filter the intersection of the calls.
usage: sh Filter.sh [PATH_TO_INTERSECTION] [NORMAL] [MATCHED_NORMAL] [CSV] [ALT_CUT] [TOTAL_CUT] [VAF_CUT] [MAF_CUT]
                    [REFERENCE] [ANNOVAR_DATABASES] [BAM_PATH] [ANNOVAR_SCRIPT] [FILTER_WITH_PANEL] [PANEL]
  • All fields are required unless indicated. All paths should be full paths.
  • [PATH_TO_INTERSECTION]: The full path to the directory of the intersection of the calls.
  • [NORMAL]: The full path to the directory of the normal calls from HaplotypeCaller or a Panel of Normals (i.e. a sample of germline calls to filter out).
  • [MATCHED_NORMAL]: A Boolean value indicating whether or not the normal is a matched normal (i.e. from GenotypeGVCFs).
  • [CSV]: Path to the original csv file containing matched tumor/normal pairs.
  • [ALT_CUT]/[VAF_CUT]: The cutoffs for alternate read depth and variant allele frequency (VAF). Calls that fail either cutoff are written to a file with bad_somatic_quality in the filename.
  • [TOTAL_CUT]: The cutoff for total read depth (i.e. alt + ref).
  • [MAF_CUT]: The population germline allele frequency cutoff.
  • [REFERENCE]: hg19 or hg38 (for Annovar).
  • [ANNOVAR_DATABASES]: The path to the Annovar databases created from SetupDatabases.sh.
  • [BAM_PATH]: The full path to the directory of BAM files.
  • [ANNOVAR_SCRIPT]: The path to the Annovar Perl script. On Orchestra, this is /home/mk446/bin/annovar/table_annovar.pl.
  • [FILTER_WITH_PANEL]: True (if PoN filtering is desired), False (otherwise). Currently, panel filtering is only supported for hg19/b37.
  • [PANEL] (optional): The path to a Panel of Normals to filter with, if desired. For hg19/b37, the TCGA panel located at /n/data1/hms/dbmi/park/victor/references/ is recommended.
  3. Annotate.sh: Bash script to annotate the filtering results and merge them into final annotated callsets.
usage: sh Annotate.sh [ANNOVAR_SCRIPT] [ANNOVAR_DATABASES] [OUTPUT_DIRECTORY] [PATH_TO_MUTECT2] [REFERENCE] [CSV] [PATH_TO_NORMAL]
  • All paths should be full paths.
  • [ANNOVAR_SCRIPT]: The path to the Annovar Perl script. On Orchestra, this is /home/mk446/bin/annovar/table_annovar.pl.
  • [ANNOVAR_DATABASES]: The path to the Annovar databases created from SetupDatabases.sh.
  • [OUTPUT_DIRECTORY]: The same output directory used before.
  • [PATH_TO_MUTECT2]: Path to the Mutect2 output.
  • [REFERENCE]: hg19 or hg38 (for Annovar).
  • [CSV]: Path to the original csv file containing matched tumor/normal pairs.
  • [PATH_TO_NORMAL] (optional): The full path to the directory of the normal calls from HaplotypeCaller (if used).
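
To get a feel for the read-level cutoffs passed to Filter.sh (4, 10, and 0.05 in the tutorial command), the sketch below reports which cutoffs a single call would fail. It is illustrative only and does not reproduce Filter.sh's code: the function name `failed_cuts` is hypothetical, VAF is taken as alt / (alt + ref), and a value is assumed to fail when it is strictly below its cutoff. Filter.sh may treat the boundary differently.

```shell
# Hypothetical helper, not part of SNVCurate.
# usage: failed_cuts ALT_READS REF_READS ALT_CUT TOTAL_CUT VAF_CUT
# Prints the names of the cutoffs the call fails, or "pass".
# Assumption: a value fails when it is strictly below its cutoff.
failed_cuts() {
    awk -v alt="$1" -v ref="$2" -v alt_cut="$3" \
        -v total_cut="$4" -v vaf_cut="$5" '
        BEGIN {
            total = alt + ref
            vaf = (total > 0) ? alt / total : 0
            out = ""
            if (alt < alt_cut)     out = out " ALT_CUT"
            if (total < total_cut) out = out " TOTAL_CUT"
            if (vaf < vaf_cut)     out = out " VAF_CUT"
            sub(/^ /, "", out)
            result = (out == "") ? "pass" : out
            print result
        }'
}
```

For example, `failed_cuts 3 20 4 10 0.05` prints `ALT_CUT` (3 alternate reads is under the cutoff of 4), while `failed_cuts 6 14 4 10 0.05` prints `pass`.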