diff --git a/docs/images/germline_metro.png b/docs/images/germline_metro.png index aca56685..05e012ea 100644 Binary files a/docs/images/germline_metro.png and b/docs/images/germline_metro.png differ diff --git a/docs/images/germline_metro.svg b/docs/images/germline_metro.svg index faf927a5..94a543fe 100644 --- a/docs/images/germline_metro.svg +++ b/docs/images/germline_metro.svg @@ -3,8 +3,8 @@ vcf2dbvcf2dbrtgtoolsvcfevalrtgtoolsrocplotMultiQCbcftools normbcftoolsfilterupdioreportsbcftools normautomap + id="path3447-6-6-4-7-5-2-7-9-4-7-3-0-2" + cx="366.07513" + cy="343.63565" + r="2.6458333" />reports diff --git a/docs/index.md b/docs/index.md index f6a20bf9..8d40fe5e 100644 --- a/docs/index.md +++ b/docs/index.md @@ -18,7 +18,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool ## Quick Start -1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=23.10.0`) +1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=24.04.0`) 2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) (you can follow [this tutorial](https://singularity-tutorial.github.io/01-installation/)), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(you can use [`Conda`](https://conda.io/miniconda.html) both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_. ```csv title="samplesheet.csv" @@ -34,6 +34,8 @@ Now, you can run the pipeline using: nextflow run nf-cmgg/germline --input samplesheet.csv --outdir --genome GRCh38 -profile ``` +This pipeline contains a lot of parameters to customize your pipeline run. 
Please take a look at the [parameters](parameters.md) documentation for an overview. + !!!warning Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; @@ -43,7 +45,7 @@ nextflow run nf-cmgg/germline --input samplesheet.csv --outdir --genome nf-cmgg/germline was originally written and is maintained by [@nvnieuwk](https://github.com/nvnieuwk). -Special thanks to [@matthdsm](https://github.com/matthdsm) for the many tips and feedback and to [@mvheetve](https://github.com/mvheetve) for testing the pipeline. +Special thanks to [@matthdsm](https://github.com/matthdsm) for the many tips and feedback and to [@mvheetve](https://github.com/mvheetve) and [@ToonRossel](https://github.com/ToonRosseel) for testing the pipeline. ## Contributions and Support diff --git a/docs/output.md b/docs/output.md index 34c10e15..62a3e900 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,12 +1,10 @@ # nf-cmgg/germline: Output -# nf-cmgg/germline: Output - ## Introduction This page describes the output produced by the pipeline. -The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level output directory (specified by `--outdir `). This is an example output when the pipeline has been run for a WGS sample called `SAMPLE_1` and a WES sample called `SAMPLE_2` which form a family called `FAMILY_1`. The output consists of 4 directories: `yyyy-MM-dd_project_name`, `individuals`, `multiqc_reports` and `pipeline_info`. This run has only been run with `haplotypecaller` (`--callers haplotypecaller`) +The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level output directory (specified by `--outdir `). 
This is an example output when the pipeline has been run for a WGS sample called `SAMPLE_1` and a WES sample called `SAMPLE_2` which form a family called `FAMILY_1`. The output consists of 4 directories: `yyyy-MM-dd_project_name`, `individuals`, `multiqc_reports` and `pipeline_info`. This run used only `haplotypecaller` (`--callers haplotypecaller`). ```bash results/ @@ -46,7 +44,7 @@ results/ 2. This directory contains all files for family `FAMILY_1`. -3. This is the BED file used to parallelize the joint-genotyping. It contains all regions that have reads mapped to them for WGS and all regions in the regions of interest that have reads mapped to them for WES. +3. This is the BED file used to parallelize the joint-genotyping. It contains all regions where real variants have been found in all GVCFs in the family. The value of `--merge_distance` (default: `100000` base pairs) is used to pad the regions so that the BED file contains a few larger regions instead of many small ones. 4. The PED file detailing the relation between the different members of the family. This file will be inferred when no PED file has been given to this family. @@ -60,11 +58,11 @@ results/ 9. The report created with MultiQC. This contains all statistics generated with `bcftools stats`, Ensembl VEP and other tools. -10. The folder for `SAMPLE_1` containing temporary files that could be useful for re-analysing later. +10. The folder for `SAMPLE_1` containing temporary files that could be useful for re-analysis later. -11. This is the BED file used to parallelize the variant calling. It contains all regions that have reads mapped to them for WGS and all regions in the regions of interest that have reads mapped to them for WES. +11. This is the BED file used to parallelize the variant calling. It contains all regions that are callable in the input files based on the desired regions (WGS = the whole genome; WES = the regions specified in the `roi` BED file). -12. 
The GVCF file created with `haplotypecaller`. This can used in later runs of the pipeline to skip variant calling for this sample. A major use case for this is to add a new member to a family without having to call all variants of already called members. +12. The GVCF file created with `haplotypecaller`. This can be used in later runs of the pipeline to skip variant calling for this sample. A major use case for this is to add a new member to a family without having to call all variants of already called members. 13. The global distribution of the coverage calculated by `mosdepth`. diff --git a/docs/parameters.md b/docs/parameters.md index e2bd97e4..a966bf8d 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -10,9 +10,9 @@ Define where the pipeline should find input data and save output data. | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | ------- | -------- | ------ | | `input` | Path to comma-separated file containing information about the samples in the experiment.
HelpYou will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with samples, and a header row. See [usage docs](./usage.md).
| `string` | | True | | | `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. | `string` | | True | | -| `watchdir` | A folder to watch for the creation of files that start with `watch:` in the samplesheet | `string` | | | | +| `watchdir` | A folder to watch for the creation of files that start with `watch:` in the samplesheet. | `string` | | | | | `email` | Email address for completion summary.
HelpSet this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.
| `string` | | | | -| `ped` | Path to a pedigree file for all samples in the run | `string` | | | | +| `ped` | Path to a pedigree file for all samples in the run. All relational data will be fetched from this file. | `string` | | | | ## Reference genome options @@ -20,15 +20,15 @@ Reference genome related files and options required for the workflow. | Parameter | Description | Type | Default | Required | Hidden | | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------ | -------- | ------ | -| `genome` | Reference genome build
HelpRequires a Genome Reference Consortium reference ID (e.g. GRCh38)
| `string` | GRCh38 | | | +| `genome` | Reference genome build. Used to fetch the right reference files.
HelpRequires a Genome Reference Consortium reference ID (e.g. GRCh38)
| `string` | GRCh38 | | | | `fasta` | Path to FASTA genome file.
HelpThis parameter is _mandatory_ if `--genome` is not specified. The path to the reference genome fasta.
| `string` | | True | | | `fai` | Path to FASTA genome index file. | `string` | | | | -| `dict` | Path to the sequence dictionary generated from the FASTA reference | `string` | | | | -| `strtablefile` | Path to the STR table file generated from the FASTA reference | `string` | | | | -| `sdf` | Path to the SDF folder generated from the reference FASTA file | `string` | | | | -| `genomes_base` | Directory base for CMGG reference store (used when --genomes_ignore false is specified) | `string` | /references/ | | | +| `dict` | Path to the sequence dictionary generated from the FASTA reference. This is only used when `haplotypecaller` is one of the specified callers. | `string` | | | | +| `strtablefile` | Path to the STR table file generated from the FASTA reference. This is only used when `--dragstr` has been given. | `string` | | | | +| `sdf` | Path to the SDF folder generated from the reference FASTA file. This is only required when using `--validate`. | `string` | | | | +| `genomes_base` | Directory base for CMGG reference store (used when `--genomes_ignore false` is specified) | `string` | /references/ | | | | `cmgg_config_base` | The base directory for the local config files | `string` | /conf/ | | True | -| `genomes_ignore` | Do not load the local references from the path specified with --genomes_base | `boolean` | | | True | +| `genomes_ignore` | Do not load the local references from the path specified with `--genomes_base` | `boolean` | | | True | | `igenomes_base` | Directory / URL base for iGenomes references. | `string` | | | True | | `igenomes_ignore` | Do not load the iGenomes reference config.
HelpDo not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.
| `boolean` | | | True | @@ -36,39 +36,40 @@ Reference genome related files and options required for the workflow. Parameters that define how the pipeline works -| Parameter | Description | Type | Default | Required | Hidden | -| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------------------------------------------------------------ | -------- | ------ | -| `scatter_count` | The amount of scattering that should happen per sample.
HelpIncrease this number to increase the pipeline run speed, but at the tradeoff of using more IO and disk space. This can differ from the actual scatter count in some cases (especially with smaller files).
This has an effect on HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.
| `integer` | 40 | | | -| `merge_distance` | The merge distance for genotype BED files
HelpIncrease this parameter if GenomicsDBImport is running slow. This defines the maximum distance between intervals that should be merged. The less intervals GenomicsDBImport actually gets, the faster it will run.
| `integer` | 100000 | | | -| `dragstr` | Create DragSTR models to be used with HaplotypeCaller
HelpThis currently is only able to run single-core per sample. Due to this, the process is very slow with only very small improvements to the analysis.
| `boolean` | | | | -| `validate` | Validate the found variants
HelpThis only validates individual sample GVCFs that have truth VCF supplied to them via the samplesheet (in row `truth_vcf`, with an optional index in the `truth_tbi` row)
| `boolean` | | | | -| `filter` | Filter the found variants | `boolean` | | | | -| `annotate` | Annotate the found variants | `boolean` | | | | -| `add_ped` | Add PED INFO header lines to the final VCFs | `boolean` | | | | -| `gemini` | Create a Gemini databases from the final VCFs | `boolean` | | | | -| `mosdepth_slow` | Don't run mosdepth in fast-mode
HelpThis is advised if you need exact coverage BED files as output
| `boolean` | | | | -| `project` | The name of the project.
HelpThis will be used to specify the final output files folder in the output directory.
| `string` | | | | -| `skip_date_project` | Don't add the current date to the output project folder | `boolean` | | | | -| `roi` | Path to the default ROI (regions of interest) BED file to be used for WES analysis
HelpThis will be used for all samples that do not have a specific ROI file supplied to them through the samplesheet. Don't supply an ROI file to run the analysis as WGS.
| `string` | | | | -| `dbsnp` | Path to the dbSNP VCF file | `string` | | | | -| `dbsnp_tbi` | Path to the index of the dbSNP VCF file | `string` | | | | -| `somalier_sites` | Path to the VCF file with sites for Somalier to use | `string` | https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz | | | -| `only_call` | Only call the variants without doing any post-processing | `boolean` | | | | -| `only_merge` | Only run the pipeline until the creation of the genomicsdbs and output them | `boolean` | | | | -| `output_genomicsdb` | Output the genomicsDB together with the joint-genotyped VCF | `boolean` | | | | -| `callers` | A comma delimited string of the available callers. Current options are: 'haplotypecaller' and 'vardict' | `string` | haplotypecaller | | | -| `vardict_min_af` | The minimum allele frequency for VarDict when no `vardict_min_af` is supplied in the samplesheet | `number` | 0.1 | | | -| `normalize` | Normalize the VCFs | `boolean` | | | | -| `output_suffix` | A custom suffix to add to the basename of the output files | `string` | | | | -| `only_pass` | Filter out all variants that don't have the PASS filter for vardict. This only works when --filter is also given | `boolean` | | | | -| `keep_alt_contigs` | Keep all aditional contigs for calling instead of filtering them out before | `boolean` | | | | -| `updio` | Run UPDio analysis on the resulting VCFs | `boolean` | | | | -| `updio_common_cnvs` | A TSV file containing common CNVs to be used by UPDio | `string` | | | | -| `automap` | Run AutoMap analysis on the resulting VCFs | `boolean` | | | | -| `automap_repeats` | BED file with repeat regions in the genome.
HelpThis file will be automatically generated for hg38/GRCh38 and hg19/GRCh37 when this parameter has not been given.
| `string` | | | | -| `automap_panel` | TXT file with gene panel regions to be used by AutoMap.
HelpBy default the CMGG gene panel list will be used.
| `string` | | | | -| `automap_panel_name` | The panel name of the panel given with --automap_panel. | `string` | cmgg_bio | | | -| `hc_phasing` | Perform phasing with HaplotypeCaller | `boolean` | | | | +| Parameter | Description | Type | Default | Required | Hidden | +| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------------------------------------------------------------ | -------- | ------ | +| `scatter_count` | The amount of scattering that should happen per sample.
HelpIncrease this number to speed up the pipeline run, at the cost of more IO and disk space. This can differ from the actual scatter count in some cases (especially with smaller files).
This has an effect on HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.
| `integer` | 40 | | | +| `merge_distance` | The merge distance for family BED files
HelpIncrease this parameter if GenomicsDBImport is running slowly. This defines the maximum distance between intervals that should be merged. The fewer intervals GenomicsDBImport actually gets, the faster it will run.
| `integer` | 100000 | | | +| `dragstr` | Create DragSTR models to be used with HaplotypeCaller
HelpThis currently runs on only a single core per sample, which makes the process very slow for only very small improvements to the analysis.
| `boolean` | | | | +| `validate` | Validate the found variants. | `boolean` | | | | +| `filter` | Filter the found variants. | `boolean` | | | | +| `annotate` | Annotate the found variants using Ensembl VEP. | `boolean` | | | | +| `add_ped` | Add PED INFO header lines to the final VCFs. | `boolean` | | | | +| `gemini` | Create a Gemini database from the final VCFs. | `boolean` | | | | +| `mosdepth_slow` | Don't run mosdepth in fast-mode
HelpThis is advised if you need exact coverage BED files as output.
| `boolean` | | | | +| `project` | The name of the project.
HelpThis will be used to specify the name of the final output files folder in the output directory.
| `string` | | | | +| `skip_date_project` | Don't add the current date to the output project folder. | `boolean` | | | | +| `roi` | Path to the default ROI (regions of interest) BED file to be used for WES analysis.
HelpThis will be used for all samples that do not have a specific ROI file supplied to them through the samplesheet. Don't supply an ROI file to run the analysis as WGS.
| `string` | | | | +| `dbsnp` | Path to the dbSNP VCF file. This will be used to set the variant IDs. | `string` | | | | +| `dbsnp_tbi` | Path to the index of the dbSNP VCF file. | `string` | | | | +| `somalier_sites` | Path to the VCF file with sites for Somalier to use. | `string` | https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz | | | +| `only_call` | Only call the variants without doing any post-processing. | `boolean` | | | | +| `only_merge` | Only run the pipeline until the creation of the genomicsdbs and output them. | `boolean` | | | | +| `output_genomicsdb` | Output the genomicsDB together with the joint-genotyped VCF. | `boolean` | | | | +| `callers` | A comma-delimited string of the available callers. Current options are: `haplotypecaller` and `vardict`. | `string` | haplotypecaller | | | +| `vardict_min_af` | The minimum allele frequency for VarDict when no `vardict_min_af` is supplied in the samplesheet. | `number` | 0.1 | | | +| `normalize` | Normalize the variants in the final VCFs. | `boolean` | | | | +| `output_suffix` | A custom suffix to add to the basename of the output files. | `string` | | | | +| `only_pass` | Filter out all variants that don't have the PASS filter for vardict. This only works when `--filter` is also given. | `boolean` | | | | +| `keep_alt_contigs` | Keep all additional contigs for calling instead of filtering them out beforehand. | `boolean` | | | | +| `updio` | Run UPDio analysis on the final VCFs. | `boolean` | | | | +| `updio_common_cnvs` | A TSV file containing common CNVs to be used by UPDio. | `string` | | | | +| `automap` | Run AutoMap analysis on the final VCFs. | `boolean` | | | | +| `automap_repeats` | BED file with repeat regions in the genome.
HelpThis file will be automatically generated for hg38/GRCh38 and hg19/GRCh37 when this parameter has not been given.
| `string` | | | | +| `automap_panel` | TXT file with gene panel regions to be used by AutoMap.
HelpBy default the CMGG gene panel list will be used.
| `string` | | | | +| `automap_panel_name` | The panel name of the panel given with --automap_panel. | `string` | cmgg_bio | | | +| `hc_phasing` | Perform phasing with HaplotypeCaller. | `boolean` | | | | +| `min_callable_coverage` | The lowest callable coverage to determine callable regions. | `integer` | 5 | | | ## Institutional config options @@ -87,55 +88,55 @@ Parameters used to describe centralised config profiles. These should not be edi Less common options for the pipeline, typically set in a config file. -| Parameter | Description | Type | Default | Required | Hidden | -| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------- | -------------------------------------------------------- | -------- | ------ | -| `help` | Display help text. | `boolean` | | | | -| `version` | Display version and exit. | `boolean` | | | | -| `publish_dir_mode` | Method used to save pipeline results to output directory.
HelpThe Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.
| `string` | copy | | | -| `email_on_fail` | Email address for completion summary, only when pipeline fails.
HelpAn email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
| `string` | | | True | -| `plaintext_email` | Send plain-text email instead of HTML. | `boolean` | | | True | -| `max_multiqc_email_size` | File size limit when attaching MultiQC reports to summary emails. | `string` | 25.MB | | True | -| `monochrome_logs` | Do not use coloured log outputs. | `boolean` | | | True | -| `hook_url` | Incoming hook URL for messaging service
HelpIncoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
| `string` | | | | -| `multiqc_title` | MultiQC report title. Printed as page header, used for filename if not otherwise specified. | `string` | | | | -| `multiqc_config` | Custom config file to supply to MultiQC. | `string` | | | | -| `multiqc_logo` | Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file | `string` | | | | -| `multiqc_methods_description` | Custom MultiQC yaml file containing HTML including a methods description. | `string` | | | | -| `validate_params` | Boolean whether to validate parameters against the schema at runtime | `boolean` | True | | True | -| `pipelines_testdata_base_path` | Base URL or local path to location of pipeline test dataset files | `string` | https://raw.githubusercontent.com/nf-core/test-datasets/ | | True | +| Parameter | Description | Type | Default | Required | Hidden | +| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------- | -------------------------------------------------------- | -------- | ------ | +| `help` | Display help text. Give a parameter name to this option to see the detailed help of that parameter. | `['boolean', 'string']` | | | | +| `helpFull` | See the full help message of all parameters. | `boolean` | | | | +| `version` | Display version and exit. | `boolean` | | | | +| `publish_dir_mode` | Method used to save pipeline results to output directory.
HelpThe Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.
| `string` | copy | | | +| `email_on_fail` | Email address for completion summary, only when pipeline fails.
HelpAn email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
| `string` | | | True | +| `plaintext_email` | Send plain-text email instead of HTML. | `boolean` | | | True | +| `max_multiqc_email_size` | File size limit when attaching MultiQC reports to summary emails. | `string` | 25.MB | | True | +| `hook_url` | Incoming hook URL for messaging service
HelpIncoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
| `string` | | | | +| `multiqc_title` | MultiQC report title. Printed as page header, used for filename if not otherwise specified. | `string` | | | | +| `multiqc_config` | Custom config file to supply to MultiQC. | `string` | | | | +| `multiqc_logo` | Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file | `string` | | | | +| `multiqc_methods_description` | Custom MultiQC yaml file containing HTML including a methods description. | `string` | | | | +| `validate_params` | Boolean whether to validate parameters against the schema at runtime | `boolean` | True | | True | +| `pipelines_testdata_base_path` | Base URL or local path to location of pipeline test dataset files | `string` | https://raw.githubusercontent.com/nf-core/test-datasets/ | | True | ## Annotation parameters Parameters to configure Ensembl VEP and VCFanno -| Parameter | Description | Type | Default | Required | Hidden | -| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------- | ------------ | -------- | ------ | -| `vep_chunk_size` | The amount of sites per split VCF as input to VEP | `integer` | 50000 | | | -| `species` | The species of the samples
HelpMust be lower case and have underscores as spaces
| `string` | homo_sapiens | | | -| `vep_merged` | Specify if the VEP cache is a merged cache | `boolean` | True | | | -| `vep_cache` | The path to the VEP cache | `string` | | | | -| `vep_dbnsfp` | Use the dbNSFP plugin with Ensembl VEP
HelpThe '--dbnsfp' and '--dbnsfp_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | -| `vep_spliceai` | Use the SpliceAI plugin with Ensembl VEP
HelpThe '--spliceai_indel', '--spliceai_indel_tbi', '--spliceai_snv' and '--spliceai_snv_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | -| `vep_spliceregion` | Use the SpliceRegion plugin with Ensembl VEP | `boolean` | | | | -| `vep_mastermind` | Use the Mastermind plugin with Ensembl VEP
HelpThe '--mastermind' and '--mastermind_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | -| `vep_maxentscan` | Use the MaxEntScan plugin with Ensembl VEP
HelpThe '--maxentscan' parameter need to be specified when using this parameter.
| `boolean` | | | | -| `vep_eog` | Use the custom EOG annotation with Ensembl VEP
HelpThe '--eog' and '--eog_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | -| `vep_alphamissense` | Use the AlphaMissense plugin with Ensembl VEP
HelpThe '--alphamissense' and '--alphamissense_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | -| `vep_version` | The version of the VEP tool to be used | `number` | 105.0 | | | -| `vep_cache_version` | The version of the VEP cache to be used | `integer` | 105 | | | -| `dbnsfp` | Path to the dbSNFP file | `string` | | | | -| `dbnsfp_tbi` | Path to the index of the dbSNFP file | `string` | | | | -| `spliceai_indel` | Path to the VCF containing indels for spliceAI | `string` | | | | -| `spliceai_indel_tbi` | Path to the index of the VCF containing indels for spliceAI | `string` | | | | -| `spliceai_snv` | Path to the VCF containing SNVs for spliceAI | `string` | | | | -| `spliceai_snv_tbi` | Path to the index of the VCF containing SNVs for spliceAI | `string` | | | | -| `mastermind` | Path to the VCF for Mastermind | `string` | | | | -| `mastermind_tbi` | Path to the index of the VCF for Mastermind | `string` | | | | -| `alphamissense` | Path to the TSV for AlphaMissense | `string` | | | | -| `alphamissense_tbi` | Path to the index of the TSV for AlphaMissense | `string` | | | | -| `eog` | Path to the VCF containing EOG annotations | `string` | | | | -| `eog_tbi` | Path to the index of the VCF containing EOG annotations | `string` | | | | -| `vcfanno` | Run annotations with vcfanno | `boolean` | | | | -| `vcfanno_config` | The path to the VCFanno config TOML | `string` | | | | -| `vcfanno_lua` | The path to a Lua script to be used in VCFanno | `string` | | | | -| `vcfanno_resources` | A semicolon-seperated list of resource files for VCFanno, please also supply their indices using this parameter | `string` | | | | +| Parameter | Description | Type | Default | Required | Hidden | +| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------ | -------- | ------ | +| `vep_chunk_size` | The amount of sites per 
split VCF as input to VEP. | `integer` | 50000 | | | +| `species` | The species of the samples.
HelpMust be lower case, with underscores in place of spaces.
| `string` | homo_sapiens | | | +| `vep_merged` | Specify if the VEP cache is a merged cache. | `boolean` | True | | | +| `vep_cache` | The path to the VEP cache. | `string` | | | | +| `vep_dbnsfp` | Use the dbNSFP plugin with Ensembl VEP.
HelpThe '--dbnsfp' and '--dbnsfp_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | +| `vep_spliceai` | Use the SpliceAI plugin with Ensembl VEP.
HelpThe '--spliceai_indel', '--spliceai_indel_tbi', '--spliceai_snv' and '--spliceai_snv_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | +| `vep_spliceregion` | Use the SpliceRegion plugin with Ensembl VEP. | `boolean` | | | | +| `vep_mastermind` | Use the Mastermind plugin with Ensembl VEP.
HelpThe '--mastermind' and '--mastermind_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | +| `vep_maxentscan` | Use the MaxEntScan plugin with Ensembl VEP.
HelpThe '--maxentscan' parameter needs to be specified when using this parameter.
| `boolean` | | | | +| `vep_eog` | Use the custom EOG annotation with Ensembl VEP.
HelpThe '--eog' and '--eog_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | +| `vep_alphamissense` | Use the AlphaMissense plugin with Ensembl VEP.
Help: The '--alphamissense' and '--alphamissense_tbi' parameters need to be specified when using this parameter.
| `boolean` | | | | +| `vep_version` | The version of the VEP tool to be used. | `number` | 105.0 | | | +| `vep_cache_version` | The version of the VEP cache to be used. | `integer` | 105 | | | +| `dbnsfp` | Path to the dbSNFP file. | `string` | | | | +| `dbnsfp_tbi` | Path to the index of the dbSNFP file. | `string` | | | | +| `spliceai_indel` | Path to the VCF containing indels for spliceAI. | `string` | | | | +| `spliceai_indel_tbi` | Path to the index of the VCF containing indels for spliceAI. | `string` | | | | +| `spliceai_snv` | Path to the VCF containing SNVs for spliceAI. | `string` | | | | +| `spliceai_snv_tbi` | Path to the index of the VCF containing SNVs for spliceAI. | `string` | | | | +| `mastermind` | Path to the VCF for Mastermind. | `string` | | | | +| `mastermind_tbi` | Path to the index of the VCF for Mastermind. | `string` | | | | +| `alphamissense` | Path to the TSV for AlphaMissense. | `string` | | | | +| `alphamissense_tbi` | Path to the index of the TSV for AlphaMissense. | `string` | | | | +| `eog` | Path to the VCF containing EOG annotations. | `string` | | | | +| `eog_tbi` | Path to the index of the VCF containing EOG annotations. | `string` | | | | +| `vcfanno` | Run annotations with vcfanno. | `boolean` | | | | +| `vcfanno_config` | The path to the VCFanno config TOML. | `string` | | | | +| `vcfanno_lua` | The path to a Lua script to be used in VCFanno. | `string` | | | | +| `vcfanno_resources` | A semicolon-seperated list of resource files for VCFanno, please also supply their indices using this parameter. 
| `string` | | | | diff --git a/docs/usage.md b/docs/usage.md index 4866cd53..c19dfd30 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,7 +1,5 @@ # nf-cmgg/germline: Usage -# nf-cmgg/germline: Usage - > _Documentation of pipeline parameters can be found in the [parameters documentation](./parameters.md)_ ## Samplesheet input @@ -21,7 +19,7 @@ sample,cram,crai SAMPLE_1,watch:INPUT.cram,watch:INPUT.cram.crai ``` -The files `INPUT.cram` and `INPUT.cram.crai` will now watched for recursively in the watch directory. +The files `INPUT.cram` and `INPUT.cram.crai` will now be watched for recursively in the watch directory. ### Example of the samplesheet @@ -100,12 +98,13 @@ The samplesheet can have following columns: | `ped` | OPTIONAL - Full path to PED file containing the relational information between samples in the same family. File has to have the extension `.ped`. | | `truth_vcf` | OPTIONAL - Full path to the VCF containing all the truth variants of the current sample. The validation subworkflow will be run when this file is supplied and the `--validate true` flag has been given. File has to have the extension `.vcf.gz` | | `truth_tbi` | OPTIONAL - Full path to the index of the truth VCF. This file can either be supplied by the user or generated by the pipeline. File has to have the extensions `.tbi` | +| `truth_bed` | OPTIONAL - Full path to the BED file containing the golden truth regions in the `truth_vcf` file. File has to have the extensions `.bed` | | `roi` | OPTIONAL - Full path to a BED file containing the regions of interest for the current sample to call on. When this file is given, the pipeline will run this sample in WES mode. (The flag `--roi ` can also be given to run WES mode for all samples using the file specified by the flag) File has to have the extension `.bed` or `.bed.gz`. | | `vardict_min_af` | OPTIONAL - The minimum AF value to use for the vardict variant caller (`--callers vardict`). 
This can be set in the samplesheet when it differs for all samples. A default can be set using the `--vardict_min_af` parameter (which defaults to 0.1) | !!!note - The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. Either the `ped` or `family` field can be used to specify the family name. The pipeline automatically extracts the family id from the `ped` file if the `family` field is empty. The `family` is used to specify on which samples the joint-genotyping should be performed. If neither the `ped` or `family` fields are used, the pipeline will default to a single-sample family with the sample name as its ID. + The `sample` field has to contain the same value when you have re-sequenced the same sample more than once, e.g. to increase sequencing depth. Either the `ped` or `family` field can be used to specify the family name. The pipeline automatically extracts the family ID from the `ped` file if the `family` field is empty. The `family` field is used to specify on which samples the joint-genotyping should be performed. If neither the `ped` nor the `family` field is used, the pipeline will default to a single-sample family with the sample name as its ID. This is an example of a working samplesheet used to test this pipeline: @@ -168,15 +167,13 @@ genome: 'GRCh38' When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline. 
You can also add the `-latest` argument to your run command to automatically fetch the latest version on every run: ```bash -nextflow pull nf-cmgg/germline -nextflow pull nf-cmgg/germline +nextflow pull nf-cmgg/germline -r ``` ### Reproducibility It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. -First, go to the [nf-cmgg/germline releases page](https://github.com/nf-cmgg/germline/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. First, go to the [nf-cmgg/germline releases page](https://github.com/nf-cmgg/germline/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports. @@ -239,21 +236,14 @@ You can also supply a run name to resume a specific run: `-resume [run-name]`. U ### `-c` -Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. +Specify the path to a specific config file. See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. 
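As a minimal sketch of what a file passed with `-c` might contain: custom config files can hold any configuration except pipeline parameters, which should be given on the CLI or through `-params-file`. The file name `custom.config` and the resource values below are assumptions chosen for illustration; they are not defaults of this pipeline.

```groovy
// custom.config (hypothetical file name)
// Only process-level settings are placed here; pipeline parameters
// such as --input or --genome stay on the CLI or in a -params-file.
process {
    // Illustrative values, not the pipeline defaults
    cpus   = 4
    memory = '16 GB'
    time   = '8h'
}
```

Such a file would then be added to the run command with `-c custom.config`, next to the usual `--input`, `--outdir` and `-profile` options.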
## Custom configuration ### Resource requests -Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. - -To change the resource requests, please see the [max resources](https://nf-co.re/docs/usage/configuration#max-resources) and [tuning workflow resources](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources) section of the nf-core website. - -### Custom Containers +Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-cmgg/germline/blob/b637c64c2e1eeb1527d481a377f60950c9a114b8/conf/base.config#L17) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. -In some cases you may wish to change which container or conda environment a step of the pipeline uses for a particular tool. By default nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. 
However in some cases the pipeline specified version maybe out of date. - -To use a different container from the default container or conda environment specified in a pipeline, please see the [updating tool versions](https://nf-co.re/docs/usage/configuration#updating-tool-versions) section of the nf-core website. To change the resource requests, please see the [max resources](https://nf-co.re/docs/usage/configuration#max-resources) and [tuning workflow resources](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources) section of the nf-core website. ### Custom Containers @@ -268,12 +258,6 @@ A pipeline might not always support every possible argument or option of a parti To learn how to provide additional arguments to a particular tool of the pipeline, please see the [customising tool arguments](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) section of the nf-core website. -### Custom Tool Arguments - -A pipeline might not always support every possible argument or option of a particular tool used in pipeline. Fortunately, nf-core pipelines provide some freedom to users to insert additional parameters that the pipeline does not include by default. - -To learn how to provide additional arguments to a particular tool of the pipeline, please see the [customising tool arguments](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) section of the nf-core website. - ### nf-core/configs In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter. 
You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. @@ -282,14 +266,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -## Azure Resource Requests - -To be used with the `azurebatch` profile by specifying the `-profile azurebatch`. -We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required. - -Note that the choice of VM size depends on your quota and the overall workload during the analysis. -For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). - ## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. 
diff --git a/nextflow_schema.json b/nextflow_schema.json index 6619601f..bd870777 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -31,7 +31,7 @@ "watchdir": { "type": "string", "format": "directory-path", - "description": "A folder to watch for the creation of files that start with `watch:` in the samplesheet", + "description": "A folder to watch for the creation of files that start with `watch:` in the samplesheet.", "fa_icon": "fas fa-folder-open" }, "email": { @@ -46,7 +46,8 @@ "format": "file-path", "exists": true, "pattern": "^\\S+\\.ped$", - "description": "Path to a pedigree file for all samples in the run" + "description": "Path to a pedigree file for all samples in the run. All relational data will be fetched from this file.", + "help": "A PED file given in the samplesheet will be used above this PED file." } } }, @@ -59,7 +60,7 @@ "genome": { "type": "string", "default": "GRCh38", - "description": "Reference genome build", + "description": "Reference genome build. Used to fetch the right reference files.", "help_text": "Requires a Genome Reference Consortium reference ID (e.g. GRCh38)" }, "fasta": { @@ -83,27 +84,30 @@ "dict": { "type": "string", "pattern": "^\\S+\\.dict$", - "description": "Path to the sequence dictionary generated from the FASTA reference", + "description": "Path to the sequence dictionary generated from the FASTA reference. This is only used when `haplotypecaller` is one of the specified callers.", + "help": "The pipeline will autogenerate this file when missing.", "fa_icon": "far fa-file-code", "format": "file-path", "mimetype": "text/plain" }, "strtablefile": { "type": "string", - "description": "Path to the STR table file generated from the FASTA reference", + "description": "Path to the STR table file generated from the FASTA reference. 
This is only used when `--dragstr` has been given.", + "help": "The pipeline will autogenerate this file when missing.", "fa_icon": "fas fa-folder", "format": "path" }, "sdf": { "type": "string", - "description": "Path to the SDF folder generated from the reference FASTA file", + "description": "Path to the SDF folder generated from the reference FASTA file. This is only required when using `--validate`.", + "help": "The pipeline will autogenerate this file when missing.", "format": "path", "fa_icon": "fas fa-folder" }, "genomes_base": { "type": "string", "default": "/references/", - "description": "Directory base for CMGG reference store (used when --genomes_ignore false is specified)", + "description": "Directory base for CMGG reference store (used when `--genomes_ignore false` is specified)", "fa_icon": "fas fa-download", "format": "directory-path" }, @@ -117,7 +121,7 @@ "genomes_ignore": { "type": "boolean", "hidden": true, - "description": "Do not load the local references from the path specified with --genomes_base", + "description": "Do not load the local references from the path specified with `--genomes_base`", "fa_icon": "fas fa-ban" }, "igenomes_base": { @@ -153,8 +157,9 @@ "merge_distance": { "type": "integer", "default": 100000, - "description": "The merge distance for genotype BED files", - "help_text": "Increase this parameter if GenomicsDBImport is running slow. This defines the maximum distance between intervals that should be merged. The less intervals GenomicsDBImport actually gets, the faster it will run." + "description": "The merge distance for family BED files", + "help_text": "Increase this parameter if GenomicsDBImport is running slow. This defines the maximum distance between intervals that should be merged. 
The less intervals GenomicsDBImport actually gets, the faster it will run.", + "minimum": 1 }, "dragstr": { "type": "boolean", @@ -164,135 +169,138 @@ "validate": { "type": "boolean", "description": "Validate the found variants", - "help_text": "This only validates individual sample GVCFs that have truth VCF supplied to them via the samplesheet (in row `truth_vcf`, with an optional index in the `truth_tbi` row)" + "help": "A sample should have at least a `truth_vcf` supplied along with it in the samplesheet for it be validated." }, "filter": { "type": "boolean", - "description": "Filter the found variants" + "description": "Filter the found variants." }, "annotate": { "type": "boolean", - "description": "Annotate the found variants" + "description": "Annotate the found variants using Ensembl VEP." }, "add_ped": { "type": "boolean", - "description": "Add PED INFO header lines to the final VCFs" + "description": "Add PED INFO header lines to the final VCFs." }, "gemini": { "type": "boolean", - "description": "Create a Gemini databases from the final VCFs" + "description": "Create a Gemini databases from the final VCFs." }, "mosdepth_slow": { "type": "boolean", "description": "Don't run mosdepth in fast-mode", - "help_text": "This is advised if you need exact coverage BED files as output" + "help_text": "This is advised if you need exact coverage BED files as output." }, "project": { "type": "string", "description": "The name of the project.", - "help_text": "This will be used to specify the final output files folder in the output directory." + "help_text": "This will be used to specify the name of the final output files folder in the output directory." }, "skip_date_project": { "type": "boolean", - "description": "Don't add the current date to the output project folder" + "description": "Don't add the current date to the output project folder." 
}, "roi": { "type": "string", - "description": "Path to the default ROI (regions of interest) BED file to be used for WES analysis", + "description": "Path to the default ROI (regions of interest) BED file to be used for WES analysis.", "help_text": "This will be used for all samples that do not have a specific ROI file supplied to them through the samplesheet. Don't supply an ROI file to run the analysis as WGS.", "format": "file-path", "pattern": "^\\S+\\.bed(\\.gz)?$", - "mimetype": "text/plain" + "exists": true }, "dbsnp": { "type": "string", - "description": "Path to the dbSNP VCF file", + "description": "Path to the dbSNP VCF file. This will be used to set the variant IDs.", "fa_icon": "far fa-file-alt", "format": "file-path", "pattern": "^\\S+\\.vcf\\.gz$", - "mimetype": "text/plain" + "exists": true }, "dbsnp_tbi": { "type": "string", - "description": "Path to the index of the dbSNP VCF file", + "description": "Path to the index of the dbSNP VCF file.", "fa_icon": "far fa-file-alt", "format": "file-path", "pattern": "^\\S+\\.tbi$", - "mimetype": "text/plain" + "exists": true }, "somalier_sites": { "type": "string", "default": "https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz", "fa_icon": "far fa-file-alt", - "description": "Path to the VCF file with sites for Somalier to use", + "description": "Path to the VCF file with sites for Somalier to use.", "pattern": "^\\S+\\.vcf\\.gz", "format": "file-path", - "mimetype": "text/plain" + "exists": true }, "only_call": { "type": "boolean", - "description": "Only call the variants without doing any post-processing" + "description": "Only call the variants without doing any post-processing." }, "only_merge": { "type": "boolean", - "description": "Only run the pipeline until the creation of the genomicsdbs and output them" + "description": "Only run the pipeline until the creation of the genomicsdbs and output them." 
}, "output_genomicsdb": { "type": "boolean", - "description": "Output the genomicsDB together with the joint-genotyped VCF" + "description": "Output the genomicsDB together with the joint-genotyped VCF." }, "callers": { "type": "string", - "description": "A comma delimited string of the available callers. Current options are: 'haplotypecaller' and 'vardict'", + "description": "A comma-delimited string of the available callers. Current options are: `haplotypecaller` and `vardict`.", "default": "haplotypecaller" }, "vardict_min_af": { "type": "number", - "description": "The minimum allele frequency for VarDict when no `vardict_min_af` is supplied in the samplesheet", - "default": 0.1 + "description": "The minimum allele frequency for VarDict when no `vardict_min_af` is supplied in the samplesheet.", + "default": 0.1, + "minimum": 0 }, "normalize": { "type": "boolean", - "description": "Normalize the VCFs" + "description": "Normalize the variants in the final VCFs." }, "output_suffix": { "type": "string", - "description": "A custom suffix to add to the basename of the output files" + "description": "A custom suffix to add to the basename of the output files." }, "only_pass": { "type": "boolean", - "description": "Filter out all variants that don't have the PASS filter for vardict. This only works when --filter is also given" + "description": "Filter out all variants that don't have the PASS filter for vardict. This only works when `--filter` is also given." }, "keep_alt_contigs": { "type": "boolean", - "description": "Keep all aditional contigs for calling instead of filtering them out before" + "description": "Keep all additional contigs for calling instead of filtering them out beforehand." }, "updio": { "type": "boolean", - "description": "Run UPDio analysis on the resulting VCFs" + "description": "Run UPDio analysis on the final VCFs."
}, "updio_common_cnvs": { "type": "string", - "description": "A TSV file containing common CNVs to be used by UPDio", + "description": "A TSV file containing common CNVs to be used by UPDio.", "format": "file-path", "exists": true, "pattern": "^\\S+\\.tsv$" }, "automap": { "type": "boolean", - "description": "Run AutoMap analysis on the resulting VCFs" + "description": "Run AutoMap analysis on the final VCFs." }, "automap_repeats": { "type": "string", "description": "BED file with repeat regions in the genome.", "help_text": "This file will be automatically generated for hg38/GRCh38 and hg19/GRCh37 when this parameter has not been given.", - "pattern": "^\\S+\\.bed$" + "pattern": "^\\S+\\.bed$", + "exists": true }, "automap_panel": { "type": "string", "description": "TXT file with gene panel regions to be used by AutoMap.", "help_text": "By default the CMGG gene panel list will be used.", - "pattern": "^\\S+\\.txt$" + "pattern": "^\\S+\\.txt$", + "exists": true }, "automap_panel_name": { "type": "string", @@ -301,12 +309,13 @@ }, "hc_phasing": { "type": "boolean", - "description": "Perform phasing with HaplotypeCaller" + "description": "Perform phasing with HaplotypeCaller." }, "min_callable_coverage": { "type": "integer", - "description": "The lowest callable coverage to determine callable regions", - "default": 5 + "description": "The lowest callable coverage to determine callable regions.", + "default": 5, + "minimum": 0 } } }, @@ -366,8 +375,13 @@ "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { "help": { + "type": ["boolean", "string"], + "description": "Display help text. 
Give a parameter name to this option to see the detailed help of that parameter.", + "fa_icon": "fas fa-question-circle" + }, + "helpFull": { "type": "boolean", - "description": "Display help text.", + "description": "See the full help message of all parameters.", "fa_icon": "fas fa-question-circle" }, "version": { @@ -405,12 +419,6 @@ "fa_icon": "fas fa-file-upload", "hidden": true }, - "monochrome_logs": { - "type": "boolean", - "description": "Do not use coloured log outputs.", - "fa_icon": "fas fa-palette", - "hidden": true - }, "hook_url": { "type": "string", "description": "Incoming hook URL for messaging service", @@ -463,187 +471,188 @@ "vep_chunk_size": { "type": "integer", "default": 50000, - "description": "The amount of sites per split VCF as input to VEP" + "description": "The amount of sites per split VCF as input to VEP.", + "minimum": 1 }, "species": { "type": "string", "default": "homo_sapiens", - "description": "The species of the samples", + "description": "The species of the samples.", "fa_icon": "fas fa-user-circle", "pattern": "^[a-z_]*$", - "help_text": "Must be lower case and have underscores as spaces" + "help_text": "Must be lower case and have underscores as spaces." }, "vep_merged": { "type": "boolean", "default": true, - "description": "Specify if the VEP cache is a merged cache" + "description": "Specify if the VEP cache is a merged cache." }, "vep_cache": { "type": "string", - "description": "The path to the VEP cache", + "description": "The path to the VEP cache.", "format": "path" }, "vep_dbnsfp": { "type": "boolean", - "description": "Use the dbNSFP plugin with Ensembl VEP", + "description": "Use the dbNSFP plugin with Ensembl VEP.", "fa_icon": "fas fa-question-circle", "help_text": "The '--dbnsfp' and '--dbnsfp_tbi' parameters need to be specified when using this parameter." 
}, "vep_spliceai": { "type": "boolean", - "description": "Use the SpliceAI plugin with Ensembl VEP", + "description": "Use the SpliceAI plugin with Ensembl VEP.", "fa_icon": "fas fa-question-circle", "help_text": "The '--spliceai_indel', '--spliceai_indel_tbi', '--spliceai_snv' and '--spliceai_snv_tbi' parameters need to be specified when using this parameter." }, "vep_spliceregion": { "type": "boolean", - "description": "Use the SpliceRegion plugin with Ensembl VEP", + "description": "Use the SpliceRegion plugin with Ensembl VEP.", "fa_icon": "fas fa-question-circle" }, "vep_mastermind": { "type": "boolean", - "description": "Use the Mastermind plugin with Ensembl VEP", + "description": "Use the Mastermind plugin with Ensembl VEP.", "fa_icon": "fas fa-question-circle", "help_text": "The '--mastermind' and '--mastermind_tbi' parameters need to be specified when using this parameter." }, "vep_maxentscan": { "type": "boolean", - "description": "Use the MaxEntScan plugin with Ensembl VEP", + "description": "Use the MaxEntScan plugin with Ensembl VEP.", "fa_icon": "fas fa-question-circle", "help_text": "The '--maxentscan' parameter need to be specified when using this parameter." }, "vep_eog": { "type": "boolean", - "description": "Use the custom EOG annotation with Ensembl VEP", + "description": "Use the custom EOG annotation with Ensembl VEP.", "fa_icon": "fas fa-question-circle", "help_text": "The '--eog' and '--eog_tbi' parameters need to be specified when using this parameter." }, "vep_alphamissense": { "type": "boolean", - "description": "Use the AlphaMissense plugin with Ensembl VEP", + "description": "Use the AlphaMissense plugin with Ensembl VEP.", "fa_icon": "fas fa-question-circle", "help_text": "The '--alphamissense' and '--alphamissense_tbi' parameters need to be specified when using this parameter." 
}, "vep_version": { "type": "number", "default": 105.0, - "description": "The version of the VEP tool to be used", + "description": "The version of the VEP tool to be used.", "fa_icon": "fas fa-code-branch" }, "vep_cache_version": { "type": "integer", "default": 105, - "description": "The version of the VEP cache to be used", + "description": "The version of the VEP cache to be used.", "fa_icon": "fas fa-code-branch" }, "dbnsfp": { "type": "string", - "description": "Path to the dbSNFP file", + "description": "Path to the dbSNFP file.", "format": "file-path", "fa_icon": "far fa-file-alt", - "mimetype": "text/plain", - "pattern": "^\\S+\\.gz$" + "pattern": "^\\S+\\.gz$", + "exists": true }, "dbnsfp_tbi": { "type": "string", "format": "file-path", - "description": "Path to the index of the dbSNFP file", + "description": "Path to the index of the dbSNFP file.", "fa_icon": "far fa-file-alt", "pattern": "^\\S+\\.(csi|tbi)$", - "mimetype": "text/plain" + "exists": true }, "spliceai_indel": { "type": "string", "format": "file-path", - "description": "Path to the VCF containing indels for spliceAI", + "description": "Path to the VCF containing indels for spliceAI.", "fa_icon": "far fa-file-alt", "pattern": "^\\S+\\.vcf\\.gz$", - "mimetype": "text/plain" + "exists": true }, "spliceai_indel_tbi": { "type": "string", "format": "file-path", - "description": "Path to the index of the VCF containing indels for spliceAI", + "description": "Path to the index of the VCF containing indels for spliceAI.", "pattern": "^\\S+\\.(csi|tbi)$", - "mimetype": "text/plain" + "exists": true }, "spliceai_snv": { "type": "string", "format": "file-path", - "description": "Path to the VCF containing SNVs for spliceAI", + "description": "Path to the VCF containing SNVs for spliceAI.", "pattern": "^\\S+\\.vcf\\.gz$", - "mimetype": "text/plain" + "exists": true }, "spliceai_snv_tbi": { "type": "string", "format": "file-path", - "description": "Path to the index of the VCF containing SNVs for 
spliceAI", + "description": "Path to the index of the VCF containing SNVs for spliceAI.", "pattern": "^\\S+\\.(csi|tbi)$", - "mimetype": "text/plain" + "exists": true }, "mastermind": { "type": "string", "format": "file-path", - "description": "Path to the VCF for Mastermind", + "description": "Path to the VCF for Mastermind.", "pattern": "^\\S+\\.vcf\\.gz$", - "mimetype": "text/plain" + "exists": true }, "mastermind_tbi": { "type": "string", "format": "file-path", - "description": "Path to the index of the VCF for Mastermind", + "description": "Path to the index of the VCF for Mastermind.", "pattern": "^\\S+\\.(csi|tbi)$", - "mimetype": "text/plain" + "exists": true }, "alphamissense": { "type": "string", "format": "file-path", - "description": "Path to the TSV for AlphaMissense", + "description": "Path to the TSV for AlphaMissense.", "pattern": "^\\S+\\.tsv\\.gz$", - "mimetype": "text/plain" + "exists": true }, "alphamissense_tbi": { "type": "string", "format": "file-path", - "description": "Path to the index of the TSV for AlphaMissense", + "description": "Path to the index of the TSV for AlphaMissense.", "pattern": "^\\S+\\.(csi|tbi)$", - "mimetype": "text/plain" + "exists": true }, "eog": { "type": "string", "format": "file-path", - "description": "Path to the VCF containing EOG annotations", + "description": "Path to the VCF containing EOG annotations.", "pattern": "^\\S+\\.vcf\\.gz$", - "mimetype": "text/plain" + "exists": true }, "eog_tbi": { "type": "string", "format": "file-path", - "description": "Path to the index of the VCF containing EOG annotations", + "description": "Path to the index of the VCF containing EOG annotations.", "pattern": "^\\S+\\.(csi|tbi)$", - "mimetype": "text/plain" + "exists": true }, "vcfanno": { "type": "boolean", - "description": "Run annotations with vcfanno" + "description": "Run annotations with vcfanno." 
}, "vcfanno_config": { "type": "string", - "description": "The path to the VCFanno config TOML", + "description": "The path to the VCFanno config TOML.", "pattern": "^\\S+\\.toml$", "format": "file-path", - "mimetype": "text/plain" + "exists": true }, "vcfanno_lua": { "type": "string", - "description": "The path to a Lua script to be used in VCFanno", + "description": "The path to a Lua script to be used in VCFanno.", "pattern": "^\\S+\\.lua$", "format": "file-path", - "mimetype": "text/plain" + "exists": true }, "vcfanno_resources": { "type": "string", - "description": "A semicolon-seperated list of resource files for VCFanno, please also supply their indices using this parameter" + "description": "A semicolon-separated list of resource files for VCFanno. Please also supply their indices using this parameter." } }, "help_text": "Annotation will only run when `--annotate true` is specified."