feat: CLIN-3686 add support for local source in exomiser
Other change:
Add extra java -Xmx option (maximum heap size) in exomiser command
as in nf-core processes
LysianeBouchard committed Dec 11, 2024
1 parent a04b33a commit eacaa4c
Showing 11 changed files with 69 additions and 4 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#46](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/46) Allow to skip the exclude mnp step
- [#47](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/47) Improve pipeline output documentation
- [#48](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/48) Publish only main outputs by default
- [#49](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/49) Add support for local frequency source
- [#49](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/49) Pass java -Xmx option at the command line for exomiser

### `Known issues`
- The nf-core modules that we are using have a potential performance flaw. Typically, the regex used to describe the output files also match the input files (ex: "*.vcf"), which can cause unnecessary file transfers. This has already proven to cause issues on fusion. One fix could be to transfer the whole modules to local to perform the small change necessary to fix this.
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions assets/exomiser/test_exomiser_analysis.yml
@@ -12,6 +12,7 @@ inheritanceModes: {
MITOCHONDRIAL: 0.2
}
frequencySources: [
LOCAL,
UK10K
]
pathogenicitySources: [ REVEL, REMM, CADD]
3 changes: 3 additions & 0 deletions conf/test.config
@@ -65,6 +65,9 @@ params {
exomiser_remm_filename = "ReMM.v0.3.1.post1.hg38.tsv.gz"
exomiser_analysis_wes = "${projectDir}/assets/exomiser/test_exomiser_analysis.yml"
exomiser_analysis_wgs = "${projectDir}/assets/exomiser/test_exomiser_analysis.yml"
exomiser_local_frequency_path = "${projectDir}/assets/exomiser/local/local_frequency_test_hg38.tsv.gz"
exomiser_local_frequency_index_path = "${projectDir}/assets/exomiser/local/local_frequency_test_hg38.tsv.gz.tbi"


// To be able to run on our public test dataset, which is aligned with an older version of GATK4
allow_old_gatk_data = true
6 changes: 6 additions & 0 deletions docs/reference_data.md
@@ -80,6 +80,8 @@ cadd/1.7/

To prepare the exomiser data directory, follow the instructions in the [exomiser installation documentation](https://exomiser.readthedocs.io/en/latest/installation.html#linux-install)

Exomiser allows the use of a custom file for frequency data sources, typically to remove high-frequency variants caused by artifacts. To use this feature, specify the `LOCAL` frequency source in the exomiser analysis file. Then, provide the paths to your custom frequency file and its index using the parameters `exomiser_local_frequency_path` and `exomiser_local_frequency_index_path`. Note that the index file is required if using this feature.

Together with the `exomiser_data_dir` parameter, these parameters must be provided to exomiser and should match the reference data available
- `exomiser_genome`: The genome assembly version to be used by exomiser. Accepted values are `hg38` or `hg19`.
- `exomiser_data_version`: The exomiser data version. Example: `2402`.
@@ -88,6 +90,8 @@ Together with the `exomiser_data_dir` parameter, these parameters must be provid
- `exomiser_cadd_snv_filename`: The filename of the exomiser CADD snv data file (optional). Example: `whole_genome_SNVs.tsv.gz`
- `exomiser_remm_version`: The version of the REMM data to be used by exomiser (optional). Example:`0.3.1.post1`
- `exomiser_remm_filename`: The filename of the exomiser REMM data file (optional). Example: `ReMM.v0.3.1.post1.hg38.tsv.gz`
- `exomiser_local_frequency_path`: Path to a custom frequency source file (optional).
- `exomiser_local_frequency_index_path`: Path to the index file (.tbi) of the custom frequency source file (optional).

## Exomiser analysis files
In addition to the reference data, exomiser requires an analysis file (.yml/.json) that contains, among others
@@ -124,6 +128,8 @@ analysis file should contain only the `analysis` section.
| `exomiser_cadd_snv_filename`| _Optional_ | Filename of the exomiser CADD snv data file (e.g., `whole_genome_SNVs.tsv.gz`) |
| `exomiser_remm_version` | _Optional_ | Version of the REMM data to be used by exomiser (e.g., `0.3.1.post1`)|
| `exomiser_remm_filename` | _Optional_ | Filename of the exomiser REMM data file (e.g., `ReMM.v0.3.1.post1.hg38.tsv.gz`) |
| `exomiser_local_frequency_path`| _Optional_ | Path to a custom frequency source file |
| `exomiser_local_frequency_index_path`| _Optional_ | Path to the index file (.tbi) of the custom frequency source file. Required if specifying `exomiser_local_frequency_path`. |
| `exomiser_analysis_wes` | _Optional_ | Path to the exomiser analysis file for WES data, if different from the default |
| `exomiser_analysis_wgs` | _Optional_ | Path to the exomiser analysis file for WGS data, if different from the default |
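
As a minimal sketch of the `LOCAL` frequency source setup documented above, a custom Nextflow config could set both parameters together. The paths below are placeholders (assumptions), not files shipped with the pipeline. The analysis file referenced by `exomiser_analysis_wes` or `exomiser_analysis_wgs` must also list `LOCAL` under `frequencySources`.

```groovy
// Hypothetical user config, e.g. passed to nextflow with `-c local_frequency.config`.
// Both paths are placeholders; replace them with your own files.
params {
    // Custom frequency file; the schema expects a name ending in .tsv.gz
    exomiser_local_frequency_path       = "/path/to/local_frequency_hg38.tsv.gz"
    // Index of the file above; required whenever the path is set, name ending in .tbi
    exomiser_local_frequency_index_path = "/path/to/local_frequency_hg38.tsv.gz.tbi"
}
```

Leaving both parameters at their default (`null`) keeps the feature disabled, in which case no local frequency argument is added to the exomiser command.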

24 changes: 22 additions & 2 deletions modules/local/exomiser/main.nf
@@ -7,13 +7,18 @@ process EXOMISER {
path datadir
val exomiserGenome
val exomiserDataVersion


// If specified, the local frequency file path will be inferred from the given path and passed to the exomiser cli.
// It is expected that the file has a corresponding .tbi index file.
tuple path(localFrequencyPath), path(localFrequencyIndexPath)

// If remm/cadd version is specified, remm/cadd reference file(s) path(s) will be inferred from the given filename(s)
// and passed to the exomiser cli. Each remm/cadd reference file should have a corresponding .tbi index file.
// Note that, if nextflow adds support for optional paths, one might prefer to pass the full paths explicitly.
tuple val(remmVersion), val(remmFileName)
tuple val(caddVersion), val(caddSnvFileName),val(caddIndelFileName)


output:
tuple val(meta), path("results/*vcf.gz") , optional:true, emit: vcf
tuple val(meta), path("results/*vcf.gz.tbi") , optional:true, emit: tbi
@@ -30,6 +35,12 @@ process EXOMISER {
def args = task.ext.args ?: ''
def exactVcfFile = vcfFile.find { it.name.endsWith("vcf.gz") }

def localFrequencyFileArgs = ""
if (localFrequencyPath) {
log.info("Using LOCAL frequency file {}", localFrequencyPath)
localFrequencyFileArgs = "--exomiser.${exomiserGenome}.local-frequency-path=/`pwd`/${localFrequencyPath}"
}

def remmArgs = ""
if (remmVersion) {
log.info("Using REMM version {}", remmVersion)
@@ -44,16 +55,25 @@ process EXOMISER {
caddArgs += " --exomiser.${exomiserGenome}.cadd-snv-path=/`pwd`/${datadir}/cadd/${caddVersion}/${caddSnvFileName}"
caddArgs += " --exomiser.${exomiserGenome}.cadd-indel-path=/`pwd`/${datadir}/cadd/${caddVersion}/${caddIndelFileName}"
}

def avail_mem = 3072
if (!task.memory) {
log.info '[EXOMISER] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.'
} else {
avail_mem = (task.memory.mega*0.8).intValue()
}

"""
#!/bin/bash -eo pipefail
- java -cp \$( cat /app/jib-classpath-file ) \$( cat /app/jib-main-class-file ) \\
+ java -Xmx${avail_mem}M -cp \$( cat /app/jib-classpath-file ) \$( cat /app/jib-main-class-file ) \\
--vcf ${exactVcfFile} \\
--assembly "${params.exomiser_genome}" \\
--analysis "${analysisFile}" \\
--sample ${phenoFile} \\
--output-format=HTML,JSON,TSV_GENE,TSV_VARIANT,VCF \\
--exomiser.data-directory=/`pwd`/${datadir} \\
${localFrequencyFileArgs} \\
${remmArgs} \\
${caddArgs} \\
--exomiser.${exomiserGenome}.data-version="${exomiserDataVersion}" \\
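
The heap size passed through `-Xmx` in the hunk above is 80% of the memory assigned to the task, so it can be tuned from configuration. A minimal sketch, assuming the process can be selected by the name `EXOMISER` in your setup and that 16 GB suits your data (both the selector and the value are illustrative assumptions):

```groovy
// Illustrative config fragment; the 16.GB value is an assumption, not a recommendation.
process {
    withName: 'EXOMISER' {
        // task.memory.mega is 16384 here, so the command receives -Xmx13107M
        // (80% of 16384, truncated to an integer).
        memory = 16.GB
    }
}
```

If no memory is assigned to the process, the module falls back to `-Xmx3072M` and logs a message saying so.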
6 changes: 4 additions & 2 deletions modules/local/exomiser/tests/main.nf.test
@@ -23,11 +23,13 @@ nextflow_process {
input[1] = file("data-test/reference/exomiser")
input[2] = "hg38"
input[3] = "2402"
- input[4] = "1.7"
- input[5] = "1.3.1"
+ input[4] = [ file("assets/exomiser/local/local_frequency_test_hg38.tsv.gz"), file("assets/exomiser/local/local_frequency_test_hg38.tsv.gz.tbi")]
+ input[5] = ["1.7", "ReMM.v0.3.1.post1.hg38.tsv.gz"]
+ input[6] = ["1.3.1", "whole_genome_SNVs.tsv.gz", "gnomad.genomes.r4.0.indel.tsv.gz"]
"""
}
}

then{

def expected_meta = [familyId: "family1"]
2 changes: 2 additions & 0 deletions nextflow.config
@@ -36,6 +36,8 @@ params {
exomiser_remm_filename = null
exomiser_analysis_wes = "${projectDir}/assets/exomiser/default_exomiser_WES_analysis.yml"
exomiser_analysis_wgs = "${projectDir}/assets/exomiser/default_exomiser_WGS_analysis.yml"
exomiser_local_frequency_path = null
exomiser_local_frequency_index_path = null

//Process-specific parameters
exclude_mnps = true
20 changes: 20 additions & 0 deletions nextflow_schema.json
@@ -389,6 +389,18 @@
"description": "The filename of the exomiser REMM data file (e.g., ReMM.v0.3.1.post1.hg38.tsv.gz)",
"format": "file-path",
"pattern": "^\\S+\\.tsv.gz$"
},
"exomiser_local_frequency_path": {
"type": "string",
"description": "Path to the local frequency data file",
"format": "file-path",
"pattern": "^\\S+\\.tsv.gz$"
},
"exomiser_local_frequency_index_path": {
"type": "string",
"description": "Path to the index of the local frequency data file",
"format": "file-path",
"pattern": "^\\S+\\.tbi$"
}
},
"allOf": [
@@ -437,6 +449,14 @@
"exomiser_data_version"
]
}
},
{
"if": {
"required": ["exomiser_local_frequency_path"]
},
"then": {
"required": ["exomiser_local_frequency_index_path"]
}
}
]
}
9 changes: 9 additions & 0 deletions workflows/postprocessing.nf
@@ -60,6 +60,8 @@ def exomiser(inputChannel,
exomiser_data_dir,
analysis_wes_path,
analysis_wgs_path,
local_frequency_file,
local_frequency_index_file,
remm_version,
remm_filename,
cadd_version,
@@ -80,10 +82,12 @@ def exomiser(inputChannel,
if (cadd_version) {
cadd_input = [cadd_version, cadd_snv_filename, cadd_indel_filename]
}

return EXOMISER(ch_input_for_exomiser,
file(exomiser_data_dir),
exomiser_genome,
exomiser_data_version,
[local_frequency_file, local_frequency_index_file],
remm_input,
cadd_input
)
@@ -154,6 +158,9 @@ workflow POSTPROCESSING {
def pathReferenceDict = file(params.referenceGenome + "/" + params.referenceGenomeFasta.substring(0,params.referenceGenomeFasta.indexOf(".")) + ".dict")
def dbsnpFile = params.dbsnpFile? file(params.dbsnpFile) : []
def dbsnpFileIndex = params.dbsnpFileIndex? file(params.dbsnpFileIndex) : []
def exomiserLocalFrequencyFile = params.exomiser_local_frequency_path? file(params.exomiser_local_frequency_path) : []
def exomiserLocalFrequencyIndexFile = params.exomiser_local_frequency_index_path? file(params.exomiser_local_frequency_index_path) : []

file(params.outdir).mkdirs()

take:
@@ -229,6 +236,8 @@ workflow POSTPROCESSING {
params.exomiser_data_dir,
params.exomiser_analysis_wes,
params.exomiser_analysis_wgs,
exomiserLocalFrequencyFile,
exomiserLocalFrequencyIndexFile,
params.exomiser_remm_version,
params.exomiser_remm_filename,
params.exomiser_cadd_version,
