Merge CNV and SV calls (#429)
* Clean up pipeline tests

* Refactor CNV-calling

* update snaps

* wip

* fix nf-core modules

* rename channels

* fix modules
fellen31 authored Oct 29, 2024
1 parent 79b836b commit ffcfb6e
Showing 46 changed files with 1,669 additions and 862 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -24,6 +24,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#417](https://github.com/genomic-medicine-sweden/nallo/pull/417) - Added `FOUND_IN=deepvariant` tags to SNV calling output
- [#418](https://github.com/genomic-medicine-sweden/nallo/pull/418) - Added a check for unique input filenames for each sample
- [#419](https://github.com/genomic-medicine-sweden/nallo/pull/419) - Added support for SV filtering using input BED file ([#348](https://github.com/genomic-medicine-sweden/nallo/issues/348))
- [#429](https://github.com/genomic-medicine-sweden/nallo/pull/429) - Added nf-test to CNV calling
- [#429](https://github.com/genomic-medicine-sweden/nallo/pull/429) - Added SVDB to merge CNV calling results
- [#430](https://github.com/genomic-medicine-sweden/nallo/pull/430) - Added a GitHub action to build and publish docs to GitHub Pages
- [#431](https://github.com/genomic-medicine-sweden/nallo/pull/431) - Added files needed to automatically build and publish docs to GitHub Pages
- [#435](https://github.com/genomic-medicine-sweden/nallo/pull/435) - Added nf-test to rank variants
@@ -58,6 +60,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#422](https://github.com/genomic-medicine-sweden/nallo/pull/422) - Updated nf-core/tools template to v3.0.1
- [#423](https://github.com/genomic-medicine-sweden/nallo/pull/423) - Updated metro map
- [#428](https://github.com/genomic-medicine-sweden/nallo/pull/428) - Changed from using bcftools to SVDB for SV merging
- [#429](https://github.com/genomic-medicine-sweden/nallo/pull/429) - Updated HiFiCNV to 1.0.0
- [#429](https://github.com/genomic-medicine-sweden/nallo/pull/429) - Refactored the CNV calling subworkflow
- [#429](https://github.com/genomic-medicine-sweden/nallo/pull/429) - Changed SV and CNV calling outputs, merging is now done per family
- [#431](https://github.com/genomic-medicine-sweden/nallo/pull/431) - Changed `CITATIONS.md` to `docs/CITATIONS.md`,
- [#433](https://github.com/genomic-medicine-sweden/nallo/pull/433) - Updated docs and README.
- [#434](https://github.com/genomic-medicine-sweden/nallo/pull/434) - Updated the SVDB merge module to fix unstable CALL_SVS tests
@@ -116,6 +121,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| WhatsHap | 2.2 | 2.3 |
| SVDB | | 2.8.1 |
| hifiasm | 0.19.8 | 0.20.0 |
| HiFiCNV | 0.1.7 | 1.0.0 |

> [!NOTE]
> Version has been updated if both old and new version information is present.
6 changes: 3 additions & 3 deletions conf/modules/annotate_svs.config
@@ -25,6 +25,7 @@ process {
}

withName: '.*ANNOTATE_SVS:ENSEMBLVEP_SV' {
ext.prefix = { params.skip_cnv_calling ? "${meta.id}_svs_merged_annotated" : "${meta.id}_svs_cnvs_merged_annotated" }
ext.args = { [
"${params.extra_vep_options}",
"--dir_plugins .",
@@ -38,17 +39,16 @@ process {
'--symbol --tsl --uniprot --vcf',
'--no_stats'
].join(' ') }
ext.prefix = { "${meta.id}_svs_annotated" }
publishDir = [
path: { "${params.outdir}/svs/multi_sample/${meta.id}" },
path: { "${params.outdir}/svs/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: '.*ANNOTATE_SVS:TABIX_ENSEMBLVEP_SV' {
publishDir = [
path: { "${params.outdir}/svs/multi_sample/${meta.id}" },
path: { "${params.outdir}/svs/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
29 changes: 23 additions & 6 deletions conf/modules/cnv.config → conf/modules/call_cnvs.config
@@ -14,29 +14,46 @@ process {

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CNV
Call CNVs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

withName: '.*:CNV:HIFICNV' {
withName: '.*:CALL_CNVS:.*' {
publishDir = [
path: { "${params.outdir}/cnv_calling/hificnv/${meta.id}" },
enabled: false,
]
}

withName: '.*:CALL_CNVS:HIFICNV' {
ext.prefix = { "${meta.id}_hificnv" }
publishDir = [
path: { "${params.outdir}/visualization_tracks/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || filename.endsWith('.vcf.gz') ? null : filename }
saveAs: { filename -> filename.endsWith('.bw') || filename.endsWith('.bedgraph') ? filename : null }
]
}

withName: '.*:CNV:ADD_FOUND_IN_TAG' {
withName: '.*:CALL_CNVS:ADD_FOUND_IN_TAG' {
ext.prefix = { "${meta.id}_cnvs" }
ext.args = '--no-version'
ext.args2 = [
'--output-type z',
'--write-index=tbi',
'--no-version'
].join(' ')
publishDir = [
path: { "${params.outdir}/cnv_calling/hificnv/${meta.id}" },
path: { "${params.outdir}/svs/single_sample/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: '.*:CALL_CNVS:SVDB_MERGE' {
ext.prefix = { "${meta.id}_cnvs_merged" }
ext.args = [
'--bnd_distance 10000',
'--overlap .5'
].join(' ')
}

}
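
A note on the `HIFICNV` `saveAs` change above: the old closure was a deny-list (publish everything except `versions.yml` and `*.vcf.gz`), while the new one is an allow-list that keeps only `*.bw` and `*.bedgraph` tracks. The Python sketch below re-expresses both Groovy closures side by side for comparison. The function names and the example filenames are illustrative assumptions and are not taken from the pipeline.

```python
from typing import Optional


def save_as_old(filename: str) -> Optional[str]:
    """Old closure: deny-list. Drop versions.yml and *.vcf.gz, publish the rest."""
    if filename == "versions.yml" or filename.endswith(".vcf.gz"):
        return None
    return filename


def save_as_new(filename: str) -> Optional[str]:
    """New closure: allow-list. Publish only *.bw and *.bedgraph tracks."""
    if filename.endswith((".bw", ".bedgraph")):
        return filename
    return None


# Hypothetical HiFiCNV outputs for one sample (names made up for illustration).
outputs = [
    "sample1_hificnv.depth.bw",
    "sample1_hificnv.maf.bw",
    "sample1_hificnv.copynum.bedgraph",
    "sample1_hificnv.vcf.gz",
    "sample1_hificnv.log",
    "versions.yml",
]

published_old = [f for f in outputs if save_as_old(f) is not None]
published_new = [f for f in outputs if save_as_new(f) is not None]
```

With the allow-list, any extra file the process emits stays out of `visualization_tracks/`; the CNV VCF is published from `ADD_FOUND_IN_TAG` under `svs/single_sample/` instead.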
17 changes: 11 additions & 6 deletions conf/modules/call_svs.config
@@ -48,31 +48,36 @@ process {
}

withName: '.*:CALL_SVS:SVDB_MERGE' {
ext.prefix = { "${meta.id}_svs" }
ext.prefix = { "${meta.id}_svs_merged" }
ext.args = [
'--bnd_distance 1000',
'--overlap .5'
].join(' ')
publishDir = [
path: { "${params.outdir}/svs/multi_sample/${meta.id}" },
path: { "${params.outdir}/svs/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || !params.skip_sv_annotation ? null : filename }
saveAs: { filename -> filename.equals('versions.yml') || !params.skip_sv_annotation || !params.skip_cnv_calling ? null : filename }
]
}

withName: '.*:CALL_SVS:TABIX_SVDB_MERGE' {
publishDir = [
path: { "${params.outdir}/svs/multi_sample/${meta.id}" },
path: { "${params.outdir}/svs/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || !params.skip_sv_annotation ? null : filename }
saveAs: { filename -> filename.equals('versions.yml') || !params.skip_sv_annotation || !params.skip_cnv_calling ? null : filename }
]
}

withName: '.*:CALL_SVS:BCFTOOLS_REHEADER' {
ext.prefix = { "${meta.id}_${params.sv_caller}" }
ext.prefix = { "${meta.id}_svs" }
ext.args2 = [
'--output-type z',
'--write-index=tbi'
].join(' ')
publishDir = [
path: { "${params.outdir}/svs/single_sample/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}
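
The new `saveAs` condition on `CALL_SVS:SVDB_MERGE` and `TABIX_SVDB_MERGE` is easy to misread because of the negations. Since `||` binds tighter than `?:` in Groovy, the closure returns `null` whenever any of the three conditions holds, so the SV-only merged file is published only when both annotation and CNV calling are skipped. A small Python sketch of that truth table follows; the function name and the example filename are assumptions made for illustration.

```python
from itertools import product
from typing import Optional


def save_as_merged_svs(
    filename: str, skip_sv_annotation: bool, skip_cnv_calling: bool
) -> Optional[str]:
    """Python rendering of:
    filename.equals('versions.yml') || !skip_sv_annotation || !skip_cnv_calling ? null : filename
    """
    if filename == "versions.yml" or not skip_sv_annotation or not skip_cnv_calling:
        return None
    return filename


# Evaluate every flag combination for a hypothetical merged VCF name.
truth_table = {
    (skip_annotation, skip_cnv): save_as_merged_svs(
        "fam1_svs_merged.vcf.gz", skip_annotation, skip_cnv
    )
    for skip_annotation, skip_cnv in product([False, True], repeat=2)
}

for flags, published in sorted(truth_table.items()):
    print(flags, "->", published)
```

In the other three combinations a later step (the SV and CNV merge, or annotation) produces the file that is published to `svs/family/`.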
17 changes: 17 additions & 0 deletions conf/modules/general.config
@@ -174,6 +174,23 @@ process {
]
}

withName: '.*:NALLO:SVDB_MERGE_SVS_CNVS' {
ext.prefix = { "${meta.id}_svs_cnvs_merged" }
publishDir = [
path: { "${params.outdir}/svs/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || !params.skip_sv_annotation ? null : filename }
]
}

withName: '.*:NALLO:TABIX_SVDB_MERGE_SVS_CNVS' {
publishDir = [
path: { "${params.outdir}/svs/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || !params.skip_sv_annotation ? null : filename }
]
}

withName: '.*:NALLO:ECHTVAR_ENCODE' {
publishDir = [
path: { "${params.outdir}/databases/echtvar/encode/${meta.id}" },
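
Taken together, the `ext.prefix` and `saveAs` settings in `annotate_svs.config`, `call_svs.config`, and `general.config` imply that exactly one family-level VCF is published per run configuration. The sketch below derives that filename in Python. It follows the prefixes in the config files (`_svs_cnvs_merged`), and it assumes, without the workflow code being shown here, that `SVDB_MERGE_SVS_CNVS` only runs when CNV calling is enabled, that `ENSEMBLVEP_SV` only runs when annotation is enabled, and that outputs carry a `.vcf.gz` extension. The function name is hypothetical.

```python
def published_family_vcf(
    family_id: str, skip_cnv_calling: bool, skip_sv_annotation: bool
) -> str:
    """Return the family-level VCF name that ends up in svs/family/{family_id}/."""
    # Without CNV calling only the SV merge exists; otherwise SVs and CNVs are combined.
    if skip_cnv_calling:
        stem = f"{family_id}_svs_merged"
    else:
        stem = f"{family_id}_svs_cnvs_merged"
    # Unannotated files are only published when annotation is skipped.
    suffix = "" if skip_sv_annotation else "_annotated"
    return f"{stem}{suffix}.vcf.gz"


# All four configurations for a hypothetical family ID.
names = {
    (skip_cnv, skip_annotation): published_family_vcf("fam1", skip_cnv, skip_annotation)
    for skip_cnv in (False, True)
    for skip_annotation in (False, True)
}
```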
60 changes: 35 additions & 25 deletions docs/output.md
@@ -130,18 +130,6 @@ If the pipeline is run with phasing, the aligned reads will be happlotagged usin

## Variants

### CNVs

[HiFiCNV](https://github.com/PacificBiosciences/HiFiCNV) is used to call CNVs, producing copy number, depth, and MAF tracks for IGV.

| Path | Description |
| ------------------------------------------------- | ----------------------------------------- |
| `cnv_calling/hificnv/{sample}/*.copynum.bedgraph` | Copy number in bedgraph format |
| `cnv_calling/hificnv/{sample}/*.depth.bw` | Depth track in BigWig format |
| `cnv_calling/hificnv/{sample}/*.maf.bw` | Minor allele frequencies in BigWig format |
| `cnv_calling/hificnv/{sample}/*.vcf.gz` | VCF file containing CNV variants |
| `cnv_calling/hificnv/{sample}/*.vcf.gz.tbi` | Index of the corresponding VCF file |

### Paralogous genes

[Paraphase](https://github.com/PacificBiosciences/paraphase) is used to call paralogous genes.
@@ -213,26 +201,48 @@ If the pipeline is run with phasing, the aligned reads will be happlotagged usin
| `snvs/multi_sample/{project}/{project}_snv_annotated_ranked.vcf.gz` | VCF file with annotated and ranked variants for all samples |
| `snvs/multi_sample/{project}/{project}_snv_annotated_ranked.vcf.gz.tbi` | Index of the ranked VCF file |

### SVs
### SVs (and CNVs)

[Severus](https://github.com/KolmogorovLab/Severus) or [Sniffles](https://github.com/fritzsedlazeck/Sniffles) is used to call structural variants, and [SVDB](https://github.com/J35P312/SVDB) is used to merge variants within and between samples.
[Severus](https://github.com/KolmogorovLab/Severus) or [Sniffles](https://github.com/fritzsedlazeck/Sniffles) is used to call structural variants.
[HiFiCNV](https://github.com/PacificBiosciences/HiFiCNV) is used to call CNVs. It also produces copy number, depth, and MAF [visualization tracks](#visualization-tracks).
[SVDB](https://github.com/J35P312/SVDB) is used to combine and merge SVs and CNVs within and between samples.

!!!note

Variants are only output without annotation if that subworkflow is turned off.

| Path | Description |
| ----------------------------------------------------- | ------------------------------------------------------------ |
| `svs/multi_sample/{project}/{project}_svs.vcf.gz` | VCF file with merged structural variants for all samples |
| `svs/multi_sample/{project}/{project}_svs.vcf.gz.tbi` | Index of the merged VCF file |
| `svs/single_sample/{sample}/*.vcf.gz` | VCF file with merged structural variants for a single sample |
| `svs/single_sample/{sample}/*.vcf.gz.tbi` | Index of the VCF file |
!!!note

[SVDB](https://github.com/J35P312/SVDB) and [VEP](https://www.ensembl.org/vep) are used to annotate structural variants.
Due to the complexity of SV merging strategies, SVs and CNVs are reported per family rather than per project.
SV and CNV calls are output unmerged per sample, while the family files are first merged between samples for SVs and CNVs separately,
then the merged SV and CNV files are merged again, with priority given to coordinates from the SV calls.

| Path | Description |
| --------------------------------------------------------------- | ------------------------------------------------------------------ |
| `svs/multi_sample/{project}/{project}_svs_annotated.vcf.gz` | VCF file with annotated merged structural variants for all samples |
| `svs/multi_sample/{project}/{project}_svs_annotated.vcf.gz.tbi` | Index of the annotated VCF file |
| `svs/single_sample/{sample}/*.vcf_annotated.gz` | VCF file with annotated structural variants for a single sample |
| `svs/single_sample/{sample}/*.vcf_annotated.gz.tbi` | Index of the annotated VCF file |
| `svs/family/{family_id}/{family_id}_cnvs_svs_merged.vcf.gz` | VCF file with merged CNVs and SVs per family |
| `svs/family/{family_id}/{family_id}_cnvs_svs_merged.vcf.gz.tbi` | Index of the merged VCF file |
| `svs/family/{family_id}/{family_id}_svs_merged.vcf.gz` | VCF file with merged SVs per family (output if CNV-calling is off) |
| `svs/family/{family_id}/{family_id}_svs_merged.vcf.gz.tbi` | Index of the merged VCF file |
| `svs/single_sample/{sample}/{sample}_cnvs.vcf.gz` | VCF file with CNVs per sample |
| `svs/single_sample/{sample}/{sample}_cnvs.vcf.gz.tbi` | VCF file with CNVs per sample |
| `svs/single_sample/{sample}/{sample}_svs.vcf.gz` | VCF file with SVs per sample |
| `svs/single_sample/{sample}/{sample}_svs.vcf.gz.tbi` | VCF file with SVs per sample |

[SVDB](https://github.com/J35P312/SVDB) and [VEP](https://www.ensembl.org/vep) are used to annotate structural variants.

| Path | Description |
| ------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| `svs/family/{family_id}/{family_id}_cnvs_svs_merged_annotated.vcf.gz` | VCF file with merged and annotated CNVs and SVs per family |
| `svs/family/{family_id}/{family_id}_cnvs_svs_merged_annotated.vcf.gz.tbi` | Index of the merged VCF file |
| `svs/family/{family_id}/{family_id}_svs_merged_annotated.vcf.gz` | VCF file with merged and annotated SVs per family (output if CNV-calling is off) |
| `svs/family/{family_id}/{family_id}_svs_merged_annotated.vcf.gz.tbi` | Index of the merged VCF file |

## Visualization Tracks

[HiFiCNV](https://github.com/PacificBiosciences/HiFiCNV) is used to call CNVs, but it also produces copy number, depth, and MAF tracks that can be visualized in for example IGV.

| Path | Description |
| --------------------------------------------------- | ----------------------------------------- |
| `visualization_tracks/{sample}/*.copynum.bedgraph` | Copy number in bedgraph format |
| `visualization_tracks/{sample}/*.depth.bw` | Depth track in BigWig format |
| `visualization_tracks/{sample}/*.maf.bw` | Minor allele frequencies in BigWig format |
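
The note in `docs/output.md` says the merged SV and CNV files are merged again "with priority given to coordinates from the SV calls". As a rough illustration of what coordinate priority means, here is a toy Python sketch. It is not SVDB's algorithm: the interval-overlap measure (intersection over union), the same-type requirement, and the `0.5` threshold (borrowed from `--overlap .5` in the per-type merge configs) are all assumptions, and the `Call` type and example records are made up.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str
    source: str  # "sv" or "cnv"


def overlap_fraction(a: Call, b: Call) -> float:
    """Intersection over union of two intervals; 0.0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    intersection = min(a.end, b.end) - max(a.start, b.start)
    union = max(a.end, b.end) - min(a.start, b.start)
    if intersection <= 0 or union <= 0:
        return 0.0
    return intersection / union


def merge_with_priority(
    svs: List[Call], cnvs: List[Call], min_overlap: float = 0.5
) -> List[Call]:
    """Keep every SV call; add a CNV call only if no SV call overlaps it enough."""
    merged = list(svs)
    for cnv in cnvs:
        if not any(overlap_fraction(sv, cnv) >= min_overlap for sv in svs):
            merged.append(cnv)
    return sorted(merged, key=lambda c: (c.chrom, c.start, c.end))


svs = [Call("chr1", 1000, 5000, "DEL", "sv")]
cnvs = [
    Call("chr1", 1200, 5200, "DEL", "cnv"),  # overlaps the SV, so SV coordinates win
    Call("chr2", 100, 900, "DUP", "cnv"),    # no matching SV, so it is kept as-is
]
result = merge_with_priority(svs, cnvs)
```

Here the `chr1` deletion is reported once with the SV caller's breakpoints, and the `chr2` duplication passes through unchanged.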