Merge pull request #165 from Joon-Klaps/add-prodigal
Adding prokka for gene detection & annotation
Joon-Klaps authored Feb 25, 2025
2 parents db19c30 + a6f6fda commit 0fb6e45
Showing 27 changed files with 1,120 additions and 18 deletions.
1 change: 0 additions & 1 deletion .nf-core.yml
@@ -23,7 +23,6 @@ lint:
- manifest.name
- manifest.homePage
- config_defaults:
- - params.multiqc_comment_headers
- params.custom_table_headers
multiqc_config: false
files_exist:
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -26,6 +26,7 @@ Initial release of Joon-Klaps/viralgenie, created with the [nf-core](https://nf-
- Constrain -> Constraint & further python script debugging ([#161](https://github.com/Joon-Klaps/viralgenie/pull/161))
- include empty samples in multiqc sample overview ([#162](https://github.com/Joon-Klaps/viralgenie/pull/162))
- Include samtools stats pre dedup & post dedup in overview tables ([#163](https://github.com/Joon-Klaps/viralgenie/pull/163))
- adding prokka for gene detection & annotation ([#165](https://github.com/Joon-Klaps/viralgenie/pull/165))

### `Fixed`

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -103,6 +103,10 @@
- [picard-tools](http://broadinstitute.github.io/picard)

- [prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/)

> Seemann, Torsten. “Prokka: rapid prokaryotic genome annotation.” Bioinformatics (Oxford, England) vol. 30,14 (2014): 2068-9. doi:10.1093/bioinformatics/btu153
- [QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)

> Gurevich, Alexey et al. “QUAST: quality assessment tool for genome assemblies.” Bioinformatics (Oxford, England) vol. 29,8 (2013): 1072-5. doi:10.1093/bioinformatics/btt086
18 changes: 18 additions & 0 deletions conf/modules.config
@@ -254,6 +254,15 @@ process {
]
}

withName: XZ_DECOMPRESS {
publishDir = [
path: { "${params.outdir}/databases/" },
mode: params.publish_dir_mode,
enabled: params.save_databases,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: UNTAR_DB {
publishDir = [
path: { "${params.outdir}/databases/" },
@@ -1632,6 +1641,15 @@ process {
]
}

withName: PROKKA {
ext.args = '--centre X --compliant --force --kingdom Viruses'
publishDir = [
path: { "${params.outdir}/consensus/quality_control/prokka/${meta.sample}/${meta.step}" },
mode: params.publish_dir_mode,
saveAs: { filename -> params.prefix || params.global_prefix ? "${params.global_prefix}-$filename" : filename }
]
}
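
The `PROKKA` block above decides where each output file is published and whether it is renamed. The following Python sketch mirrors that logic for illustration only; the pipeline itself evaluates the Groovy closures shown in the config. The function names and the sample values (`results`, `sampleA`, `it2`, `run1`) are hypothetical.

```python
from typing import Optional


def publish_path(outdir: str, sample: str, step: str) -> str:
    """Mirror of the `path` closure: one folder per sample and per step."""
    return f"{outdir}/consensus/quality_control/prokka/{sample}/{step}"


def save_as(filename: str, prefix: Optional[str], global_prefix: Optional[str]) -> str:
    """Mirror of the `saveAs` closure.

    Groovy parses `a || b ? x : y` as `(a || b) ? x : y`, so the file is
    renamed whenever either parameter is truthy.
    """
    if prefix or global_prefix:
        return f"{global_prefix}-{filename}"
    return filename


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the result.
    print(publish_path("results", "sampleA", "it2"))
    print(save_as("sampleA.gff", None, None))
    print(save_as("sampleA.gff", "run1", "run1"))
```

Because the rename branch always interpolates `global_prefix`, the sketch assumes that `global_prefix` is populated whenever `prefix` is set.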

withName: BLASTN_QC {
ext.args = '-max_target_seqs 5 -outfmt "6 qseqid sseqid stitle pident qlen slen length mismatch gapopen qstart qend sstart send evalue bitscore"' // don't change outfmt
publishDir = [
3 changes: 2 additions & 1 deletion conf/tests/test_full.config
@@ -40,8 +40,9 @@ params {
kaiju_db = "https://kaiju-idx.s3.eu-central-1.amazonaws.com/2023/kaiju_db_viruses_2023-05-26.tgz"
reference_pool = "https://github.com/Joon-Klaps/nextclade_data/raw/old_datasets/data/nextstrain/sars-cov-2/MN908947/sequences.fasta"

- mapping_constraints = "${projectDir}/assets/samplesheets/mapping_constraints.csv"
+ mapping_constraints = "${projectDir}/assets/samplesheets/mapping_constraints.csv"
checkv_db = "https://github.com/nf-core/test-datasets/raw/phageannotator/modules/nfcore/checkv/endtoend/checkv_minimal_db.tar"
prokka_db = "https://rvdb-prot.pasteur.fr/files/U-RVDBv29.0-prot_clustered.fasta.xz"

save_intermediate_polishing = true
min_mapped_reads = 100
16 changes: 15 additions & 1 deletion docs/output.md
@@ -700,6 +700,16 @@ Consensus quality control is done with multiple tools, the results are stored in
- `<sample-id>/<sample-id>_<cl# | constraint-id>/contamination.tsv`: A detailed overview of how contamination was estimated.
- `<sample-id>/<sample-id>_<cl# | constraint-id>/complete_genomes.tsv`: A detailed overview of putative genomes identified.


### Prokka

[`Prokka`](https://github.com/tseemann/prokka) is a whole-genome annotation pipeline that identifies features of interest in a set of genomic DNA sequences and labels them with useful information. It annotates bacterial, archaeal and viral genomes.

???- abstract "Output files"

- `consensus/quality_control/prokka/`
- `<sample-id>/<iteration>/*`: directories containing the Prokka output files.

### BLASTn

[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. The output from the BLAST run is stored in the directory `consensus/quality_control/blast/`. Final consensus genomes are searched against the `--reference_pool`.
@@ -790,11 +800,15 @@ Furthermore, viralgenie runs MultiQC 2 times, as it uses the output from multiqc

???- abstract "Output files"
- `multiqc/`
- - `overview-tables/`: a directory with a set of commented TSV (comments taken from `--multiqc_comment_headers`) that summarize aspects of the pipeline runs.
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
- `multiqc_dataprep/`: preparation files for the generated custom tables.
- `multiqc_plots/`: directory containing static images from the report in various formats.
- `overview-tables/`: a directory with a set of summary TSV files.
- `contigs_overview_with_iterations.tsv`: A tabular file containing the contig information of the final __contig consensus__ genome and their intermediate iterations.
- `contigs_overview.tsv`: A tabular file containing the contig information of the final __contig consensus__ genome.
- `mapping_overview.tsv`: A tabular file containing the mapping information of the final __mapped consensus__ genome, from the argument `--mapping_constraints`.
- `samples_overview.tsv`: A tabular file containing the sample information combining information from both `contigs_overview.tsv` & `mapping_overview.tsv`.
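
The overview tables listed above are plain tab-separated files, so they can be loaded with the Python standard library. The sketch below is illustrative only: the column names `sample` and `mapped_reads` and the two demo rows are made up, and the real tables may use different headers.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_overview(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-separated overview table into one dict per row."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Made-up stand-in for a file such as `samples_overview.tsv`.
DEMO = "sample\tmapped_reads\nsampleA\t1200\nsampleB\t0\n"

if __name__ == "__main__":
    for row in read_overview(io.StringIO(DEMO)):
        print(row["sample"], row["mapped_reads"])
```

To read a real table, pass an open file handle instead, for example `open("multiqc/overview-tables/samples_overview.tsv", newline="")`.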

## Pipeline information

14 changes: 13 additions & 1 deletion docs/workflow/consensus_qc.md
@@ -30,7 +30,19 @@ Within the MultiQC report, Viralgenie provides a number of custom tables based o

> CheckV can be skipped with `--skip_checkv`.
- ## BLASTn

## Prokka
[Prokka](https://github.com/tseemann/prokka) is a whole-genome annotation pipeline that identifies features of interest in a set of genomic DNA sequences and labels them with useful information. It annotates bacterial, archaeal and viral genomes.

!!! Tip "Suboptimal annotation"
Prokka was initially designed for bacterial and archaeal genomes and may not be optimal for viral genomes. [VIGOR4](https://github.com/JCVenterInstitute/VIGOR4) is a good alternative, but it is species-specific.

!!! Tip "Custom protein database"
Prokka can annotate genomes against a custom protein database; see [prot-RVDB](https://rvdb-prot.pasteur.fr/) for viral protein databases. Supply the database with `--prokka_db`.

> Prokka can be skipped with `--skip_prokka`.
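
The full test configuration in this commit points `prokka_db` at an xz-compressed protein FASTA. Before supplying a similar file with `--prokka_db`, it can be sanity-checked locally. The sketch below streams an `.xz` file with Python's standard `lzma` module and counts FASTA records; the file name `demo_proteins.fasta.xz` and its two sequences are invented for the demonstration, and this is not how the pipeline itself handles the database.

```python
import lzma
import os
import tempfile


def count_fasta_records(xz_path: str) -> int:
    """Count FASTA header lines ('>') in an xz-compressed file, streaming."""
    count = 0
    with lzma.open(xz_path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def make_demo_db(directory: str) -> str:
    """Write a tiny, made-up two-protein database for the demonstration."""
    path = os.path.join(directory, "demo_proteins.fasta.xz")
    fasta = ">prot1 hypothetical protein\nMKV\n>prot2 capsid\nMAL\n"
    with lzma.open(path, mode="wt", encoding="utf-8") as handle:
        handle.write(fasta)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(count_fasta_records(make_demo_db(tmp)))
```

Streaming line by line keeps memory use flat, which matters for large clustered protein databases.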
## BLAST

[blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. It calculates the similarity between the consensus genome and the reference genome. The similarity is calculated based on the number of identical bases between the two sequences. Viralgenie uses blastn to compare the sequences against the supplied `--reference_pool` dataset.

10 changes: 10 additions & 0 deletions modules.json
@@ -279,6 +279,11 @@
"installed_by": ["modules"],
"patch": "modules/nf-core/prinseqplusplus/prinseqplusplus.diff"
},
"prokka": {
"branch": "master",
"git_sha": "81880787133db07d9b4c1febd152c090eb8325dc",
"installed_by": ["modules"]
},
"quast": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
@@ -368,6 +373,11 @@
"git_sha": "d97b335eb448073c1b680710303c02a55f40c77c",
"installed_by": ["modules"],
"patch": "modules/nf-core/vsearch/cluster/vsearch-cluster.diff"
},
"xz/decompress": {
"branch": "master",
"git_sha": "81880787133db07d9b4c1febd152c090eb8325dc",
"installed_by": ["modules"]
}
}
},
3 changes: 0 additions & 3 deletions modules/local/custom_multiqc/main.nf
@@ -18,7 +18,6 @@ process CUSTOM_MULTIQC {
path anno_files, stageAs: "?/annotation/*"
path clusters_tsv, stageAs: "?/clusters/*"
path screen_files, stageAs: "?/screen/*"
- path comment_headers
path custom_table_headers

output:
@@ -48,7 +47,6 @@ process CUSTOM_MULTIQC {
def clusters_files = clusters_tsv ? "--clusters_files ${clusters_tsv}" : ''
def mapping_constraints_command = mapping_constraints ? "--mapping_constraints ${mapping_constraints}" : ''
def screen_files_command = screen_files ? "--screen_files ${screen_files}" : ''
- def comment_headers_command = comment_headers ? "--comment_dir ${comment_headers}" : ''
def custom_table_headers_command = custom_table_headers ? "--table_headers ${custom_table_headers}" : ''

"""
@@ -65,7 +63,6 @@ process CUSTOM_MULTIQC {
$clusters_files \\
$mapping_constraints_command \\
$screen_files_command \\
- $comment_headers_command \\
$custom_table_headers_command \\
8 changes: 8 additions & 0 deletions modules/nf-core/prokka/environment.yml

83 changes: 83 additions & 0 deletions modules/nf-core/prokka/main.nf