Merge pull request #1019 from nf-core/dsl2-metagenomics
DSL2: metagenomics
merszym authored Sep 3, 2024
2 parents d63da95 + cfbba4d commit 56718ff
Showing 53 changed files with 2,791 additions and 127 deletions.
50 changes: 25 additions & 25 deletions CITATION.cff
@@ -25,28 +25,28 @@ doi: 10.7717/peerj.10947
date-released: 2022-08-02
url: https://github.com/nf-core/eager
prefered-citation:
type: article
authors:
- family-names: Fellows Yates
given-names: James A.
- family-names: Lamnidis
given-names: Thiseas C.
- family-names: Borry
given-names: Maxime
- family-names: Andrades Valtueña
given-names: Aida
- family-names: Fagernãs
given-names: Zandra
- family-names: Clayton
given-names: Stephen
- family-names: Garcia
given-names: Maxime U.
- family-names: Neukamm
given-names: Judith
- family-names: Peltzer
given-names: Alexander
doi: 10.7717/peerj.10947
start: e10947
title: "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager"
year: 2021
url: https://dx.doi.org/10.1038/10.7717/peerj.10947
type: article
authors:
- family-names: Fellows Yates
given-names: James A.
- family-names: Lamnidis
given-names: Thiseas C.
- family-names: Borry
given-names: Maxime
- family-names: Andrades Valtueña
given-names: Aida
- family-names: Fagernãs
given-names: Zandra
- family-names: Clayton
given-names: Stephen
- family-names: Garcia
given-names: Maxime U.
- family-names: Neukamm
given-names: Judith
- family-names: Peltzer
given-names: Alexander
doi: 10.7717/peerj.10947
start: e10947
title: "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager"
year: 2021
url: https://dx.doi.org/10.1038/10.7717/peerj.10947
48 changes: 38 additions & 10 deletions CITATIONS.md
@@ -2,11 +2,11 @@

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: [10.1038/s41587-020-0439-x](https://doi.org/10.1038/s41587-020-0439-x). PubMed PMID: 32055031.
## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: [10.1038/nbt.3820](https://doi.org/10.1038/nbt.3820). PubMed PMID: 28398311.
## Pipeline tools

@@ -16,11 +16,11 @@
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: [10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354). Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
- [Falco](https://doi.org/10.12688%2Ff1000research.21142.2)

> de Sena Brandine, G., Smith, A.D. (2019) Falco: high-speed FastQC emulation for quality control of sequencing data. F1000Res., 8, 1874. doi: [10.12688%2Ff1000research.21142.2](https://doi.org/10.12688%2Ff1000research.21142.2)
> de Sena Brandine, G., Smith, A.D. (2019). Falco: high-speed FastQC emulation for quality control of sequencing data. F1000Res., 8, 1874. doi: [10.12688%2Ff1000research.21142.2](https://doi.org/10.12688%2Ff1000research.21142.2)
- [fastp](https://doi.org/10.1093/bioinformatics/bty560)

@@ -32,7 +32,7 @@
- [Picard Tools](https://broadinstitute.github.io/picard/)

> Broad Institute (2019). Picard Toolkit. GitHub Repository: https://broadinstitute.github.io/picard/
> Broad Institute (2019). Picard Toolkit. GitHub Repository: [https://broadinstitute.github.io/picard/](https://broadinstitute.github.io/picard/)
- [SeqKit](https://bioinf.shenwei.me/seqkit/)

@@ -126,7 +126,35 @@

> Sex.DetERRmine.py Lamnidis, T.C. et al., 2018. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature communications, 9(1), p.5018. Available at: http://dx.doi.org/10.1038/s41467-018-07483-5. Download: https://github.com/TCLamnidis/Sex.DetERRmine
- [CircularMapper](https://doi.org/10.1186/s13059-016-0918-z)
- [MALT](https://www.nature.com/articles/s41559-017-0446-6)

> Vågene, Å.J., Herbig, A., Campana, M.G., Nelly, M., García, R., Warinner, C., Sabin, S., Spyrou, M.A., Valtueña, A.A., Huson, D., Tuross, N., Bos, K.I. & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nat Ecol Evol 2, 520–528. doi: [10.1038/s41559-017-0446-6](https://doi.org/10.1038/s41559-017-0446-6)
- [HOPS](https://doi.org/10.1186/s13059-019-1903-0)

> Hübler, R., Key, F.M., Warinner, C. et al. (2019). HOPS: automated detection and authentication of pathogen DNA in archaeological remains. Genome Biol 20, 280. doi: [10.1186/s13059-019-1903-0](https://doi.org/10.1186/s13059-019-1903-0)
- [MEGAN](https://doi.org/10.1371/journal.pcbi.1004957)

> Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, et al. (2016) MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data. PLoS Comput Biol 12(6): e1004957. doi: [10.1371/journal.pcbi.1004957](https://doi.org/10.1371/journal.pcbi.1004957)
- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)

> Wood, Derrick E., Jennifer Lu, and Ben Langmead. 2019. Improved Metagenomic Analysis with Kraken 2. Genome Biology 20 (1): 257. doi: [10.1186/s13059-019-1891-0](https://doi.org/10.1186/s13059-019-1891-0).
- [KrakenUniq](https://doi.org/10.1186/s13059-018-1568-0)

> Breitwieser, Florian P., Daniel N. Baker, and Steven L. Salzberg. 2018. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology 19 (1): 198. doi: [10.1186/s13059-018-1568-0](https://doi.org/10.1186/s13059-018-1568-0)
- [MetaPhlAn](https://doi.org/10.1038/s41587-023-01688-w)

> Blanco-Míguez, A., Beghini, F., Cumbo, F. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol (2023). doi: [10.1038/s41587-023-01688-w](https://doi.org/10.1038/s41587-023-01688-w)
- [TAXPASTA](https://doi.org/10.21105/joss.05627)

> Beber et al., (2023). TAXPASTA: TAXonomic Profile Aggregation and STAndardisation. Journal of Open Source Software, 8(87), 5627, doi: [10.21105/joss.05627](https://doi.org/10.21105/joss.05627)
- [CircularMapper](https://doi.org/10.1186/s13059-016-0918-z)

> Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. doi: [10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z)
@@ -138,16 +166,16 @@
- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: [10.1038/s41592-018-0046-7](https://doi.org/10.1038/s41592-018-0046-7). PubMed PMID: 29967506.
- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

> da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
> da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: [10.1093/bioinformatics/btx192](https://doi.org/10.1093/bioinformatics/btx192). PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

> Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
> Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: [10.5555/2600239.2600241](https://doi.org/10.5555/2600239.2600241).
- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: [10.1371/journal.pone.0177459](https://doi.org/10.1371/journal.pone.0177459). eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
4 changes: 2 additions & 2 deletions README.md
@@ -75,8 +75,8 @@ Additional functionality contained by the pipeline currently includes:
#### Metagenomic Screening

- Low-sequenced complexity filtering (`BBduk` or `PRINSEQ++`)
- Taxonomic binner with alignment (`MALT`)
- Taxonomic binner without alignment (`Kraken2`)
- Taxonomic binner with alignment (`MALT` or `MetaPhlAn 4`)
- Taxonomic binner without alignment (`Kraken2`,`KrakenUniq`)
- aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)

#### Functionality Overview
141 changes: 129 additions & 12 deletions conf/modules.config
@@ -397,7 +397,7 @@ process {
withName: SAMTOOLS_FASTQ_MAPPED {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = [
params.metagenomicscreening_input == 'all' ? '' : '-F 4',
params.metagenomics_input == 'all' ? '' : '-F 4',
].join(' ').trim()
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_mapped" }
publishDir = [
@@ -900,6 +900,17 @@ process {
]
}

//
// MT-NUCLEAR RATIO
//
withName: MTNUCRATIO {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
publishDir = [
enabled: false
]
}


//
// METAGENOMIC SCREENING
//
@@ -909,37 +920,143 @@ process {
params.metagenomics_prinseq_mode == 'dust' ? "-lc_dust=${params.metagenomics_prinseq_dustscore}" : "-lc_entropy=${params.metagenomics_complexity_entropy}",
"-trim_qual_left=0 -trim_qual_left=0 -trim_qual_window=0 -trim_qual_step=0",
].join(' ').trim()
ext.prefix = { "${meta.sample_id}_${meta.library_id}_complexity" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_complexity" }
publishDir = [
[
path: { "${params.outdir}/metagenomic_complexity_filter/" },
path: { "${params.outdir}/metagenomics/complexity_filter/prinseq" },
mode: params.publish_dir_mode,
pattern: '*{_good_out.fastq.gz,_good_out_R1.fastq.gz,_good_out_R2.fastq.gz,log}',
enabled: params.metagenomics_complexity_savefastq
]
]
}

withName: ".*BBMAP_BBDUK" {
withName: BBMAP_BBDUK {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = { "entropymask=f entropy=${params.metagenomics_complexity_entropy}" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_complexity" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_complexity" }
publishDir = [
path: { "${params.outdir}/metagenomic_complexity_filter/" },
path: { "${params.outdir}/metagenomics/complexity_filter/bbduk/" },
mode: params.publish_dir_mode,
pattern: '*.{fastq.gz,log}',
enabled: params.metagenomics_complexity_savefastq
]
}

//
// MT-NUCLEAR RATIO
//
withName: MTNUCRATIO {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
withName: MALT_RUN {
ext.args = [
"-m ${params.metagenomics_malt_mode}",
"-at ${params.metagenomics_malt_alignmentmode}",
"-top ${params.metagenomics_malt_toppercent}",
"-id ${params.metagenomics_malt_minpercentidentity}",
"-mq ${params.metagenomics_malt_maxqueries}",
"--memoryMode ${params.metagenomics_malt_memorymode}",
params.metagenomics_malt_minsupportmode == "percent" ? "-supp ${params.metagenomics_malt_minsupportpercent}" : "-sup ${params.metagenomics_malt_minsupportreads}",
params.metagenomics_malt_savereads ? "--alignments ./" : ""
].join(' ').trim()
publishDir = [
enabled: false
path: { "${params.outdir}/metagenomics/profiling/malt/" },
mode: params.publish_dir_mode,
pattern: '*.{rma6,log,sam.gz}'
]
ext.prefix = { "${meta.label}_${meta.id}-run" }
}

withName: CAT_CAT_MALT {
ext.prefix = { "${meta.id}_runtime_log_concatenated.log" }
publishDir = [
path: { "${params.outdir}/metagenomics/profiling/malt/" },
mode: params.publish_dir_mode,
pattern: '*.{log}'
]
}

withName: KRAKEN2_KRAKEN2 {
ext.args = [
params.metagenomics_kraken2_saveminimizers ? "--report-minimizer-data" : ""
].join(' ').trim()
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
publishDir = [
path: { "${params.outdir}/metagenomics/profiling/kraken2/" },
mode: params.publish_dir_mode,
pattern: '*.{txt,fastq.gz}'
]
}

withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
publishDir = [
path: { "${params.outdir}/metagenomics/profiling/krakenuniq/" },
mode: params.publish_dir_mode,
pattern: '*.{txt,fastq.gz}'
]
}

withName: METAPHLAN_METAPHLAN {
publishDir = [
path: { "${params.outdir}/metagenomics/profiling/metaphlan/" },
mode: params.publish_dir_mode,
pattern: '*.{biom,txt}'
]
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
}

withName: MALTEXTRACT {
ext.args = [
"-f ${params.metagenomics_maltextract_filter}",
"-a ${params.metagenomics_maltextract_toppercent}",
"--minPI ${params.metagenomics_maltextract_minpercentidentity}",
params.metagenomics_maltextract_destackingoff ? "--destackingOff" : "",
params.metagenomics_maltextract_downsamplingoff ? "--downSampOff" : "",
params.metagenomics_maltextract_duplicateremovaloff ? "--dupRemOff" : "",
params.metagenomics_maltextract_matches ? "--matches" : "",
params.metagenomics_maltextract_megansummary ? "--meganSummary" : "",
params.metagenomics_maltextract_usetopalignment ? "--useTopAlignment" : "",
{ meta.strandedness } == "single" ? '--singleStranded' : '',
].join(' ').trim()
publishDir = [
path: { "${params.outdir}/metagenomics/postprocessing/maltextract/" },
mode: params.publish_dir_mode,
pattern: 'results',
saveAs: { "${meta.id}" }
]
}
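
A note on the `MALTEXTRACT` arguments above: the expression `{ meta.strandedness } == "single"` compares a closure object with a string rather than the value of `meta.strandedness`. In Groovy that comparison should evaluate to false, so `--singleStranded` would likely never be added. One possible alternative, sketched below and not taken from this commit, wraps the whole list in a closure so that `meta` is resolved per task, in the same way the `TAXPASTA_MERGE` block defers `meta.profiler`. It assumes `meta.strandedness` is set in the process's meta map; only two of the original arguments are repeated for brevity.

```groovy
process {
    withName: MALTEXTRACT {
        // Sketch only: a closure defers evaluation until `meta` is known for each task.
        ext.args = {
            [
                "-f ${params.metagenomics_maltextract_filter}",
                "-a ${params.metagenomics_maltextract_toppercent}",
                // Compare the value itself, not a closure that wraps it.
                meta.strandedness == "single" ? '--singleStranded' : '',
            ].join(' ').trim()
        }
    }
}
```

Note that `.join(' ').trim()` only strips leading and trailing whitespace, so empty entries in the middle of the list leave doubled spaces; this is harmless on a command line.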

withName: MEGAN_RMA2INFO {
tag = {"${meta.id}"}
ext.args = "-c2c Taxonomy"
ext.prefix = { "${meta.id}" }
publishDir = [
path: { "${params.outdir}/metagenomics/postprocessing/megan_summaries/" },
mode: params.publish_dir_mode,
pattern: '*.{txt.gz,megan}'
]
}

withName: AMPS {
publishDir = [
path: { "${params.outdir}/metagenomics/postprocessing/maltextract/" },
mode: params.publish_dir_mode,
pattern: 'results'
]
errorStrategy = 'ignore' // required as it fails the run for low reads: https://github.com/rhuebler/HOPS/issues/9
}

withName: TAXPASTA_MERGE {
publishDir = [
path: { "${params.outdir}/metagenomics/postprocessing/taxpasta/" },
mode: params.publish_dir_mode,
pattern: '*.{csv,tsv,ods,xlsx,arrow,parquet,biom}'
]
ext.args = { "--profiler ${meta.profiler} --output ${meta.profiler}_taxpasta_table.tsv" }
}

withName: TAXPASTA_STANDARDISE {
publishDir = [
path: { "${params.outdir}/metagenomics/postprocessing/taxpasta/" },
mode: params.publish_dir_mode,
pattern: '*.{csv,tsv,ods,xlsx,arrow,parquet,biom}'
]
ext.args = { "--profiler ${meta.profiler} --output ${meta.profiler}taxpasta_table.tsv" }
}

withName: 'QUALIMAP_BAMQC_WITHBED|QUALIMAP_BAMQC_NOBED' {
4 changes: 1 addition & 3 deletions conf/test.config
@@ -41,7 +41,5 @@ params {
mapstats_bedtools_featurefile = params.pipelines_testdata_base_path + 'eager/reference/Mammoth/Mammoth_MT_Krause.gff3'

// Metagenomic screening
run_metagenomicscreening = false


run_metagenomics = false
}
2 changes: 1 addition & 1 deletion conf/test_humanbam.config
@@ -48,5 +48,5 @@ params {
bamfiltering_mappingquality = 37

// Metagenomic screening
run_metagenomicscreening = false
run_metagenomics = false
}
33 changes: 33 additions & 0 deletions conf/test_kraken2.config
@@ -0,0 +1,33 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test for
metagenomics kraken2.
Use as follows:
nextflow run nf-core/eager -profile test_kraken2,<docker/singularity> --outdir <OUTDIR>
----------------------------------------------------------------------------------------
*/

params {
config_profile_name = 'Kraken2 test profile'
config_profile_description = 'Minimal test dataset to check the metagenomics kraken2 pipeline function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'

// Input data
input = params.pipelines_testdata_base_path + 'eager/testdata/Mammoth/samplesheet_v3.tsv'

// Genome references
fasta = params.pipelines_testdata_base_path + 'eager/reference/Mammoth/Mammoth_MT_Krause.fasta'

// Metagenomics
run_metagenomics = true
metagenomics_profiling_tool = 'kraken2'
metagenomics_profiling_database = params.pipelines_testdata_base_path + 'eager/databases/kraken/eager_test.tar.gz'
}
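
For a run outside the test profile, the same three metagenomics parameters could presumably be set in a small custom config supplied with Nextflow's `-c` option. This is a sketch, not part of the commit; the file name is hypothetical and the database path is a placeholder that would need to point to a real Kraken2 database.

```groovy
// custom.config (hypothetical file name)
params {
    run_metagenomics                = true
    metagenomics_profiling_tool     = 'kraken2'
    // Placeholder path: replace with the location of an actual Kraken2 database.
    metagenomics_profiling_database = '/path/to/kraken2_db.tar.gz'
}
```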