Improving pplacement #826

Open
wants to merge 28 commits into
base: dev
717abb8
Merge pull request #725 from nf-core/dev
d4straub Apr 3, 2024
3f40a1b
Merge pull request #755 from nf-core/dev
d4straub Jun 27, 2024
0473e15
Merge pull request #771 from nf-core/dev
d4straub Aug 5, 2024
fa53cd6
added new module for pplacement called hmmexxtract
Oct 28, 2024
45d650a
include hmmmextract module in main workflow
Oct 28, 2024
909d21b
created a test for pplace_hmmsearch
Oct 28, 2024
5bd8941
add schema_phylosearch_input.json
Oct 28, 2024
92cdeb0
Merge branch 'improving-pplacement' into improving-pplacement-dev
Oct 28, 2024
9931172
adding subworfklow fasta_hmmsearch_rank_fastas + implemented nfschema…
Oct 28, 2024
9185a9a
explicit variable for phylosearch
Oct 29, 2024
504abde
parsing correctly input phylosearch
Oct 29, 2024
fad290d
added subworkflow for phylosearch
Oct 29, 2024
7bc4806
add last piece for running FASTA_NEWICK_EPANG_GAPPA module
Oct 29, 2024
7c20f0d
add last piece for running FASTA_NEWICK_EPANG_GAPPA module
Oct 29, 2024
5469375
change merge function with join
Oct 29, 2024
ccb639f
update cpu limits for test_pplace_hmmsearch
Oct 29, 2024
b43eb97
nf-core fasta_newick_epang_gappa updated
Oct 30, 2024
8f139ce
Release 2.12.0
d4straub Nov 15, 2024
db615b1
a lot of updates and nf-test
Dec 13, 2024
813b133
fix conflict
Dec 13, 2024
f036241
Merge branch 'nf-core:master' into improving-pplacement-dev
danilodileo Dec 13, 2024
fd56c9f
prettier
Dec 13, 2024
f7a8af4
update from dev branch
Jan 20, 2025
cda8d13
fasta newick epang gappa subworkflow updated
Jan 20, 2025
3ab3268
added nf-test
Jan 21, 2025
6ba376e
.gitignore update
Jan 23, 2025
aaa0537
removed .nf-test
Jan 23, 2025
2dc26f7
removed whitespace
Jan 23, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ testing/
testing*
*.pyc
null/
.nf-test/test
6 changes: 1 addition & 5 deletions .nf-core.yml
@@ -23,8 +23,4 @@ template:
outdir: .
skip_features:
- igenomes
version: 2.13.0dev
update:
https://github.com/nf-core/modules.git:
nf-core:
mafft: feb29be775d9e41750180539e9a3bdce801d0609
version: 2.12.0dev
33 changes: 33 additions & 0 deletions .nf-test.log
@@ -0,0 +1,33 @@
Jan-21 15:44:08.149 [main] INFO com.askimed.nf.test.App - nf-test 0.9.0
Jan-21 15:44:08.182 [main] INFO com.askimed.nf.test.App - Arguments: [test, --profile, test_pplace_hmmsearch,singularity, ./tests/pipeline/pplace_hmmsearch.nf.test]
Jan-21 15:44:12.792 [main] INFO com.askimed.nf.test.App - Nextflow Version: 24.10.3
Jan-21 15:44:12.796 [main] INFO com.askimed.nf.test.commands.RunTestsCommand - Load config from file /cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/nf-test.config...
Jan-21 15:44:19.790 [main] INFO com.askimed.nf.test.lang.dependencies.DependencyResolver - Loaded 212 files from directory /cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq in 4.903 sec
Jan-21 15:44:19.796 [main] INFO com.askimed.nf.test.lang.dependencies.DependencyResolver - Found 1 tests.
Jan-21 15:44:19.796 [main] DEBUG com.askimed.nf.test.lang.dependencies.DependencyResolver - Found tests: [/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test]
Jan-21 15:44:19.796 [main] INFO com.askimed.nf.test.commands.RunTestsCommand - Detected 1 test files.
Jan-21 15:44:20.249 [main] INFO com.askimed.nf.test.core.TestExecutionEngine - Started test plan
Jan-21 15:44:20.249 [main] INFO com.askimed.nf.test.core.TestExecutionEngine - Running testsuite 'Test Workflow main.nf' from file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test'.
Jan-21 15:44:20.250 [main] INFO com.askimed.nf.test.core.TestExecutionEngine - Run test 'b466648a: test_pplace_hmmsearch'. type: com.askimed.nf.test.lang.pipeline.PipelineTest
Jan-21 15:49:18.369 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Init new snapshot file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.381 [main] DEBUG com.askimed.nf.test.lang.extensions.Snapshot - Snapshot 'overall_summary_tsv' not found.
Jan-21 15:49:18.406 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Created snapshot 'overall_summary_tsv'
Jan-21 15:49:18.497 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Wrote snapshots to file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.498 [main] DEBUG com.askimed.nf.test.lang.extensions.Snapshot - Snapshot 'barrnap' not found.
Jan-21 15:49:18.498 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Created snapshot 'barrnap'
Jan-21 15:49:18.535 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Wrote snapshots to file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.536 [main] DEBUG com.askimed.nf.test.lang.extensions.Snapshot - Snapshot 'cutadapt' not found.
Jan-21 15:49:18.536 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Created snapshot 'cutadapt'
Jan-21 15:49:18.570 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Wrote snapshots to file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.570 [main] DEBUG com.askimed.nf.test.lang.extensions.Snapshot - Snapshot 'dada2' not found.
Jan-21 15:49:18.571 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Created snapshot 'dada2'
Jan-21 15:49:18.606 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Wrote snapshots to file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.606 [main] DEBUG com.askimed.nf.test.lang.extensions.Snapshot - Snapshot 'input' not found.
Jan-21 15:49:18.606 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Created snapshot 'input'
Jan-21 15:49:18.659 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Wrote snapshots to file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.662 [main] DEBUG com.askimed.nf.test.lang.extensions.Snapshot - Snapshot 'multiqc' not found.
Jan-21 15:49:18.662 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Created snapshot 'multiqc'
Jan-21 15:49:18.719 [main] DEBUG com.askimed.nf.test.lang.extensions.SnapshotFile - Wrote snapshots to file '/cfs/klemming/projects/supr/snic2020-16-76/ddl/dev/ampliseq/tests/pipeline/pplace_hmmsearch.nf.test.snap'
Jan-21 15:49:18.726 [main] INFO com.askimed.nf.test.core.TestExecutionEngine - Test 'b466648a: test_pplace_hmmsearch' finished. status: PASSED
Jan-21 15:49:18.739 [main] INFO com.askimed.nf.test.core.TestExecutionEngine - Testsuite 'Test Workflow main.nf' finished. snapshot file: true, skipped tests: false, failed tests: false
Jan-21 15:49:18.743 [main] INFO com.askimed.nf.test.core.TestExecutionEngine - Executed 1 tests. 0 tests failed. Done!
66 changes: 66 additions & 0 deletions assets/schema_phylosearch_input.json
@@ -0,0 +1,66 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://raw.githubusercontent.com/nf-core/ampliseq/master/assets/schema_pplace_sheet.json",
"title": "nf-core/phyloplace pipeline - params.pplace_sheet schema",
"description": "Schema for the file provided with params.pplace_sheet",
"type": "array",
"items": {
"type": "object",
"properties": {
"target": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Target name must be provided and cannot contain spaces",
"meta": ["id"]
},
"min_bitscore": {
"type": "integer",
"errorMessage": "Minimum bitscore for hits to this HMM.",
"meta": ["min_bitscore"]
},
"alignmethod": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Method to use for aligning: 'hmmer' or 'mafft'.",
"meta": ["alignmethod"]
},
"hmm": {
"type": "string",
"pattern": "^\\S+.hmm$",
"errorMessage": "HMMER HMM file to search sequences with.",
"meta": ["hmm"]
},
"extract_hmm": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Name of HMM file in multi-HMM to extract.",
"meta": ["extract_hmm"]
},
"refseqfile": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Multiple sequence alignment of reference sequences. Any format suppored by hmmbuild in HMMER (see queryfile for examples) or MAFFT.",
"meta": ["refseqfile"]
},
"refphylogeny": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Newick formatted file with the reference phylogeny.",
"meta": ["refphylogeny"]
},
"model": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Phylogenetic model to use in placement, see EPA-NG documentation.",
"meta": ["model"]
},
"taxonomy": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Tab separated file with taxonomy assignments of reference sequences.",
"meta": ["taxonomy"]
}
},
"required": ["target", "hmm"]
}
}
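A minimal sketch of how the constraints in this schema behave on a single sheet row. It is not the nf-schema validator the pipeline uses; `validate_row`, `PATTERNS` and `REQUIRED` are hypothetical names, rows are assumed to arrive as dictionaries of strings (as from a CSV reader), and the `.` before `hmm` is assumed to be meant as a literal dot.

```python
import re

# Patterns taken from the schema above. The dot before "hmm" is escaped here
# on the assumption that a literal ".hmm" suffix is intended.
PATTERNS = {
    "target": r"^\S+$",
    "alignmethod": r"^\S+$",
    "hmm": r"^\S+\.hmm$",
    "extract_hmm": r"^\S+$",
    "refseqfile": r"^\S+$",
    "refphylogeny": r"^\S+$",
    "model": r"^\S+$",
    "taxonomy": r"^\S+$",
}
REQUIRED = ("target", "hmm")


def validate_row(row):
    """Return a list of problems found in one sheet row (hypothetical helper)."""
    problems = []
    # Required fields must be present and non-empty.
    for field in REQUIRED:
        if not row.get(field):
            problems.append(f"missing required field: {field}")
    # Optional fields are only checked when they carry a value.
    for field, pattern in PATTERNS.items():
        value = row.get(field)
        if value and not re.match(pattern, value):
            problems.append(f"bad value for {field}: {value!r}")
    # min_bitscore is declared as an integer in the schema.
    bitscore = row.get("min_bitscore")
    if bitscore not in (None, ""):
        try:
            int(bitscore)
        except (TypeError, ValueError):
            problems.append(f"min_bitscore is not an integer: {bitscore!r}")
    return problems
```

A row with only `target` and `hmm` set passes, which matches the schema's `required` list; every other column is optional.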
4 changes: 4 additions & 0 deletions conf/modules.config
@@ -797,6 +797,10 @@ process {
]
}

withName: HMMER_HMMSEARCH {
ext.args = { meta.min_bitscore && "${meta.min_bitscore}" != "null" ? "--incT ${meta.min_bitscore}" : "" }
}

withName: 'QIIME2_INASV|QIIME2_INSEQ|QIIME2_INTAX|QIIME2_INTREE' {
publishDir = [
path: { "${params.outdir}/qiime2/input" },
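The `ext.args` closure added for `HMMER_HMMSEARCH` only passes `--incT` when the sheet row carries a usable `min_bitscore`. A rough Python rendering of that decision, for illustration only: `hmmsearch_args` is a hypothetical helper and `meta` is assumed to be a plain dictionary.

```python
def hmmsearch_args(meta):
    """Mirror the closure's logic (hypothetical helper, not part of the pipeline)."""
    value = meta.get("min_bitscore")
    # Groovy renders a null inside a GString as "null"; the closure also guards
    # against that literal string, so both cases are treated as "not set" here.
    if value and str(value) != "null":
        return f"--incT {value}"
    return ""
```

Note that a threshold of `0` counts as unset in this sketch, as it does under Groovy truthiness, so no `--incT` option is emitted for it.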
48 changes: 48 additions & 0 deletions conf/test_pplace_hmmsearch.config
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/ampliseq -profile test_pplace_hmmsearch,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 20,
memory: '16.GB',
time: '6.h'
]
}

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data
FW_primer = "GTGYCAGCMGCCGCGGTAA"
RV_primer = "GGACTACNVGGGTWTCTAAT"
input = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Samplesheet.tsv"
metadata = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Metadata.tsv"
skip_dada_taxonomy = true
qiime_ref_taxonomy = "greengenes85"
filter_ssu = "bac"

// this is to remove low abundance ASVs to reduce runtime of downstream processes
min_samples = 2
min_frequency = 10

// pplace
pplace_sheet = 'https://raw.githubusercontent.com/erikrikarddaniel/test-datasets/phyloplace/testdata/phylosearch_input.csv'

// Adjust taxonomic levels
tax_agglom_min = 1
tax_agglom_max = 3

// Skip some steps to reduce runtime
skip_alpha_rarefaction = true
skip_fastqc = true
}
83 changes: 69 additions & 14 deletions modules.json
@@ -5,19 +5,39 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"clustalo/align": {
"branch": "master",
"git_sha": "7b32b09fe7787c0fc6924e7b6f223a0b1daf0d2f",
"installed_by": [
"_",
"a",
"c",
"e",
"f",
"g",
"i",
"k",
"n",
"p",
"s",
"t",
"w",
"fasta_newick_epang_gappa"
]
},
"cutadapt": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"epang/place": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_newick_epang_gappa"]
},
"epang/split": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "868cb0d7fc4862991fb7c2b4cd7289806cd53f81",
"installed_by": ["fasta_newick_epang_gappa"]
},
"fastqc": {
@@ -27,49 +47,74 @@
},
"gappa/examineassign": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_newick_epang_gappa"]
},
"gappa/examinegraft": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_newick_epang_gappa"]
},
"gappa/examineheattree": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/eslalimask": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/eslreformat": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "0e9cb409c32d3ec4f0d3804588e4778971c09b7e",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/hmmalign": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "03a9f356a1a333923c1177c2912fa7bc61bb46f3",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/hmmbuild": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "03a9f356a1a333923c1177c2912fa7bc61bb46f3",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/hmmrank": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_hmmsearch_rank_fastas"]
},
"hmmer/hmmsearch": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_hmmsearch_rank_fastas"]
},
"kraken2/kraken2": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"],
"patch": "modules/nf-core/kraken2/kraken2/kraken2-kraken2.diff"
},
"mafft": {
"branch": "master",
"git_sha": "feb29be775d9e41750180539e9a3bdce801d0609",
"installed_by": ["fasta_newick_epang_gappa"]
"mafft/align": {
"branch": "master",
"git_sha": "868cb0d7fc4862991fb7c2b4cd7289806cd53f81",
"installed_by": [
"_",
"a",
"c",
"e",
"f",
"g",
"i",
"k",
"n",
"p",
"s",
"t",
"w",
"fasta_newick_epang_gappa"
]
},
"multiqc": {
"branch": "master",
@@ -82,6 +127,11 @@
"installed_by": ["modules"],
"patch": "modules/nf-core/pigz/uncompress/pigz-uncompress.diff"
},
"seqtk/subseq": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fasta_hmmsearch_rank_fastas"]
},
"untar": {
"branch": "master",
"git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
@@ -106,9 +156,14 @@
},
"subworkflows": {
"nf-core": {
"fasta_hmmsearch_rank_fastas": {
"branch": "master",
"git_sha": "15086c852c860f785a3654cba03f0ee00533cd08",
"installed_by": ["subworkflows"]
},
"fasta_newick_epang_gappa": {
"branch": "master",
"git_sha": "cfd937a668919d948f6fcbf4218e79de50c2f36f",
"git_sha": "725f406d25254b40a4bf436159ab841d43c43a17",
"installed_by": ["subworkflows"]
},
"utils_nextflow_pipeline": {
61 changes: 61 additions & 0 deletions modules/local/hmmextract.nf
@@ -0,0 +1,61 @@
// This is a modified version of nf-core/hmmer/hmmfetch that only extracts, but
// does so from a single input channel to keep things synchronized.
process HMMER_HMMEXTRACT {
tag "$meta.id"
label 'process_single'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/hmmer:3.3.2--h87f3376_2':
'biocontainers/hmmer:3.3.2--h87f3376_2' }"

input:
tuple val(meta), path(hmm), val(key)

output:
tuple val(meta), path("*.hmm"), emit: hmm
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
// Only `key` is declared as an input; there is no `keyfile` in this process
def outfile = key ? "> ${prefix}.hmm" : ''

// Avoid accidentally overwriting the input hmm
def move = ""
if ( "${prefix}.hmm" == "${hmm}" ) {
move = "mv ${hmm} ${prefix}.in.hmm"
hmm = "${prefix}.in.hmm"
}

"""
$move

hmmfetch \\
$args \\
$hmm \\
$key \\
$outfile

cat <<-END_VERSIONS > versions.yml
"${task.process}":
hmmer: \$(hmmsearch -h | grep -o '^# HMMER [0-9.]*' | sed 's/^# HMMER *//')
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"

"""
touch ${prefix}.hmm

cat <<-END_VERSIONS > versions.yml
"${task.process}":
hmmer: \$(hmmsearch -h | grep -o '^# HMMER [0-9.]*' | sed 's/^# HMMER *//')
END_VERSIONS
"""
}
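The rename guard in the script block can be sketched outside Nextflow as follows. This is an illustration under assumptions, not the module itself: `build_hmmfetch_script` is a hypothetical helper, a key is assumed to always be supplied, and the lines are only assembled as strings rather than executed.

```python
def build_hmmfetch_script(prefix, hmm, key, args=""):
    """Return the shell lines the process would run (hypothetical helper)."""
    lines = []
    out_name = f"{prefix}.hmm"
    # If the output name equals the input name, rename the input first so
    # the redirect does not truncate the file hmmfetch is reading.
    if out_name == hmm:
        renamed = f"{prefix}.in.hmm"
        lines.append(f"mv {hmm} {renamed}")
        hmm = renamed
    # Skip empty parts (e.g. no extra args) to avoid double spaces.
    command = " ".join(part for part in ("hmmfetch", args, hmm, key) if part)
    lines.append(f"{command} > {out_name}")
    return lines
```

For example, `build_hmmfetch_script("16S", "16S.hmm", "bac_16S")` yields an `mv` line followed by an `hmmfetch` call that reads `16S.in.hmm` and writes `16S.hmm`.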