Making PROFILE take in ONT data #199

simonleandergrimm · 2025-02-12T22:40:35Z

This PR makes PROFILE take in ONT data. To achieve this, I added a second samtools process and ribo identification using minimap2. Again, @harmonbhasin, I'm using my own ribo reference for minimap2. Let me know if I should add the creation of said reference to the index workflow.

harmonbhasin

LGTM. I would add the creation of the human reference to the index workflow. Once you do that I can approve this.

jeffkaufman · 2025-02-13T16:07:21Z

subworkflows/local/profile/main.nf

-        ribo_path = "${ref_dir}/results/ribo-ref-concat.fasta.gz"
-        ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, bbduk_suffix, !single_end)
+        if (params.ont) {
+            ribo_path = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi"


This is another reference that needs to be created in the index workflow and referenced in the index.

jeffkaufman · 2025-02-13T16:07:28Z

tests/modules/local/minimap2/main.nf.test

+            process {
+                '''
+                input[0] = LOAD_SAMPLESHEET.out.samplesheet
+                input[1] = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi"


jeffkaufman · 2025-02-13T16:07:47Z

tests/modules/local/samtools/separate.nf.test

+            process {
+                """
+                input[0] = LOAD_SAMPLESHEET.out.samplesheet
+                input[1] = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-human-index/chm13v2.0.mmi"


jeffkaufman · 2025-02-13T16:10:12Z

subworkflows/local/profile/main.nf

+        } else {
+            ribo_path = "${ref_dir}/results/ribo-ref-concat.fasta.gz"
+            ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, ribo_suffix, !single_end)
+        }
        // Run taxonomic profiling separately on ribo and non-ribo reads
        tax_ribo_ch = TAXONOMY_RIBO(ribo_ch.match, kraken_db_ch, "D", bracken_threshold, single_end)


FYI, I expect Kraken to do a bad job of generating a taxonomic profile here.

But I still think this PR is in the right direction: if we want to do better taxonomic profiling for ONT data, that should be a separately prioritized follow-up.

willbradshaw · 2025-02-13T21:52:57Z

modules/local/samtools/main.nf

+}
+
+// Return aligned and unaligned reads separately as FASTQs
+process SAMTOOLS_SEPARATE {


This process is a superset of SAMTOOLS_FILTER. You don't need both.

(I would implement this as two calls to a general SAMTOOLS_FASTQ process with different input parameters, followed by a channel merge.)
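
A minimal sketch of that suggestion, not code from this PR: one generalized `SAMTOOLS_FASTQ` process is included twice under different aliases, called with different flag filters, and the two outputs are joined per sample. The process signature, the alias names, the include path, the `tuple(sample, bam)` layout, and the output file naming are all assumptions made for illustration; only `samtools fastq` and its `-f`/`-F` flag filters are standard.

```nextflow
// --- modules/local/samtools/main.nf (hypothetical generalized process) ---
process SAMTOOLS_FASTQ {
    input:
        tuple val(sample), path(bam)
        val(flag_args)  // e.g. "-F 0x904" (mapped primary reads) or "-f 4" (unmapped reads)
        val(suffix)     // label used in the output file name
    output:
        tuple val(sample), path("${sample}_${suffix}.fastq.gz"), emit: reads
    script:
        """
        samtools fastq ${flag_args} ${bam} | gzip -c > ${sample}_${suffix}.fastq.gz
        """
}

// --- calling subworkflow (hypothetical file; include path is an assumption) ---
include { SAMTOOLS_FASTQ as SAMTOOLS_FASTQ_MAPPED   } from "../../../modules/local/samtools/main"
include { SAMTOOLS_FASTQ as SAMTOOLS_FASTQ_UNMAPPED } from "../../../modules/local/samtools/main"

workflow SEPARATE_READS {
    take:
        bam_ch  // tuple(sample, bam)
    main:
        // Same process, two aliases, different filters
        SAMTOOLS_FASTQ_MAPPED(bam_ch, "-F 0x904", "mapped")
        SAMTOOLS_FASTQ_UNMAPPED(bam_ch, "-f 4", "unmapped")
        // Merge on the sample key: tuple(sample, mapped_fastq, unmapped_fastq)
        separated_ch = SAMTOOLS_FASTQ_MAPPED.out.reads.join(SAMTOOLS_FASTQ_UNMAPPED.out.reads)
    emit:
        separated = separated_ch
}
```

The aliases are needed because Nextflow DSL2 does not allow the same process name to be invoked twice within one workflow.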

willbradshaw · 2025-02-13T21:53:39Z

subworkflows/local/profile/main.nf

-        ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, bbduk_suffix, !single_end)
+        if (params.ont) {
+            ribo_path = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi"
+            mapped_ch = MINIMAP2_RIBO(reads_ch, ribo_path, ribo_suffix)


Speaking from ignorance here, but why should BBDuk not work on ONT data? It's calculating a fraction of matching k-mers so read length per se shouldn't be fatal.

@willbradshaw Can you say more about what your worry is? Mostly that minimap2 is more resource-intensive? I think the files are small enough that this doesn't matter that much.

On priors I'd assume minimap2 to be better than bbduk (I'd assume the latter needs a bunch of playing around to find the fraction of matching k-mers that gives equivalent performance to minimap2).

Finally, I'd be pretty keen to get the ONT version of the pipeline up and running fast and do optimization later (especially because this part of the pipeline shouldn't affect the Illumina runs).

willbradshaw · 2025-02-13T21:54:27Z

tests/modules/local/minimap2/main.nf.test

+                '''
+            }
+        }
+        then {


As commented in another PR, I'd prefer a test of the actual content, not just that there is content.

This is now fixed.
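
For reference, a content-level check in nf-test could look roughly like the sketch below. It is not the test from this PR: the emit name `reads_mapped` and the `[sample, fastq.gz]` tuple layout are assumptions, and the surrounding `when` block is omitted.

```nextflow
then {
    assert process.success

    // Assumed output layout: tuple(sample, gzipped FASTQ) on an emit named reads_mapped
    def mapped_lines = path(process.out.reads_mapped[0][1]).linesGzip

    // Structural checks on the actual file content: FASTQ records are
    // four lines each, and each record starts with an '@' header line
    assert mapped_lines.size() > 0
    assert mapped_lines.size() % 4 == 0
    assert mapped_lines[0].startsWith("@")

    // Pin the exact output against a stored snapshot
    assert snapshot(process.out).match()
}
```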

simonleandergrimm · 2025-02-17T16:06:55Z

@willbradshaw Until the new index is created, tests here will fail. It would still be good to get your feedback in the meantime.

simonleandergrimm · 2025-02-17T16:17:57Z

subworkflows/local/profile/main.nf

-        tax_ribo_ch = TAXONOMY_RIBO(ribo_ch.match, kraken_db_ch, "D", bracken_threshold, single_end)
-        tax_noribo_ch = TAXONOMY_NORIBO(ribo_ch.nomatch, kraken_db_ch, "D", bracken_threshold, single_end)
+        tax_ribo_ch = TAXONOMY_RIBO(
+            ont ? ribo_ch.reads_mapped : ribo_ch.match,


Not particularly happy with this, but I don't want to call the output of minimap2 "match" and "no_match".

willbradshaw · 2025-02-17T23:17:20Z

subworkflows/local/profile/main.nf

-        tax_ribo_ch = TAXONOMY_RIBO(ribo_ch.match, kraken_db_ch, "D", bracken_threshold, single_end)
-        tax_noribo_ch = TAXONOMY_NORIBO(ribo_ch.nomatch, kraken_db_ch, "D", bracken_threshold, single_end)
+        tax_ribo_ch = TAXONOMY_RIBO(
+            ont ? ribo_ch.reads_mapped : ribo_ch.match,


I don't think this is terrible, but given that the two branches of the if statement produce quite different things (k-mer matches vs alignments) it might be better to give them different names in the if statement, then define some new variables (maybe ribo_in and noribo_in) to hold them. Then the calls to TAXONOMY can take ribo_in and noribo_in as their first input.

Also, minor, but I'd prefer if we went back to having multiple arguments per line; I'm not a fan of these very long one-arg-per-line function calls.
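
A rough sketch of that restructuring, as a fragment of the workflow body in subworkflows/local/profile/main.nf rather than a complete script. The names `ribo_in` and `noribo_in` come from the suggestion above; `mapped_ch.reads_unmapped` and `ont_ribo_index` are hypothetical, since only `reads_mapped`, `match`, and `nomatch` appear in the diff.

```nextflow
if (params.ont) {
    // Alignment-based separation; ont_ribo_index is a placeholder for the
    // minimap2 index path, wherever the index workflow ends up writing it
    mapped_ch = MINIMAP2_RIBO(reads_ch, ont_ribo_index, ribo_suffix)
    ribo_in   = mapped_ch.reads_mapped
    noribo_in = mapped_ch.reads_unmapped  // assumed emit name
} else {
    // k-mer-based separation, as before
    ribo_path = "${ref_dir}/results/ribo-ref-concat.fasta.gz"
    bbduk_ch  = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, ribo_suffix, !single_end)
    ribo_in   = bbduk_ch.match
    noribo_in = bbduk_ch.nomatch
}
// Downstream calls no longer need to know which branch produced the reads
tax_ribo_ch   = TAXONOMY_RIBO(ribo_in, kraken_db_ch, "D", bracken_threshold, single_end)
tax_noribo_ch = TAXONOMY_NORIBO(noribo_in, kraken_db_ch, "D", bracken_threshold, single_end)
```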

willbradshaw · 2025-02-17T23:21:23Z

subworkflows/local/profile/main.nf

-        ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, bbduk_suffix, !single_end)
+        if (params.ont) {
+            ribo_path = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi"
+            mapped_ch = MINIMAP2_RIBO(reads_ch, ribo_path, ribo_suffix)


@simonleandergrimm I'd still like this to be addressed before we move forward here. I remain uneasy about switching to an alignment-based approach without having a good understanding of how this might affect the results.

simonleandergrimm added 23 commits February 8, 2025 10:12

Adding a manual flag for human read filtering.

bb278f3

Adding stand-alone containers for minimap2 and samtools.

0aab9dd

Adding code for filtering out human reads on ont.

79cff17

Added optional human read filtering to subsetTrim

1ae7470

Added human_read_filtering param to subset_trim

9ff3db1

Merge branch 'dev' into simon-human-read-filtering

2989cde

added comment to human read filtering.

d8c6790

dropping second human read filtering.

acdafe9

added disclaimer re ONT

49af743

Added a test for minimap2

ca671f5

Added a work in progress test for samtools. Doesn't yet work properly.

edde7be

Merge branch 'dev' into simon-human-read-filtering

5950650

Added samtools and profile edits to make handling ONT data in PROFILE possible.

9707939

fixed human read filtering flag in run.config

7652019

Added proper tests and streaming.

a730568

added resource specification to samtools

3924195

adding comments to main.nf.test samtools

259caec

Merge branch 'dev' into simon-human-read-filtering

5b44d38

Merge branch 'simon-human-read-filtering' into simon-ont-profile-v2

3808e9c

added streaming to samtools.

347cb04

Merge branch 'simon-human-read-filtering' into simon-ont-profile-v2

cb3afbb

fixed testing for samtools

5bd7854

adding ribo index testing.

1e9f933

simonleandergrimm requested a review from harmonbhasin February 12, 2025 22:40

simonleandergrimm assigned harmonbhasin Feb 12, 2025

harmonbhasin requested changes Feb 13, 2025

jeffkaufman requested changes Feb 13, 2025

jeffkaufman assigned simonleandergrimm and unassigned harmonbhasin Feb 13, 2025

simonleandergrimm mentioned this pull request Feb 13, 2025

Adding human read filtering to subsetTrim #198

Closed

willbradshaw requested changes Feb 13, 2025

simonleandergrimm changed the base branch from simon-human-read-filtering to dev February 14, 2025 19:47

simonleandergrimm added 8 commits February 14, 2025 19:54

Reverting the edits that were introduced through merging human_read_filtering branch.

2730c11

WIP change save, still need to fix reference generation.

683bd95

Created custom container for samtools and minimap2

6714709

fixing minimap2 reference name.

666b0bb

fixed subset input params.

af8eff0

amended profile to use output of new minimap2 process.

6eb0ba2

dropped samtools process and tests

64d649f

resetting style of subset_trim input variable order.

085d8bd

simonleandergrimm requested a review from willbradshaw February 17, 2025 16:07

simonleandergrimm commented Feb 17, 2025

willbradshaw reviewed Feb 17, 2025
