-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making PROFILE take in ONT data #199
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I would add the creation of the human reference to the index workflow. Once you do that I can approve this.
subworkflows/local/profile/main.nf
Outdated
ribo_path = "${ref_dir}/results/ribo-ref-concat.fasta.gz" | ||
ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, bbduk_suffix, !single_end) | ||
if (params.ont) { | ||
ribo_path = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another reference that needs to be created in the index workflow and referenced in the index
process { | ||
''' | ||
input[0] = LOAD_SAMPLESHEET.out.samplesheet | ||
input[1] = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
process { | ||
""" | ||
input[0] = LOAD_SAMPLESHEET.out.samplesheet | ||
input[1] = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-human-index/chm13v2.0.mmi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
subworkflows/local/profile/main.nf
Outdated
} else { | ||
ribo_path = "${ref_dir}/results/ribo-ref-concat.fasta.gz" | ||
ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, ribo_suffix, !single_end) | ||
} | ||
// Run taxonomic profiling separately on ribo and non-ribo reads | ||
tax_ribo_ch = TAXONOMY_RIBO(ribo_ch.match, kraken_db_ch, "D", bracken_threshold, single_end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I expect Kraken to do a bad job on generating a taxonomic profile here
But I still think this PR is in the right direction: if we want to do better taxonomic profiling for ONT data that should be a separately-prioritized followup
modules/local/samtools/main.nf
Outdated
} | ||
|
||
// Return aligned and unaligned reads separately as FASTQs | ||
process SAMTOOLS_SEPARATE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This process is a superset of SAMTOOLS_FILTER. You don't need both.
(I would implement this as two calls to a general SAMTOOLS_FASTQ process with different input parameters, followed by a channel merge.)
subworkflows/local/profile/main.nf
Outdated
ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, bbduk_suffix, !single_end) | ||
if (params.ont) { | ||
ribo_path = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi" | ||
mapped_ch = MINIMAP2_RIBO(reads_ch, ribo_path, ribo_suffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking from ignorance here, but why should BBDuk not work on ONT data? It's calculating a fraction of matching k-mers so read length per se shouldn't be fatal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonleandergrimm I'd still like this to be addressed before we move forward here, I remain uneasy about switching to an alignment-based approach without having a good understanding of how this might affect the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willbradshaw Can you say more about what your worry is? Mostly that minimap2 is more resource-intensive? I think the files are small enough that this doesn't matter that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On priors I'd assume minimap2 to be better than bbduk (I'd assume the latter needs a bunch of playing around to find the fraction of matching k-mers that gives equivalent performance to minimap2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, I'd be pretty keen to get the ONT version of the pipeline up and running fast and do optimization later (especially because this part of the pipeline shouldn't affect the Illumina runs).
''' | ||
} | ||
} | ||
then { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented in another PR, I'd prefer a test of the actual content, not just that there is content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now fixed.
@willbradshaw Before creating the new index, tests here will fail. It would still be good to get your feedback in the meantime. |
tax_ribo_ch = TAXONOMY_RIBO(ribo_ch.match, kraken_db_ch, "D", bracken_threshold, single_end) | ||
tax_noribo_ch = TAXONOMY_NORIBO(ribo_ch.nomatch, kraken_db_ch, "D", bracken_threshold, single_end) | ||
tax_ribo_ch = TAXONOMY_RIBO( | ||
ont ? ribo_ch.reads_mapped : ribo_ch.match, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not particularly happy with this, but I don't want to call the output of minimap2 "match" and "no_match"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is terrible, but given that the two branches of the if statement produce quite different things (k-mer matches vs alignments) it might be better to give them different names in the if statement, then define some new variables (maybe ribo_in and noribo_in) to hold them. Then the calls to TAXONOMY can take ribo_in and noribo_in as their first input.
Also, minor, but I'd prefer if we went back to having multiple arguments per line, I'm not a fan of these very long one-arg-per-line function calls.
tax_ribo_ch = TAXONOMY_RIBO(ribo_ch.match, kraken_db_ch, "D", bracken_threshold, single_end) | ||
tax_noribo_ch = TAXONOMY_NORIBO(ribo_ch.nomatch, kraken_db_ch, "D", bracken_threshold, single_end) | ||
tax_ribo_ch = TAXONOMY_RIBO( | ||
ont ? ribo_ch.reads_mapped : ribo_ch.match, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is terrible, but given that the two branches of the if statement produce quite different things (k-mer matches vs alignments) it might be better to give them different names in the if statement, then define some new variables (maybe ribo_in and noribo_in) to hold them. Then the calls to TAXONOMY can take ribo_in and noribo_in as their first input.
Also, minor, but I'd prefer if we went back to having multiple arguments per line, I'm not a fan of these very long one-arg-per-line function calls.
subworkflows/local/profile/main.nf
Outdated
ribo_ch = BBDUK(reads_ch, ribo_path, min_kmer_fraction, k, bbduk_suffix, !single_end) | ||
if (params.ont) { | ||
ribo_path = "s3://nao-mgs-simon/ont-indices/2024-12-14/minimap2-ribo-index/ribo-ref-concat-unique.mmi" | ||
mapped_ch = MINIMAP2_RIBO(reads_ch, ribo_path, ribo_suffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonleandergrimm I'd still like this to be addressed before we move forward here, I remain uneasy about switching to an alignment-based approach without having a good understanding of how this might affect the results.
This PR makes PROFILE take in ONT data. To achieve this, I added a second samtools process, and ribo identification using minimap2. Again, @harmonbhasin I'm using my own ribo reference for minimap2. Let me know if I should add the creation of said reference to the index workflow.