added sample irida_next sample field option #140

mattheww95 · 2024-10-24T19:27:11Z

Added support for the irida_next sample id.

github-actions · 2024-10-24T19:28:36Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 899e35b

✅ 229 tests passed
❔ 32 tests were ignored
❗ 4 tests had warnings

❗ Test warnings:

files_exist - File not found: conf/igenomes_ignored.config
nextflow_config - nf-validation has been detected in the pipeline. Please migrate to nf-schema: https://nextflow-io.github.io/nf-schema/latest/migration_guide/
readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/mikrokondo/master/nextflow_schema.json; found https://raw.githubusercontent.com/phac-nml/mikrokondo/main/nextflow_schema.json

❔ Tests ignored:

files_exist - File is ignored: CODE_OF_CONDUCT.md
files_exist - File is ignored: assets/nf-core-mikrokondo_logo_light.png
files_exist - File is ignored: docs/images/nf-core-mikrokondo_logo_light.png
files_exist - File is ignored: docs/images/nf-core-mikrokondo_logo_dark.png
files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: docs/output.md
files_exist - File is ignored: docs/README.md
files_exist - File is ignored: docs/usage.md
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
nextflow_config - Config variable ignored: params.max_cpus
files_unchanged - File does not exist: CODE_OF_CONDUCT.md
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-mikrokondo_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-mikrokondo_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-mikrokondo_logo_dark.png
files_unchanged - File does not exist: docs/README.md
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/mikrokondo/mikrokondo/.github/workflows/awstest.yml
multiqc_config - multiqc_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-mikrokondo_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowMikrokondo.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-validation plugin
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version does not contain dev for release: 0.4.2
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.platform= illumina
nextflow_config - Config default value correct: params.long_read_opt= nanopore
nextflow_config - Config default value correct: params.override_allele_scheme=
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 2000.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.tracedir= null/pipeline_info
nextflow_config - Config default value correct: params.stage_in_mode= symlink
nextflow_config - Config default value correct: params.validationS3PathCheck= true
nextflow_config - Config default value correct: params.skip_read_merging= true
nextflow_config - Config default value correct: params.skip_bakta= true
nextflow_config - Config default value correct: params.skip_ont_header_cleaning= true
nextflow_config - Config default value correct: params.ec_opid= 90
nextflow_config - Config default value correct: params.ec_opcov= 90
nextflow_config - Config default value correct: params.ec_hpid= 95
nextflow_config - Config default value correct: params.ec_hpcov= 50
nextflow_config - Config default value correct: params.ec_enable_verification= true
nextflow_config - Config default value correct: params.sr_full_cgmlst= true
nextflow_config - Config default value correct: params.fp_average_quality= 25
nextflow_config - Config default value correct: params.fp_cut_tail_mean_quality= 15
nextflow_config - Config default value correct: params.fp_cut_tail_window_size= 4
nextflow_config - Config default value correct: params.fp_complexity_threshold= 20
nextflow_config - Config default value correct: params.fp_qualified_phred= 15
nextflow_config - Config default value correct: params.fp_unqualified_percent_limit= 40
nextflow_config - Config default value correct: params.fp_polyg_min_len= 10
nextflow_config - Config default value correct: params.fp_polyx_min_len= 10
nextflow_config - Config default value correct: params.fp_illumina_length_min= 35
nextflow_config - Config default value correct: params.fp_illumina_length_max= 400
nextflow_config - Config default value correct: params.fp_single_end_length_min= 1000
nextflow_config - Config default value correct: params.lx_min_evalue= 0.0001
nextflow_config - Config default value correct: params.lx_min_dna_len= 1
nextflow_config - Config default value correct: params.lx_min_aa_len= 1
nextflow_config - Config default value correct: params.lx_max_dna_len= 10000000
nextflow_config - Config default value correct: params.lx_max_aa_len= 10000000
nextflow_config - Config default value correct: params.lx_min_dna_ident= 80.0
nextflow_config - Config default value correct: params.lx_min_aa_ident= 80.0
nextflow_config - Config default value correct: params.lx_min_dna_match_cov= 80.0
nextflow_config - Config default value correct: params.lx_min_aa_match_cov= 80.0
nextflow_config - Config default value correct: params.lx_max_target_seqs= 10
nextflow_config - Config default value correct: params.lx_extraction_mode= raw
nextflow_config - Config default value correct: params.lx_report_mode= normal
nextflow_config - Config default value correct: params.lx_report_prop= locus_name
nextflow_config - Config default value correct: params.lx_report_max_ambig= 0
nextflow_config - Config default value correct: params.lx_report_max_stop= 0
nextflow_config - Config default value correct: params.target_depth= 100
nextflow_config - Config default value correct: params.min_reads= 1000
nextflow_config - Config default value correct: params.ba_min_contig_length= 200
nextflow_config - Config default value correct: params.qt_min_contig_length= 1000
nextflow_config - Config default value correct: params.mh_min_kmer= 10
nextflow_config - Config default value correct: params.flye_read_type= hq
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
pipeline_todos - No TODO strings found
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: doc.yml
actions_schema_validation - Workflow validation passed: spellcheck.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - PUBLISH_FINAL_READS found in conf/modules.config and Nextflow scripts.
modules_config - PUBLISH_FINAL_ASSEMBLIES found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_EXTRACT found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_SEARCH found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_REPORT found in conf/modules.config and Nextflow scripts.
modules_config - REPORT found in conf/modules.config and Nextflow scripts.
modules_config - IDENTIFY_POINTDB found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_SELECT found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_SUMMARIZE found in conf/modules.config and Nextflow scripts.
modules_config - REPORT_AGGREGATE found in conf/modules.config and Nextflow scripts.
modules_config - BIN_KRAKEN2 found in conf/modules.config and Nextflow scripts.
modules_config - COMBINE_DATA found in conf/modules.config and Nextflow scripts.
modules_config - GZIP_FILES found in conf/modules.config and Nextflow scripts.
modules_config - CHECK_ONT found in conf/modules.config and Nextflow scripts.
modules_config - PARSE_MASH found in conf/modules.config and Nextflow scripts.
modules_config - PARSE_KRAKEN found in conf/modules.config and Nextflow scripts.
modules_config - READ_SCAN found in conf/modules.config and Nextflow scripts.
modules_config - PARSE_FASTP found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - SEQKIT_STATS found in conf/modules.config and Nextflow scripts.
modules_config - SEQKIT_FILTER found in conf/modules.config and Nextflow scripts.
modules_config - SEQTK_SAMPLE found in conf/modules.config and Nextflow scripts.
modules_config - RASUSA found in conf/modules.config and Nextflow scripts.
modules_config - SEQTK_SIZE found in conf/modules.config and Nextflow scripts.
modules_config - QUAST found in conf/modules.config and Nextflow scripts.
modules_config - CHECKM_LINEAGEWF found in conf/modules.config and Nextflow scripts.
modules_config - BANDAGE_IMAGE found in conf/modules.config and Nextflow scripts.
modules_config - KRAKEN found in conf/modules.config and Nextflow scripts.
modules_config - MASH_ESTIMATE found in conf/modules.config and Nextflow scripts.
modules_config - MLST found in conf/modules.config and Nextflow scripts.
modules_config - STARAMR found in conf/modules.config and Nextflow scripts.
modules_config - MOBSUITE_RECON found in conf/modules.config and Nextflow scripts.
modules_config - MASH_SKETCH found in conf/modules.config and Nextflow scripts.
modules_config - MASH_PASTE found in conf/modules.config and Nextflow scripts.
modules_config - MASH_SCREEN found in conf/modules.config and Nextflow scripts.
modules_config - REMOVE_CONTAMINANTS found in conf/modules.config and Nextflow scripts.
modules_config - FLYE_ASSEMBLE found in conf/modules.config and Nextflow scripts.
modules_config - SPADES_ASSEMBLE found in conf/modules.config and Nextflow scripts.
modules_config - UNICYCLER_ASSEMBLE found in conf/modules.config and Nextflow scripts.
modules_config - FASTP_TRIM found in conf/modules.config and Nextflow scripts.
modules_config - MINIMAP2_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - MINIMAP2_MAP found in conf/modules.config and Nextflow scripts.
modules_config - SAM_TO_BAM found in conf/modules.config and Nextflow scripts.
modules_config - RACON_POLISH found in conf/modules.config and Nextflow scripts.
modules_config - PILON_ITER found in conf/modules.config and Nextflow scripts.
modules_config - MEDAKA_POLISH found in conf/modules.config and Nextflow scripts.
modules_config - BAKTA_DB_DOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - BAKTA_ANNOTATE found in conf/modules.config and Nextflow scripts.
modules_config - ABRICATE found in conf/modules.config and Nextflow scripts.
modules_config - ECTYPER found in conf/modules.config and Nextflow scripts.
modules_config - KLEBORATE found in conf/modules.config and Nextflow scripts.
modules_config - SPATYPER found in conf/modules.config and Nextflow scripts.
modules_config - SISTR found in conf/modules.config and Nextflow scripts.
modules_config - LISSERO found in conf/modules.config and Nextflow scripts.
modules_config - SHIGEIFINDER found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.2
version_consistency - Version tags are numeric and consistent between container, release tag and config.
included_configs - Pipeline config includes custom configs.

Run details

nf-core/tools version 3.0.2
Run at 2024-11-12 20:17:26

mattheww95 · 2024-10-28T22:05:05Z

If these tests pass, a sample with the name .iridanext_output. should be passed as a sample name to verify it is valid and data passes through.

kylacochrane

Great work Matthew 😸
I don’t have any specific comments - this sample_name solution looks solid to me. I tried adding a helper function to simplify the inx_string_suffix extraction logic in updated_samples within main.nf, but it ended up making things more complicated than expected, haha!

kylacochrane · 2024-10-30T17:52:16Z

tests/pipelines/main.from_assemblies.nf.test

@@ -796,4 +796,66 @@ nextflow_pipeline {
        }
    }

+        test("Test Stupid Name in Input Sheet") {
+        tag "from_assemblies_stupidnames"
+


😆 Great test name

apetkau

This looks great Matthew. Thanks so much for your work on including sample names 😄

I have a few suggestions and comments for you (given in-line below).

apetkau · 2024-10-31T16:29:53Z

assets/schema_input.json

@@ -47,6 +54,6 @@
                "unique": true
            }
        },
-        "required": ["sample"]
+        "required": ["sample_name"]


The key sample should still be required as in IRIDA Next it contains the IRIDA Next identifiers. The key sample_name should be optional.

Ahh, I see, sorry; I misunderstood what was being requested.

fixed in: 6cb6d8c

Additional commits were related to updating test file column names.

sample is required, but you don't need to swap the item properties as you did in 6cb6d8c; they were correct before.

Quick Summary
sample is the IRIDA-ID column of the samplesheet, so it has to be unique and is required. This is why we named the meta field meta.irida_id (or in your case meta.external_id).

sample_name is an optional column used simply to rename file outputs, or to use in results so that the user can interpret them more easily. This is why @kylacochrane introduced this at the start of the workflow (I did the same for all other pipelines), which is basically:

if (!meta.id) { meta.id = meta.irida_id }

This means that if it is run locally, it'll default to the old "just use the sample column" behaviour and not break anything.

Thanks for the comment, Steven. This is correct. Could you swap back sample and sample_name, Matthew, but make it so that sample is the one that is required?

Can you clarify what you mean by swap back sample and sample_name? Do you mean just within the schema_input.json or for the whole pipeline?

Yes exactly! Just for the schema_input.json

all is made well in this commit: 71260a9

apetkau · 2024-10-31T16:33:20Z

assets/schema_input.json

+            },
+            "sample_name": {
+                "type": "string",
+                "pattern": "^[^\\.]\\S+$",


I think you should remove any restrictions on sample_name and instead follow a similar pattern as this block of code to replace any restricted characters with underscores _:

https://github.com/phac-nml/snvphylnfc/blob/f1e5fae76af276acf0a8c98174978cb21ca5d7e0/workflows/snvphylnfc.nf#L98-L109

The reason being that restricting sample_name means that mikrokondo will fail to run if sample names don't match the above pattern (and spaces and periods are allowed in sample names in IRIDA Next). Allowing all names through but cleaning them up in the workflow code means that mikrokondo will still run for samples with any name.
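As a rough illustration of the clean-instead-of-reject approach, here is a minimal Python sketch. It assumes the allowed set is ASCII letters, digits, `_` and `-` (the same class as the `replaceAll(/[^A-Za-z0-9_\-]/, '_')` call that appears later in this PR); `clean_sample_name` is a hypothetical helper, not code from mikrokondo or SNVPhyl.

```python
import re

# Assumed allowed set: ASCII letters, digits, underscore and hyphen
RESTRICTED = re.compile(r"[^A-Za-z0-9_\-]")


def clean_sample_name(name: str) -> str:
    """Replace each restricted character with "_" instead of rejecting the name."""
    return RESTRICTED.sub("_", name)


if __name__ == "__main__":
    for raw in ["sample 1", "my.sample", ".hidden", "ok_name-2"]:
        print(f"{raw!r} -> {clean_sample_name(raw)!r}")
```

Every input yields a usable name, so a sample called `sample 1` still runs as `sample_1`. With this character set a period is also replaced, so the class would need widening if `.` should be kept.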

I added this check here only to verify that the sample name does not start with a period, as nf-prov has issues later on when it aggregates files for the provenance reports; that was my intention.
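For reference, a quick way to see what this (later reverted) schema pattern accepts is to run it through Python's `re` module. This is only a sketch: JSON Schema `pattern` uses ECMA-262 regular expressions, and Python's engine is assumed to behave the same for this simple pattern; `is_valid_sample_name` is a hypothetical name.

```python
import re

# The JSON string "^[^\\.]\\S+$" unescapes to the regex ^[^\.]\S+$
SAMPLE_NAME_PATTERN = re.compile(r"^[^\.]\S+$")


def is_valid_sample_name(name: str) -> bool:
    """Return True if the name would satisfy the schema pattern."""
    # fullmatch so a trailing newline cannot slip past the $ anchor
    return SAMPLE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["sample1", ".iridanext_output.", "sample 1", "A", " ab"]:
        print(repr(name), is_valid_sample_name(name))
```

Besides rejecting a leading period and any whitespace after the first character, the pattern also rejects one-character names (`\S+` needs at least one more character) and would accept a leading space, since `[^\.]` matches whitespace.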

But with your clarification above about sample_name vs sample, I think I can revert this change.

fixed here: 71260a9

apetkau · 2024-10-31T16:48:30Z

bin/report_summaries.py

    default_samp_suffix = "_flat_sample.json"
    parser = argparse.ArgumentParser("Table Summary")
    parser.add_argument("-f", "--file-in", help="Path to the mikrokondo json summary")
    parser.add_argument("-s", "--sample-tag", help="Optional suffix and extension to name output samples.", default=default_samp_suffix)
    parser.add_argument("-o", "--out-file", help="output name plus the .tsv extension e.g. prefix.tsv")
+    parser.add_argument("-x", "--inx-id-token", help="A token to insert into the flattened json file names for separation of the irida next sample id.")


I am curious if, rather than inserting the IRIDA Next id into the flattened JSON file names, it can be used to create a folder with contents being the flattened JSON report? That way, you don't have to worry about inserting tokens into a filename and then parsing them out from the string later on. You can then also insert the sample name as part of the flattened report file name.

That is, name output files like:

FlattenedReports/IRIDA_NEXT_ID/SAMPLE_NAME.flat_sample.json

Then, you can iterate over all subdirectories in FlattenedReports/, and parse the IRIDA Next identifier from the sub-directory. That is in:

mikrokondo/main.nf

Lines 113 to 119 in c036fb5

def inx_string_suffix = params.report_aggregate.inx_string_insertion
def name_trim = sample.getName()
def trimmed_name = name_trim.substring(0, name_trim.length() - params.report_aggregate.sample_flat_suffix.length())
def output_map = [
    "id": trimmed_name,
    "sample": trimmed_name,
    "external_id": trimmed_name]

Pull out the IRIDA Next id (external_id) from the directory name instead of from the file name.
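A minimal Python sketch of the suggested layout, assuming the `_flat_sample.json` suffix from `default_samp_suffix` in `bin/report_summaries.py`; `write_flat_report` and `collect_flat_reports` are hypothetical helpers, not the code from 39c8505. The IRIDA Next id comes from the directory name and the sample name from the file name, so no token has to be inserted or parsed out.

```python
import json
import tempfile
from pathlib import Path

# Assumed to match default_samp_suffix in bin/report_summaries.py
FLAT_SUFFIX = "_flat_sample.json"


def write_flat_report(root: Path, external_id: str, sample_name: str, data: dict) -> Path:
    """Write one flattened report to root/<external_id>/<sample_name><FLAT_SUFFIX>."""
    out_dir = root / external_id
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{sample_name}{FLAT_SUFFIX}"
    out_file.write_text(json.dumps(data))
    return out_file


def collect_flat_reports(root: Path) -> list:
    """Recover identifiers from the layout instead of parsing tokens out of file names."""
    records = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # The directory name is the external (IRIDA Next) id
        for report in sorted(sub.glob(f"*{FLAT_SUFFIX}")):
            records.append({
                "external_id": sub.name,
                # Strip the known suffix to get back the sample name
                "id": report.name[: -len(FLAT_SUFFIX)],
                "path": report,
            })
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "FlattenedReports"
        # Hypothetical identifiers, for illustration only
        write_flat_report(root, "INXT_SAM_1", "my.sample", {"QC": "PASSED"})
        for rec in collect_flat_reports(root):
            print(rec["external_id"], rec["id"])
```

Because the identifier lives in the directory name, a sample name containing periods or any other token does not affect how the external id is recovered.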

good idea!! I think I will do that, that is much cleaner.

fixed in: 39c8505

The output directory structure did not change; only the outputs from the script are structured this way.

sgsutcliffe · 2024-11-01T19:57:21Z

subworkflows/local/input_check.nf

@@ -20,7 +22,6 @@ workflow INPUT_CHECK {
            meta -> tuple(meta.id[0], meta[0])


I wonder if we should use meta.external_id to check when there are reads that need to be combined, since we created the sample_name column to allow for repeated values. Here, meta.id is used to find reads to be merged:

grouped_tuples = reads_in.groupTuple(by: 0).branch { it -> merge_data: it[1].size() > 1 format: true }
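To make the grouping concrete, here is a small Python model of `groupTuple(by: 0)` followed by the `branch` above. It is an approximation written for illustration (`group_and_branch` is hypothetical, and Nextflow does not guarantee group order); as with `branch`, the first matching condition wins, so `format: true` only receives the groups that were not sent to `merge_data`.

```python
from collections import defaultdict


def group_and_branch(reads_in):
    """Approximate groupTuple(by: 0) followed by the merge_data/format branch.

    reads_in: iterable of (key, meta) pairs; key is the value used for grouping.
    Returns (merge_data, fmt), each a list of (key, [meta, ...]) tuples.
    """
    grouped = defaultdict(list)
    for key, meta in reads_in:
        grouped[key].append(meta)

    merge_data, fmt = [], []
    for key, metas in grouped.items():
        # First matching condition wins, as in Nextflow's branch operator
        if len(metas) > 1:
            merge_data.append((key, metas))
        else:
            fmt.append((key, metas))
    return merge_data, fmt


if __name__ == "__main__":
    rows = [(None, "S1"), (None, "S2"), ("S3", "S3")]
    print(group_and_branch(rows))
```

Running it with rows whose key is `None` puts them all in one `merge_data` group, which is the same kind of accidental grouping described in the follow-up comment about a null `meta.id`.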

This is a very good question, as (after reverting sample_name to sample) the only way to merge reads now would be if the IRIDANext ID is the same. But going forward, are reads only going to be merged within IRIDANext? @apetkau

After talking with Aaron, we decided not to let mikrokondo merge reads by default. We added a parameter to allow it, but this will be a CLI feature: in IRIDANext it is better to merge reads within the system, where it is an auditable event, rather than something that may happen accidentally in a pipeline.
This is fixed here: db5f420

sgsutcliffe

Following up on my comment, meta.id needs to be set to meta.external_id when no sample_name is provided; otherwise it becomes null, and everything is then grouped together in the COMBINE_DATA() process. I tried using the map{} we have used in other pipelines, but it wasn't working. I can give it another try.

What I tried doing was:

    // Track processed IDs
    def processedIDs = [] as Set

    input = Channel.fromSamplesheet("input")
    // and remove non-alphanumeric characters in sample_names (meta.id), whilst also correcting for duplicate sample_names (meta.id)
    .map { meta ->
            if (!meta.id) {
                meta.id = meta.external_id
            } else {
                // Non-alphanumeric characters (excluding _,-,.) will be replaced with "_"
                meta.id = meta.id.replaceAll(/[^A-Za-z0-9_.\-]/, '_')
            }
            // Ensure ID is unique by appending meta.external_id if needed
            while (processedIDs.contains(meta.id)) {
                meta.id = "${meta.id}_${meta.external_id}"
            }
            // Add the ID to the set of processed IDs
            processedIDs << meta.id

            tuple(meta)}.view()

in the input_check subworkflow, but it fails, telling me it cannot perform replaceAll because it is an ArrayList type.
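The uniqueness loop from the snippet above can be modelled in Python as follows. The later revisions in this PR index `meta[0]`, which suggests each item is a list wrapping the meta map; that would explain why `replaceAll` was being called on an `ArrayList`. The sketch therefore assumes each row is a one-element list `[meta]` (an assumption, not the confirmed channel shape), and `make_ids_unique` is a hypothetical name.

```python
def make_ids_unique(rows):
    """Give every meta["id"] a unique value, mirroring the processedIDs loop above.

    Assumes each row is a one-element list [meta], so the map is row[0];
    calling string methods on the row itself would fail, much like the ArrayList error.
    """
    processed_ids = set()
    for row in rows:
        meta = row[0]
        # Keep appending the external id until the value has not been seen before
        while meta["id"] in processed_ids:
            meta["id"] = f"{meta['id']}_{meta['external_id']}"
        processed_ids.add(meta["id"])
    return rows


if __name__ == "__main__":
    rows = [[{"id": "s", "external_id": "A"}], [{"id": "s", "external_id": "B"}]]
    print([r[0]["id"] for r in make_ids_unique(rows)])
```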

sgsutcliffe · 2024-11-04T21:35:11Z

One last comment! I promise, and a suggestion. Could we use meta.irida_id instead of meta.external_id, that way it will be consistent with the other phac-nml nextflow pipelines.

mattheww95 · 2024-11-06T20:03:01Z

One last comment! I promise, and a suggestion. Could we use meta.irida_id instead of meta.external_id, that way it will be consistent with the other phac-nml nextflow pipelines.

Just to provide the rationale for the name.

I had used irida_id at first, but I wanted a more general name, so that the purpose of the parameter would be better communicated to users who may be running mikrokondo outside of the NML.

sgsutcliffe · 2024-11-08T17:01:06Z

subworkflows/local/input_check.nf

+            meta ->
+
+                // Remove any unallowed characters in the meta.id field
+                meta[0].id = meta[0].id.replaceAll(/[^A-Za-z0-9_\-]/, '_')


You can remove this. meta.id only needs to be scrubbed of unallowed characters if sample_name is provided in the samplesheet. This relates to my next comment.

apetkau

This looks great. Thanks so much for all your work, @mattheww95. A few inline comments.

apetkau · 2024-11-07T22:15:52Z

CHANGELOG.md

+
+### `Changed`
+
+- Added a `sample_name` field, `sample` still exists but is used for different purposes [PR 140](https://github.com/phac-nml/mikrokondo/pull/140)


This should probably be under Added. Also, maybe state that sample_name is used primarily to incorporate an additional name/identifier when running the pipeline through IRIDA Next.

fixed in: 899e35b

apetkau · 2024-11-07T22:16:21Z

CHANGELOG.md

+
+- RASUSA now used for down sampling of Nanopore or PacBio data. [PR 125](https://github.com/phac-nml/mikrokondo/pull/125)
+
+- Sample names (`sample_name` field) can no longer begin with a period. [PR 125](https://github.com/phac-nml/mikrokondo/pull/125)


I think you could remove this statement since sample_name was added as a new field in this PR.

fixed in: a1c3f3e

apetkau · 2024-11-07T22:16:43Z

CHANGELOG.md


 - Added RASUSA for down sampling of Nanopore or PacBio data. [PR 125](https://github.com/phac-nml/mikrokondo/pull/125)

+- Added a new field to the `schema_input.json` file to allow for sample IDs from external systems such as IRIDA Next: [PR 140](https://github.com/phac-nml/mikrokondo/pull/140)


I think you can remove this statement and just have one statement about adding sample_name.

fixed in: a2c56a8

apetkau · 2024-11-07T22:18:11Z

assets/schema_input.json

-                "errorMessage": "Sample name must be provided and cannot contain spaces",
-                "meta": ["id"]
+                "meta": ["id"],
+                "errorMessage": "Sample name to be used in report generation. Invalid characters are replaced with underscores."


Maybe change the error message to state: Default sample identifier used by the pipeline. Also, invalid characters should not be replaced by underscores for sample, so you can remove that statement.

fixed in: a2c56a8

We may need to review this, though. I was implementing what was discussed in our meeting, so apologies if anything is wrong!

apetkau · 2024-11-07T22:18:58Z

assets/schema_input.json

+            },
+            "sample_name": {
+                "type": "string",
+                "errorMessage": "Optional. Used to override sample when used in tools like IRIDA-Next. Invalid characters will be replaced with underscores.",


Can you list the valid characters (e.g., valid characters include alphanumeric and . and _. All other characters will be replaced by underscores).

fixed in: b1e60dd

apetkau · 2024-11-08T17:05:58Z

subworkflows/local/input_check.nf

+            meta ->
+
+                // Remove any unallowed characters in the meta.id field
+                meta[0].id = meta[0].id.replaceAll(/[^A-Za-z0-9_\-]/, '_')


As discussed in our call, this should be changed to be more similar to the way SNVPhyl handles this: https://github.com/phac-nml/snvphylnfc/blob/f1e5fae76af276acf0a8c98174978cb21ca5d7e0/workflows/snvphylnfc.nf#L98-L103

That is, meta.id should correspond to the sample_name by default, but if that column is empty it should be set to sample. The meta.external_id should instead correspond (when run through IRIDA Next) to the IRIDA Next identifier.

fixed in: b1e60dd

sgsutcliffe · 2024-11-08T17:08:56Z

subworkflows/local/input_check.nf

+                // Remove any unallowed characters in the meta.id field
+                meta[0].id = meta[0].id.replaceAll(/[^A-Za-z0-9_\-]/, '_')
+
+                if (meta[0].external_id != null) {


It's sample_name (i.e. meta.id) that is optional and may contain disallowed characters. So the if/else should be:

if (meta[0].id != null) {
    // remove any characters in sample_name (meta.id) that should not be used
    meta[0].id = meta[0].id.replaceAll(/[^A-Za-z0-9_\-]/, '_')
} else {
    meta[0].id = meta[0].external_id
}

Everything is named with meta.id, but if sample_name is not provided, fall back to the old-fashioned sample. Basically, keep it as is for non-IRIDA users.

fixed in: b1e60dd
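The id rule discussed in this thread can be sketched in Java as shown below. meta.id comes from sample_name when present, otherwise from sample, and meta.external_id always carries sample. `Row`, `Meta` and `resolve` are hypothetical names, and records require Java 16+. This sketch makes one assumption: an empty sample_name is treated the same as a missing (`null`) one, whereas the review suggestion only checks for `null`. Following that suggestion, only sample_name is sanitized; sample is passed through unchanged.

```java
import java.util.regex.Pattern;

public class MetaIdResolver {
    // Same character class as the review suggestion: keep A-Z, a-z, 0-9, '_' and '-'.
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9_\\-]");

    // Hypothetical holder for the two samplesheet columns.
    record Row(String sample, String sampleName) {}

    // Hypothetical holder for the resulting meta fields.
    record Meta(String id, String externalId) {}

    static Meta resolve(Row row) {
        String name = row.sampleName();
        // Assumption: an empty sample_name is treated like a missing one.
        boolean hasName = name != null && !name.isEmpty();
        String id = hasName
                ? DISALLOWED.matcher(name).replaceAll("_") // sanitize sample_name only
                : row.sample();                            // fall back to sample, unchanged
        // external_id always carries sample (e.g. the IRIDA Next identifier).
        return new Meta(id, row.sample());
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Row("INXT_SAM_1", "E. coli 42")));
        System.out.println(resolve(new Row("INXT_SAM_2", "")));
    }
}
```

With the made-up identifiers above, the first row gets id `E__coli_42` and keeps `INXT_SAM_1` as its external id. The second row has no sample_name, so it falls back to `INXT_SAM_2` for both fields.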

sgsutcliffe · 2024-11-08T17:22:44Z

subworkflows/local/input_check.nf

+                if (meta[0].external_id != null) {
+                    // remove any characters in the external_id that should not be used
+                    meta[0].id = meta[0].external_id.replaceAll(/[^A-Za-z0-9_\-]/, '_')
+                }else{


Based on the other suggested changes, this else clause won't be needed. Grouping is done by meta.id, so rows whose meta.id is duplicated (whether it came from sample or sample_name) will still be grouped together.

fixed in: b1e60dd
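Grouping on meta.id means that any rows resolving to the same id end up together, regardless of which column the id came from. Below is a plain-Java analogue of that idea. The pipeline itself works on Nextflow channels, presumably with channel operators that are not shown here. `Entry`, `groupReads` and the file names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupById {
    // Hypothetical: one samplesheet row after meta.id has been resolved.
    record Entry(String id, String reads) {}

    // Collect the reads of every row that shares an id, keeping first-seen order.
    static Map<String, List<String>> groupReads(List<Entry> entries) {
        return entries.stream().collect(Collectors.groupingBy(
                Entry::id,
                LinkedHashMap::new,
                Collectors.mapping(Entry::reads, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Entry> rows = List.of(
                new Entry("S1", "s1_run1.fastq.gz"),
                new Entry("S1", "s1_run2.fastq.gz"),
                new Entry("S2", "s2.fastq.gz"));
        System.out.println(groupReads(rows));
    }
}
```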

added sample irida_next sample field option

365f47b

mattheww95 added 6 commits October 24, 2024 17:01

identified sticking point for sample names not being passed to the iridanext config

7edf2aa

updated iridanext external name id

351a8f1

updated changelog and docs

acdb884

udpated samples sheet names

dcdce6d

updated inx id parsing

fd4ea24

updated sample sheet parsing

0d81ebf

updated tests

c036fb5

mattheww95 marked this pull request as ready for review October 29, 2024 18:42

mattheww95 requested review from apetkau, kylacochrane and sgsutcliffe October 29, 2024 18:42

kylacochrane approved these changes Oct 31, 2024

apetkau requested changes Oct 31, 2024

mattheww95 added 6 commits October 31, 2024 13:08

updating commits for feedback

6cb6d8c

updated samplesheets

db34308

updated sample sheet name

0c6e6d1

updated external_id parsing, tests will fail as path locations need to be updated

d1e5609

updated output of flattened sample reports

a48fb95

fixed erroneous comment

39c8505

mattheww95 requested a review from apetkau November 1, 2024 19:27

sgsutcliffe reviewed Nov 1, 2024

sgsutcliffe requested changes Nov 4, 2024

mattheww95 added 4 commits November 5, 2024 16:10

updated sample field orders

14653fb

updated logic for renaming sample id

9c0bad4

updated sample parsing

c8827fe

updated docs, changelog and nextflow_schema.json

db5f420

mattheww95 added 8 commits November 6, 2024 15:06

updated test cases

52af4a9

updating inputcheck tests

eb75969

added missing files

45ce5a2

updated tests

eec62b3

fixed failing tests

733db44

updated tests

71260a9

fixed my own mistakes

3c4e1c4

fixed failing test

738943b

mattheww95 requested a review from sgsutcliffe November 7, 2024 21:34

mattheww95 mentioned this pull request Nov 7, 2024

COMBINE_DATA() fails to use symbolic link to staged fastq files #141 (Open)

sgsutcliffe reviewed Nov 8, 2024

apetkau requested changes Nov 8, 2024

sgsutcliffe reviewed Nov 8, 2024

mattheww95 added 6 commits November 8, 2024 12:05

swapped external_id and id

b1e60dd

updating information before the weekend

70d0291

fixed stupid name issue report keys not found

6ba57b0

fixed failig test case

a2c56a8

updated changelog

a1c3f3e

updated changelog

899e35b

mattheww95 requested review from sgsutcliffe and apetkau November 12, 2024 20:16

	def inx_string_suffix = params.report_aggregate.inx_string_insertion
	def name_trim = sample.getName()
	def trimmed_name = name_trim.substring(0, name_trim.length() - params.report_aggregate.sample_flat_suffix.length())
	def output_map = [
	"id": trimmed_name,
	"sample": trimmed_name,
	"external_id": trimmed_name]
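The snippet above recovers the sample name by chopping `sample_flat_suffix.length()` characters off the file name. That only yields the right name if the file name actually ends with that suffix; otherwise it silently cuts off unrelated characters. Below is a hedged Java sketch of the same idea with an explicit `endsWith` check. The suffix `_flat_sample.json` is a made-up placeholder, since the configured value of `params.report_aggregate.sample_flat_suffix` is not shown here. `FlatReportName`, `trimSuffix` and `outputMap` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatReportName {
    // Strip a known suffix; fail loudly if the file name does not end with it,
    // instead of silently cutting off unrelated characters.
    static String trimSuffix(String fileName, String suffix) {
        if (!fileName.endsWith(suffix)) {
            throw new IllegalArgumentException(
                    "'" + fileName + "' does not end with '" + suffix + "'");
        }
        return fileName.substring(0, fileName.length() - suffix.length());
    }

    // Mirror of output_map in the snippet: all three keys get the trimmed name.
    static Map<String, String> outputMap(String fileName, String suffix) {
        String trimmed = trimSuffix(fileName, suffix);
        Map<String, String> out = new LinkedHashMap<>();
        out.put("id", trimmed);
        out.put("sample", trimmed);
        out.put("external_id", trimmed);
        return out;
    }

    public static void main(String[] args) {
        // "_flat_sample.json" is a placeholder, not the pipeline's configured suffix.
        System.out.println(outputMap("S1_flat_sample.json", "_flat_sample.json"));
    }
}
```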

		@@ -20,7 +22,6 @@ workflow INPUT_CHECK {
		meta -> tuple(meta.id[0], meta[0])


### `Changed`

- Added a `sample_name` field; `sample` still exists but now supplies the external identifier (e.g., the IRIDA Next sample ID) and serves as the fallback name when `sample_name` is empty. [PR 140](https://github.com/phac-nml/mikrokondo/pull/140)


- RASUSA is now used for downsampling Nanopore or PacBio data. [PR 125](https://github.com/phac-nml/mikrokondo/pull/125)

- Sample names (`sample_name` field) can no longer begin with a period. [PR 125](https://github.com/phac-nml/mikrokondo/pull/125)


- Added RASUSA for downsampling Nanopore or PacBio data. [PR 125](https://github.com/phac-nml/mikrokondo/pull/125)

- Added a new field to the `schema_input.json` file to allow sample IDs from external systems such as IRIDA Next. [PR 140](https://github.com/phac-nml/mikrokondo/pull/140)

added sample irida_next sample field option #140

mattheww95 commented Oct 24, 2024

github-actions bot commented Oct 24, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

mattheww95 commented Oct 28, 2024

kylacochrane left a comment

apetkau left a comment

mattheww95 Nov 1, 2024

sgsutcliffe left a comment

sgsutcliffe commented Nov 4, 2024

mattheww95 commented Nov 6, 2024

apetkau left a comment
