refactor/Samplefile channel + functions #2

FelixAntoineLeSieur · 2024-07-03T19:13:17Z

This branch is based on the first PR: InitialSchemas. Only review/merge after that PR is merged.

Adds the first function encountered in the main pipeline cqdg-denovo: sampleChannel, and the required local functions.
This will create the initial input channel required for the rest of the pipeline.

I also added a few tests to verify the consistency of sampleFileFormat and sequencingType.

LysianeBouchard · 2024-07-09T02:55:09Z

CHANGELOG.md

@@ -8,8 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 Initial release of ferlab/postprocessing, created with the [nf-core](https://nf-co.re/) template.

 ### `Added`
+[#2]https://github.com/FelixAntoineLeSieur/Post-processing-Pipeline/pull/2 Added tests and samplefile channel functions


Je crois qu'il faudra corriger le petit typo ici aussi.

LysianeBouchard · 2024-07-09T03:13:31Z

assets/schema_input.json

            }
-        },
-        "required": ["sample", "fastq_1"]
+        }


Merci :) 🧹 ✨

LysianeBouchard · 2024-07-09T04:25:22Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

@@ -77,9 +77,32 @@ workflow PIPELINE_INITIALISATION {
    //
    validateInputParameters()



Je crois que ce serait plus propre de mettre la logique de parsing du sample sheet dans un subworfklow séparé et de l'invoquer ici. J'ai vu un example de ça dans SAREK: https://github.com/nf-core/sarek/blob/master/subworkflows/local/utils_nfcore_sarek_pipeline/main.nf#L133

Bien sûr, je crois qu'il faudra aussi épurer un peu ce fichier et enlever ce qu'on n'utilise pas.

Oui je crois que je vais enlever tout de suite les fonctions qu'on ne va pas utiliser. Aussi pour SAREK je suis allé voir, je crois qu'ils ont fait leur propre subworkflow parce qu'ils acceptent beaucoup d'extensions/formats différents. Dans notre cas, le parse n'est pas très long à faire donc je ne sais pas si ça vaut la peine d'avoir un subworkflow juste pour ça...

LysianeBouchard · 2024-07-09T04:37:23Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+        versions = ch_versions
+
+    sampleMeta | view{"Meta: $it"}
+    sampleFiles | view{"files: $it"}


Les view c'est pour débugger?

LysianeBouchard · 2024-07-09T04:38:26Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

@@ -103,6 +126,7 @@ workflow PIPELINE_INITIALISATION {
    emit:


Je préfère en fait ce style pour émettre les channels d'output. Je trouve que c'est plus facilement visible. Mais bon, c'est subjectif.

J'essayais de suivre le plus possible ce qu'on avait déjà écrit, mais je suis d'accord.

LysianeBouchard · 2024-07-09T04:44:14Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+
+//Transform a row from the sample file in V1 format from a list structure to a map structure.
+def rowMapperV1(columns, sequencingType) {
+    if ((columns[1] == "WGS") || (columns[1] == "WES")){


C'est super pour le message. Est-ce que c'est possible cependant que l'équivalent va être fait avec un tool de validation de nf-core?

Je crois que tu pourrais écrire l'expression dans le if de cette façon: ["WGS", "WES"].contains(columns[1]) .

Si on garde l'enum, je crois que ce serait aussi plus clean de s'en servir au lieu d'utiliser des string hard codés. C'est possible de définir une fonction à l'intérieur de l'enum qui fait ça:

enum SequencingType { WES, WGS public static boolean contains(String s) { for(SequencingType sequencingType in SequencingType.values()){ if(sequencingType.name().equals(s)){ return true } } return false } }

Ensuite, tu pourrais remplacer l'expression dans le if par: SequencingType.contains(columns[1])

Commentaires similaires applicables à rowMapperV2.

ok c'est ajouté

LysianeBouchard · 2024-07-09T05:02:07Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+enum SampleFileFormat {
+    V1,
+    V2
+}


Je me demande si on ne devrait pas utiliser juste des constantes de type string au lieu des enums? C'est bien l'enum mais c'est un concept quand même avancé qui peut peut-être rendre le code plus difficile à comprendre pour des bioinformaticiens. Je crois que ça rendrait le code aussi pas mal plus concis.

Il y a des avantages intéressants avec l'enum, mais ce n'est pas absolument essentiel et peut-être que d'autres outils nf-core couvrent déjà ça.

On peut le faire dans un deuxième temps bien sûr. On a déjà un bon chantier ...

Petit hint pour plus tard, il faudra probablement que tu importes les enum dans le subworkflow qui va exécuter la logique de post processing.

LysianeBouchard · 2024-07-09T05:06:52Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+    }
+    return enumInstance.valueOf(paramValue)
+}
+def getSampleFileFormat() {


Je crois que le block if ici n'est pas nécessaire, car on définit une valeur par défaut dans nextflow.config. Si tu veux tu peux l'enlever. Quand j'ai écrit ce code, je ne savais pas que le fichier nextflow.config est toujours applicable même s'il n'est pas spécifié à la ligne de commande.

Idem pour getSequencingType().

J'ai enlevé cette fonction finalement, on a seulement 1 enum (sequencingType) donc pas besoin de réutiliser ce code

LysianeBouchard · 2024-07-09T05:10:01Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+--sampleFileFormat V2 as needed")
+        exit(0)
+    }
+    print(columns[1])


C'est peut-être un artéfact de debugging ça?

LysianeBouchard · 2024-07-09T05:12:24Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+
+//Transform a row from the sample file in V2 format from a list structure to a map structure
+def rowMapperV2(columns) {
+    def sampleSeqType = columns[1]


Je crois qu'ici tu pourrais utiliser columns[1].toUpperCase() pour supporter le sequencing type passé en lettres minuscules.
Peut-être que tu voudrais utiliser une logique similaire pour rowMapperV1.

ok c'est fait

LysianeBouchard · 2024-07-09T05:52:21Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+--sampleFileFormat V1 as needed")
+        exit(0)
+    }
+    if (sampleSeqType != params.sequencingType){


Je ne comprends pas bien pourquoi on fait ça ... Il me semble qu'il n'y a pas d'utilité au format V2 si on est forcé d'utiliser toujours la même valeur que le paramètre sequencingType.
Je crois que c'est peut-être même possible peut-être même sur qlin d'avoir, dans une batch, à la fois WES et WGS.

J'ai enlevé cette partie dans V2, maintenant on accepte la présence de WGS ou WES, même s'il y a plusieurs types parmis les sample. Je ne sais juste pas encore on va séparer Hard-Filtering vs VQSR de cette façon par contre... à discuter

I also added a few tests to validate the sampleFileFormat and sequencingType

FelixAntoineLeSieur

review #2

FelixAntoineLeSieur · 2024-07-10T14:20:44Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

@@ -103,6 +126,7 @@ workflow PIPELINE_INITIALISATION {
    emit:


J'essayais de suivre le plus possible ce qu'on avait déjà écrit, mais je suis d'accord.

FelixAntoineLeSieur · 2024-07-10T17:47:23Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+
+//Transform a row from the sample file in V1 format from a list structure to a map structure.
+def rowMapperV1(columns, sequencingType) {
+    if ((columns[1] == "WGS") || (columns[1] == "WES")){


ok c'est ajouté

FelixAntoineLeSieur · 2024-07-10T17:51:01Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+    }
+    return enumInstance.valueOf(paramValue)
+}
+def getSampleFileFormat() {


J'ai enlevé cette fonction finalement, on a seulement 1 enum (sequencingType) donc pas besoin de réutiliser ce code

FelixAntoineLeSieur · 2024-07-10T18:01:14Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+
+//Transform a row from the sample file in V2 format from a list structure to a map structure
+def rowMapperV2(columns) {
+    def sampleSeqType = columns[1]


ok c'est fait

FelixAntoineLeSieur · 2024-07-10T18:04:40Z

subworkflows/local/utils_nfcore_postprocessing_pipeline/main.nf

+--sampleFileFormat V1 as needed")
+        exit(0)
+    }
+    if (sampleSeqType != params.sequencingType){


J'ai enlevé cette partie dans V2, maintenant on accepte la présence de WGS ou WES, même s'il y a plusieurs types parmis les sample. Je ne sais juste pas encore on va séparer Hard-Filtering vs VQSR de cette façon par contre... à discuter

FelixAntoineLeSieur self-assigned this Jul 3, 2024

FelixAntoineLeSieur added the #2 Second PR in line, review after #1 is merged label Jul 5, 2024

LysianeBouchard reviewed Jul 9, 2024

View reviewed changes

assets/schema_input.json Outdated

}

},

"required": ["sample", "fastq_1"]

}

Copy link

Contributor

LysianeBouchard Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merci :) 🧹 ✨

LysianeBouchard reviewed Jul 9, 2024

View reviewed changes

FelixAntoineLeSieur changed the title ~~Samplefile channel + functions~~ refactor/Samplefile channel + functions Jul 10, 2024

FelixAntoineLeSieur added 3 commits July 10, 2024 10:04

Adds the functions needed for the creation of the sample channel.

9832e45

I also added a few tests to validate the sampleFileFormat and sequencingType

Changelog + small format change

d77de69

Removed errors concerning test data and emit on multiple channels

35a672c

FelixAntoineLeSieur commented Jul 10, 2024

View reviewed changes

Refactor: Rework sample channel creation

6063676

FelixAntoineLeSieur force-pushed the SampleChannelFunctions branch 2 times, most recently from 6063676 to d77de69 Compare July 10, 2024 18:41

LysianeBouchard approved these changes Jul 11, 2024

View reviewed changes

FelixAntoineLeSieur merged commit 2c25cd4 into main Jul 11, 2024

FelixAntoineLeSieur deleted the SampleChannelFunctions branch July 11, 2024 15:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor/Samplefile channel + functions #2

refactor/Samplefile channel + functions #2

FelixAntoineLeSieur commented Jul 3, 2024

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024 •

edited

Loading

FelixAntoineLeSieur Jul 10, 2024

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024

FelixAntoineLeSieur Jul 10, 2024

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024 •

edited

Loading

LysianeBouchard Jul 9, 2024

FelixAntoineLeSieur Jul 10, 2024

LysianeBouchard Jul 9, 2024 •

edited

Loading

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024

FelixAntoineLeSieur Jul 10, 2024

LysianeBouchard Jul 9, 2024

LysianeBouchard Jul 9, 2024 •

edited

Loading

FelixAntoineLeSieur Jul 10, 2024

LysianeBouchard Jul 9, 2024

FelixAntoineLeSieur Jul 10, 2024

FelixAntoineLeSieur left a comment

FelixAntoineLeSieur Jul 10, 2024

FelixAntoineLeSieur Jul 10, 2024

FelixAntoineLeSieur Jul 10, 2024

FelixAntoineLeSieur Jul 10, 2024

FelixAntoineLeSieur Jul 10, 2024

		@@ -77,9 +77,32 @@ workflow PIPELINE_INITIALISATION {
		//
		validateInputParameters()

refactor/Samplefile channel + functions #2

refactor/Samplefile channel + functions #2

Conversation

FelixAntoineLeSieur commented Jul 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LysianeBouchard Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LysianeBouchard Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LysianeBouchard Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LysianeBouchard Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FelixAntoineLeSieur left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LysianeBouchard Jul 9, 2024 •

edited

Loading

LysianeBouchard Jul 9, 2024 •

edited

Loading

LysianeBouchard Jul 9, 2024 •

edited

Loading

LysianeBouchard Jul 9, 2024 •

edited

Loading