From 17c23ef597915a49b6ea671a6455d06c18fa0728 Mon Sep 17 00:00:00 2001 From: Matiss Date: Tue, 2 Apr 2024 16:07:41 +0100 Subject: [PATCH 1/4] Update output.md readme update --- docs/output.md | 90 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 87 insertions(+), 3 deletions(-) diff --git a/docs/output.md b/docs/output.md index 0ee8aac9..b000e076 100755 --- a/docs/output.md +++ b/docs/output.md @@ -9,9 +9,31 @@ This document describes the output produced by the pipeline. # Pipeline overview The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: +The overall results folder will look similar to this: + +![Screenshot 2024-04-02 at 15 08 13](https://github.com/wtsi-hgi/yascp/assets/22347136/12cc3575-8772-43ee-b64d-bb396e10ba82) + +It contains outputs from the different steps of the pipeline: +* [cellsnp](#cellsnp) +* [celltype identification](#celltype-identification) +* citeseq data processing +* [clustering and integration](#integration-and-clustering) +* [sample deconvolution](#vireo) +* [doublet detection](#doublet-detection) +* [genotype match](#vireo) to determine sample matches +* handover folder where summary statistics and plots are stored +* inferred genotypes - VCF files generated by Vireo for each of the deconvoluted donors in the pool. +* merged_h5ads - merged h5ad files from the different preprocessing steps (these allow the pipeline to be restarted in clustering-only mode) +* [nf-preprocessing](#ambient-rna-removal) - contains cellbender results +* pipeline info - statistics of the pipeline run. +* plots - some quality control plots. +* resources - reference genome used in data processing. +* UMAPS - summary UMAP plots for a quick look.
+ +Each step and its outputs are described in more detail below: ## Alignment step -#### [Cellranger](#Cellranger) - Curently users have to run Cellranger (6.11) upstream of pipeline, but an option to run it will be added shortly +#### [Cellranger](#Cellranger) - Currently, users have to run Cellranger upstream of the pipeline; we suggest using the [nf-core/scrnaseq pipeline](https://nf-co.re/scrnaseq/2.5.1). ## Ambient RNA removal #### [Ambient RNA Removal using Cellbender](#Cellbender) - Reads the Cellranger outputs and removes the ambient RNA using [Cellbender](https://github.com/broadinstitute/CellBender) @@ -33,6 +55,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d #### [Genotype processing](#Genotype_processing) - If users provide the genotypes this step slices and dices the genotypes to prepeare these for the CellSNP/Vireo deconvolutions and GT matches #### [Donor Deconvolution using CellSnp/Vireo](#CellSnp/Vireo) - We run cellsnp and vireo to deconvolute donors if the input file has indicated that there are more than 1 donors in the pool. +#### Cellsnp
Cellsnp Output files: @@ -43,18 +66,79 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
Vireo Output files: +#### Vireo * Vireo takes the cellsnp variant pileups and assigns donors the particular cell to the donor cluster: * ![Vireo output structure](../assets/images/Vireo_outputs.png)
+#### Doublet Detection +![Screenshot 2024-04-02 at 15 43 16](https://github.com/wtsi-hgi/yascp/assets/22347136/781ce3b7-ea5e-4fe4-9ca3-d16e8b47123e)
Scrublet Output files: * By default we always run Scrublet - if we have no donors pooled in the run (i.e if we have only 1 donor), then the doublets will be removed by scrublet instead of vireo: - * ![Scrublet output structure](../assets/images/Scrublet.png)
+
+DoubletDecon Output files: + +* DoubletDecon output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 15 51 26](https://github.com/wtsi-hgi/yascp/assets/22347136/603d27e1-42e3-4be7-bbfd-ebb3412b3ec4) + +
+ +
+doubletdetection Output files: + +* doubletdetection output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 15 59 15](https://github.com/wtsi-hgi/yascp/assets/22347136/c798d675-c96d-4137-92c2-6fa9340437c5) + +
+ +
+DoubletFinder Output files: + +* DoubletFinder output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 16 00 47](https://github.com/wtsi-hgi/yascp/assets/22347136/4cdd8ba2-5d16-4c9b-a64e-aa9423514208) + + +
+ +
+scDblFinder Output files: + +* scDblFinder output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 16 01 49](https://github.com/wtsi-hgi/yascp/assets/22347136/69f7b19f-3b22-46bb-aafe-403ca3c399ae) + +
+ + +
+SCDS Output files: + +* SCDS output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 16 02 23](https://github.com/wtsi-hgi/yascp/assets/22347136/b2b8ca81-449b-4a94-a1a9-2bec592e74f4) + +
+ +
+SCDS Output files: + +* SCDS output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 16 03 11](https://github.com/wtsi-hgi/yascp/assets/22347136/c7d2bd3b-e1f4-4ca0-a53b-541c2f622288) + +
+ + +
+SCDS Output files: + +* SCDS output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: + * ![Screenshot 2024-04-02 at 16 03 11](https://github.com/wtsi-hgi/yascp/assets/22347136/c7d2bd3b-e1f4-4ca0-a53b-541c2f622288) + +
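Each doublet-detection tool above writes the cell barcode together with a singlet/doublet label. A quick sanity check on a run is therefore to tally the calls per tool. The minimal Python sketch below is illustrative only and is not part of yascp. It assumes a tab-separated file whose header row includes a `label` column. That column name, and the helper names `tally_doublet_calls` and `tally_doublet_file`, are hypothetical, so check the real headers in the screenshots above and pass `label_column` accordingly.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_doublet_calls(lines: Iterable[str], label_column: str = "label") -> Counter:
    """Count singlet/doublet calls in a tab-separated barcode/label table.

    Assumes (hypothetically) that the first line is a header row and that
    `label_column` names the column holding each cell's call. Labels are
    stripped and lower-cased so that "Singlet" and "singlet" are counted together.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # Reading `fieldnames` consumes the header row; None means the input was empty.
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise ValueError(f"column {label_column!r} not found in header")
    # Short rows yield None for missing fields, so fall back to an empty string.
    return Counter((row[label_column] or "").strip().lower() for row in reader)


def tally_doublet_file(path: str, label_column: str = "label") -> Counter:
    """Open a (hypothetical) per-tool output TSV and tally its calls."""
    # newline="" is how the csv module recommends opening files.
    with open(path, newline="") as handle:
        return tally_doublet_calls(handle, label_column)
```

Because the tool-specific header is a parameter, the same helper can be pointed at each tool's output in turn, and the counts can be compared side by side.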
+ #### [Donor Deconvolution using Souporcell](#Souporcell) - Souporcell option both removes the ambioent RNA and deconvolutes the donors [currently however this option is broken and will be fixed soon] #### [GT match](#GT_match) - This step utilises the prepeared genotypes and the infered genotypes by Vireo and picks out the donor that corresponds to the right reads. @@ -227,4 +311,4 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d #### [Lisi](#Lisi) We also have a capability in running LISI cluster assesments, however curently this option does not run by default as it is memory demanding and requires some further optimisations -[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. \ No newline at end of file +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. From 40444532804e351da16e017a7a6b1ba51d5c337f Mon Sep 17 00:00:00 2001 From: Matiss Date: Tue, 2 Apr 2024 17:47:11 +0100 Subject: [PATCH 2/4] Update output.md --- docs/output.md | 17 ----------------- 1 file changed, 17 deletions(-) diff --git a/docs/output.md b/docs/output.md index b000e076..510860af 100755 --- a/docs/output.md +++ b/docs/output.md @@ -122,23 +122,6 @@ Each of these steps and the outputs produced are decribed more in detail bellow:
-
-SCDS Output files: - -* SCDS output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: - * ![Screenshot 2024-04-02 at 16 03 11](https://github.com/wtsi-hgi/yascp/assets/22347136/c7d2bd3b-e1f4-4ca0-a53b-541c2f622288) - -
- - -
-SCDS Output files: - -* SCDS output files contain the cell barcode and a label indicating whether the cell is a singlet or a doublet: - * ![Screenshot 2024-04-02 at 16 03 11](https://github.com/wtsi-hgi/yascp/assets/22347136/c7d2bd3b-e1f4-4ca0-a53b-541c2f622288) - -
- #### [Donor Deconvolution using Souporcell](#Souporcell) - Souporcell option both removes the ambioent RNA and deconvolutes the donors [currently however this option is broken and will be fixed soon] #### [GT match](#GT_match) - This step utilises the prepeared genotypes and the infered genotypes by Vireo and picks out the donor that corresponds to the right reads. From 4bb58e116bd1612927c9dd71aeb462457ec54d55 Mon Sep 17 00:00:00 2001 From: Matiss Date: Wed, 3 Apr 2024 16:56:59 +0100 Subject: [PATCH 3/4] Update output.md --- docs/output.md | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 510860af..5851315d 100755 --- a/docs/output.md +++ b/docs/output.md @@ -21,7 +21,7 @@ Where we have outputs from different steps of pipeline: * [sample deconvolution](#vireo) * [doublet detection](#doublet-detection) * [genotype match](#vireo) to determine sample matches -* handover folder where summary statistics and plots are stored +* [handover](#handover) folder where summary statistics, plots and the final quality-controlled, annotated per-donor h5ad files are stored. * inferred genotypes - VCF files generated by Vireo for each of the deconvoluted donors in the pool.
* merged_h5ads - merged h5ad files from the different preprocessing steps (these allow the pipeline to be restarted in clustering-only mode) * [nf-preprocessing](#ambient-rna-removal) - contains cellbender results @@ -293,5 +293,21 @@ Each of these steps and the outputs produced are decribed more in detail bellow: #### [Lisi](#Lisi) We also have a capability in running LISI cluster assesments, however curently this option does not run by default as it is memory demanding and requires some further optimisations +## [Handover](#handover): Summary Statistics, Per Donor h5ad files, Summary Plots + * ![Screenshot 2024-04-03 at 16 44 35](https://github.com/wtsi-hgi/yascp/assets/22347136/64bd3ca8-cb10-48bb-8334-f12482e4ebfd) +This folder contains three subfolders: +* Donor_Quantification - the Cellranger filtered, Cellranger raw and Cellbender filtered files that are used to produce the final per-donor h5ad files and the metadata features in the per-donor TSV files + * ![Screenshot 2024-04-03 at 16 47 28](https://github.com/wtsi-hgi/yascp/assets/22347136/8524243f-4bf1-4713-9076-0e1d3fcb99e1) + +* Donor_Quantification_summary - summary statistics per donor and per tranche (the collection of all pools processed in this run). + * ![Screenshot 2024-04-03 at 16 49 59](https://github.com/wtsi-hgi/yascp/assets/22347136/a6107709-f83e-45e1-9008-1cdde1510c67) + +* Summary_plots - the most important plots from each step, for a quick investigation of the performance of the scRNA runs and of the analysis. + * ![Screenshot 2024-04-03 at 16 51 35](https://github.com/wtsi-hgi/yascp/assets/22347136/7c63e2c0-6251-4a7d-8e14-be434c0e017b) + + [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. + + +
From d063ed6297196de5741647c59fe8f21214255c64 Mon Sep 17 00:00:00 2001 From: Matiss Date: Wed, 3 Apr 2024 17:12:07 +0100 Subject: [PATCH 4/4] Update output.md --- docs/output.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 5851315d..437242cb 100755 --- a/docs/output.md +++ b/docs/output.md @@ -293,8 +293,23 @@ Each of these steps and the outputs produced are decribed more in detail bellow: #### [Lisi](#Lisi) We also have a capability in running LISI cluster assesments, however curently this option does not run by default as it is memory demanding and requires some further optimisations +## [Citeseq](#citeseq): The Citeseq folder is present if your data contains Citeseq data +![Screenshot 2024-04-03 at 17 02 05](https://github.com/wtsi-hgi/yascp/assets/22347136/5a6dc2df-85ad-4be9-9338-12880bdb8b5c) +This folder contains the following subfolders: +* DSB - DSB Citeseq normalisation statistics and RDS files + * ![Screenshot 2024-04-03 at 17 04 38](https://github.com/wtsi-hgi/yascp/assets/22347136/4529980d-6afb-41dc-bf92-89e8d081299a) +* all_data_integrated - Seurat's integration of the Citeseq data (and VDJ data, if available), as well as some UMAPs produced by these processes + * ![Screenshot 2024-04-03 at 17 05 28](https://github.com/wtsi-hgi/yascp/assets/22347136/ad6c8f3b-82ca-415f-a9b0-a1242f7e90f7) +* filtered - the data modalities split apart, i.e. if the data is hashtagged, the hashtag layer is stored separately from both the antibody data and the GEX data + * ![Screenshot 2024-04-03 at 17 06 32](https://github.com/wtsi-hgi/yascp/assets/22347136/b4f900f7-2c7b-4423-9a1c-5f7d166ec664) +* raw - as above, but containing the raw Cellranger files split by modality. + * ![Screenshot 2024-04-03 at 17 07 01](https://github.com/wtsi-hgi/yascp/assets/22347136/8882eba1-4dcb-4870-915e-4025f085bd17) + + ## [Handover](#handover): Summary Statistics, Per Donor h5ad files, Summary Plots - * ![Screenshot 2024-04-03 at 16 44 35](https://github.com/wtsi-hgi/yascp/assets/22347136/64bd3ca8-cb10-48bb-8334-f12482e4ebfd) + +![Screenshot 2024-04-03 at 16 44 35](https://github.com/wtsi-hgi/yascp/assets/22347136/64bd3ca8-cb10-48bb-8334-f12482e4ebfd) + This folder contains three subfolders: * Donor_Quantification - the Cellranger filtered, Cellranger raw and Cellbender filtered files that are used to produce the final per-donor h5ad files and the metadata features in the per-donor TSV files * ![Screenshot 2024-04-03 at 16 47 28](https://github.com/wtsi-hgi/yascp/assets/22347136/8524243f-4bf1-4713-9076-0e1d3fcb99e1)