From 17c23ef597915a49b6ea671a6455d06c18fa0728 Mon Sep 17 00:00:00 2001
From: Matiss
Date: Tue, 2 Apr 2024 16:07:41 +0100
Subject: [PATCH] Update output.md readme update
---
 docs/output.md | 90 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 3 deletions(-)

diff --git a/docs/output.md b/docs/output.md
index 0ee8aac9..b000e076 100755
--- a/docs/output.md
+++ b/docs/output.md
@@ -9,9 +9,31 @@ This document describes the output produced by the pipeline.
 # Pipeline overview
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
+The overall results folder will look similar to this:
+
+![Screenshot 2024-04-02 at 15 08 13](https://github.com/wtsi-hgi/yascp/assets/22347136/12cc3575-8772-43ee-b64d-bb396e10ba82)
+
+Where we have outputs from different steps of the pipeline:
+* [cellsnp](#cellsnp)
+* [celltype identification](#celltype-identification)
+* citeseq data processing
+* [clustering and integration](#integration-and-clustering)
+* [sample deconvolution](#vireo)
+* [doublet detection](#doublet-detection)
+* [genotype match](#vireo) to determine sample matches
+* handover folder where summary statistics and plots are stored
+* infered genotypes - output from vireo: the generated VCF files for each of the deconvoluted donors in the pool.
+* merged_h5ads - merged h5ads from different preprocessing steps (these allow the pipeline to be restarted in a clustering-only mode)
+* [nf-preprocessing](#ambient-rna-removal) - contains cellbender results
+* pipeline info - statistics of the pipeline run.
+* plots - some quality control plots.
+* recourses - reference genome used in data processing.
+* UMAPS - summary plot UMAPS - for a quick look.
+
+Each of these steps and the outputs produced are described in more detail below:
 ## Alignment step
-#### [Cellranger](#Cellranger) - Curently users have to run Cellranger (6.11) upstream of pipeline, but an option to run it will be added shortly
+#### [Cellranger](#Cellranger) - Currently users have to run Cellranger upstream of the pipeline - we suggest using the [nf-core pipeline](https://nf-co.re/scrnaseq/2.5.1)
 ## Ambient RNA removal
 #### [Ambient RNA Removal using Cellbender](#Cellbender) - Reads the Cellranger outputs and removes the ambient RNA using [Cellbender](https://github.com/broadinstitute/CellBender)
@@ -33,6 +55,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 #### [Genotype processing](#Genotype_processing) - If users provide the genotypes this step slices and dices the genotypes to prepeare these for the CellSNP/Vireo deconvolutions and GT matches
 #### [Donor Deconvolution using CellSnp/Vireo](#CellSnp/Vireo) - We run cellsnp and vireo to deconvolute donors if the input file has indicated that there are more than 1 donors in the pool.
+#### Cellsnp
 Cellsnp Output files:
@@ -43,18 +66,79 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 Vireo Output files:
+#### Vireo
 * Vireo takes the cellsnp variant pileups and assigns donors the particular cell to the donor cluster:
   * ![Vireo output structure](../assets/images/Vireo_outputs.png)
+#### Doublet Detection
+![Screenshot 2024-04-02 at 15 43 16](https://github.com/wtsi-hgi/yascp/assets/22347136/781ce3b7-ea5e-4fe4-9ca3-d16e8b47123e)
 Scrublet Output files:
 * By default we always run Scrublet - if we have no donors pooled in the run (i.e if we have only 1 donor), then the doublets will be removed by scrublet instead of vireo:
-  * ![Scrublet output structure](../assets/images/Scrublet.png)
+
+DoubletDecon Output files:
+
+* DoubletDecon output files contain the barcode and a label indicating whether it is a singlet or a doublet:
+  * ![Screenshot 2024-04-02 at 15 51 26](https://github.com/wtsi-hgi/yascp/assets/22347136/603d27e1-42e3-4be7-bbfd-ebb3412b3ec4)
+
+
+ +
+doubletdetection Output files:
+
+* doubletdetection output files contain the barcode and a label indicating whether it is a singlet or a doublet:
+  * ![Screenshot 2024-04-02 at 15 59 15](https://github.com/wtsi-hgi/yascp/assets/22347136/c798d675-c96d-4137-92c2-6fa9340437c5)
+
+
+ +
+DoubletFinder Output files:
+
+* DoubletFinder output files contain the barcode and a label indicating whether it is a singlet or a doublet:
+  * ![Screenshot 2024-04-02 at 16 00 47](https://github.com/wtsi-hgi/yascp/assets/22347136/4cdd8ba2-5d16-4c9b-a64e-aa9423514208)
+
+
+
+ +
+scDblFinder Output files:
+
+* scDblFinder output files contain the barcode and a label indicating whether it is a singlet or a doublet:
+  * ![Screenshot 2024-04-02 at 16 01 49](https://github.com/wtsi-hgi/yascp/assets/22347136/69f7b19f-3b22-46bb-aafe-403ca3c399ae)
+
+
+ + +
+SCDS Output files:
+
+* SCDS output files contain the barcode and a label indicating whether it is a singlet or a doublet:
+  * ![Screenshot 2024-04-02 at 16 02 23](https://github.com/wtsi-hgi/yascp/assets/22347136/b2b8ca81-449b-4a94-a1a9-2bec592e74f4)
+
+
+ +
+SCDS Output files:
+
+* SCDS output files contain the barcode and a label indicating whether it is a singlet or a doublet:
+  * ![Screenshot 2024-04-02 at 16 03 11](https://github.com/wtsi-hgi/yascp/assets/22347136/c7d2bd3b-e1f4-4ca0-a53b-541c2f622288)
+
+
+
 #### [Donor Deconvolution using Souporcell](#Souporcell) - Souporcell option both removes the ambioent RNA and deconvolutes the donors [currently however this option is broken and will be fixed soon]
 #### [GT match](#GT_match) - This step utilises the prepeared genotypes and the infered genotypes by Vireo and picks out the donor that corresponds to the right reads.
@@ -227,4 +311,4 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 #### [Lisi](#Lisi) We also have a capability in running LISI cluster assesments, however curently this option does not run by default as it is memory demanding and requires some further optimisations
-[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
\ No newline at end of file
+[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.