SPLASH 2

Introduction

SPLASH is an unsupervised and reference-free unifying framework to discover regulated sequence variation through statistical analysis of k-mer composition in both DNA and RNA sequence. It leverages our observation that detecting sample-regulated sequence variation, such as alternative splicing, RNA editing, gene fusions, V(D)J, transposable element mobilization, allele-specific splicing, genetic variation in a population, and many other regulated events can be unified in theory and in practice. SPLASH analyzes the k-mer composition of raw sequencing reads to identify constant sequences (anchors) that are followed by sample-specific target variation and provides valid p-values (Chaung et al. 2023). SPLASH is reference-free, sidestepping the computational challenges associated with alignment, enabling fast discovery and statistical precision.

The first version of SPLASH, implemented in Python, proved its usefulness. Here we provide SPLASH2, a new and improved implementation in C++ and Python (Kokot et al. 2024). This new version is much more efficient and allows datasets larger than 1 TB to be analyzed in hours on a workstation or even a laptop.

How does it work

A key concept of SPLASH is the analysis of the composition of pairs of substrings (anchor-target pairs) across many samples. The two substrings can be adjacent in reads or can be separated by a gap.
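The anchor-target idea can be illustrated with a minimal Python sketch. It is not the SPLASH2 implementation (which is written in C++); the function names and the `anchor_len`, `target_len`, and `gap` parameters are hypothetical and chosen only for illustration. The sketch slides a window over each read, extracts an anchor and the target that follows it after an optional gap, and counts targets per anchor and per sample.

```python
from collections import Counter, defaultdict


def anchor_target_pairs(read, anchor_len, target_len, gap=0):
    """Yield every (anchor, target) pair of substrings found in one read."""
    span = anchor_len + gap + target_len
    for i in range(len(read) - span + 1):
        anchor = read[i:i + anchor_len]
        start = i + anchor_len + gap  # target begins after the optional gap
        yield anchor, read[start:start + target_len]


def count_targets(samples, anchor_len, target_len, gap=0):
    """Map anchor -> sample name -> Counter of targets seen after that anchor."""
    table = defaultdict(lambda: defaultdict(Counter))
    for sample, reads in samples.items():
        for read in reads:
            pairs = anchor_target_pairs(read, anchor_len, target_len, gap)
            for anchor, target in pairs:
                table[anchor][sample][target] += 1
    return table


# Toy input: two samples that share the anchor ACGT but differ in what follows.
samples = {
    "sample_A": ["ACGTTTGA", "ACGTTTGA"],
    "sample_B": ["ACGTCCGA"],
}
table = count_targets(samples, anchor_len=4, target_len=2, gap=0)
for sample, targets in table["ACGT"].items():
    print(sample, dict(targets))
```

In this toy input the anchor `ACGT` is followed by `TT` in one sample and by `CC` in the other, which is the kind of sample-dependent target distribution that SPLASH tests statistically. The statistical test itself is not shown here.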

The image below presents the SPLASH pipeline at a high level.

[Figure: SPLASH pipeline overview]

sc-SPLASH

We have also extended the SPLASH framework (sc-SPLASH) to barcoded single-cell and spatial analysis (Dehghannasiri et al. 2024), enabling the detection of regulated sequence variation at single-cell resolution in high-throughput single-cell (10x) and spatial (Visium) transcriptomics. sc-SPLASH is integrated into the SPLASH2 pipeline and can be invoked by setting the input parameter technology = 10x for 10x scRNA-Seq analysis or technology = visium for Visium spatial analysis.

Compactors

Compactors is a new statistical approach to local seed-based assembly. It comes as part of the SPLASH package and is particularly suited to assembling regions that are diverse across samples (see figure below). However, it can also be used as an independent assembler on any type of seed provided by the user.

[Figure: Compactors idea]

Installation, usage, example

Please visit our Wiki page.

References

Marek Kokot*, Roozbeh Dehghannasiri*, Tavor Baharav, Julia Salzman, and Sebastian Deorowicz. Scalable and unsupervised discovery from raw sequencing reads using SPLASH2, Nature Biotechnology (2024)

Roozbeh Dehghannasiri*, Marek Kokot*, Sebastian Deorowicz, and Julia Salzman. sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing, bioRxiv (2024)

Kaitlin Chaung*, Tavor Baharav*, George Henderson, Ivan Zheludev, Peter Wang, and Julia Salzman. SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery, Cell (2023)

Tavor Baharav, David Tse, and Julia Salzman. OASIS: An interpretable, finite-sample valid alternative to Pearson’s X² for scientific discovery, PNAS (2024)

George Henderson, Adam Gudys, Tavor Baharav, Punit Sundaramurthy, Marek Kokot, Peter L. Wang, Sebastian Deorowicz, Allison F. Carey, and Julia Salzman. Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly, bioRxiv 2024.01.18.576133 (2024)