Skip to content

HEARTSVG: a high-efficiency and robust method for identifying spatial expression patterns in large-scale spatial transcriptomic data

License

Notifications You must be signed in to change notification settings

cz0316/HEARTSVG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HEARTSVG

HEARTSVG: a high-efficiency and robust method for identifying spatial expression patterns in large-scale spatial transcriptomic data We proposed HEARTSVG, which builds upon serial correlation tests for rapidly and precisely detecting SVGs without relying on the underlying data-generative model or pre-defining spatial patterns.Furthermore, we developed the software HEARTSVG to provide an SVG auto-clustering module for predictions of spatial domain and visualizations, allowing users to discover novel biological phenomena.

To be able to use the R package HEARTSVG successfully, we suggest manually installing the following packages and their dependent packages first: spatstat, Seurat, plotly, plot3D, TTR, stringr, fpp2, seasonal, cluster, poolr, reshape2, data.table, plyr, dplyr, ggplot2, MASS, EnvStats, BreakPoints, gprofiler2

library(spatstat);library(Seurat);library(plot3D);library(plotly);library(TTR);
library(stringr);library(fpp2);library(seasonal);library(cluster);library(poolr);
library(reshape2);library(data.table);library(plyr);library(dplyr);library(ggplot2);
library(MASS);library(EnvStats);library(BreakPoints);library(gprofiler2)

Required R modules

R (>= 3.5.0)

Dependencies

fpp2 (>=2.4)
seasonal (>=1.9.0)
cluster (>=2.1.3)
poolr (>=1.1.1)
Seurat (4.1.1)
spatstat (>=2.3.4)
ggplot2 (>=3.4.2)
EnvStats (>=2.7.0)
BreakPoints (>=1.2)
plotly (>=4.10.0)
plot3D (>=1.4)
gprofiler2 (>=0.2.1)


Installation

library(devtools)
devtools::install_github('https://github.com/cz0316/HEARTSVG',force = T)
library(HEARTSVG)

Operating systems tested on (HEARTSVG 1.1.0) :

macOS Ventura 13.4.1
Windows 10

2023/12 Update

We optimized the code to achieve reductions in runtime and memory usage, and the optimized function is named 'heartsvg_fast'. However, due to an unknown error, we are unable to compile the 'heartsvg_fast' function into the R package. As a result, we are providing it separately (file_name: heartsvg_fast.R). We are currently investigating the cause of the error.

0 Load data

We use the ST data from the Wu et al. study as an example to introduce the usage of the R package HEARTSVG. The article and data download link are: https://doi.org/10.1158/2159-8290.CD-21-0316; http://www.cancerdiversity.asia/scCRLM/.

stc2.dt=readRDS(file = '~/HEARTSVG/data/HEARTSVG_demo.RDS')
dim(stc2.dt)
## [1] 4174 15429

If your data is of type seuratObject, we provide two functions, filter.gene and seurat.trans , for converting the data into a format suitable for HEARTSVG. We use the function filter.gene to filter the non-expressed genes and low expression proportion genes. Then we use the function seurat.trans to transform the data type, from SeuratObject to a data.frame.

## stc2=filter.gene(stc2,thr=0.01) # thr=0.01 represents the minimum expression proportion of each gene across all cells.
## dim(stc2)
## [1] 8812 4174

## stc2.dt=seurat.trans(stc2)
## dim(stc2.dt)
## [1] 4174 8814

1 Find SVG

The input data of the function heartsvg is a data.frame with dimensions n*(p+2) contains gene coordinates and gene expression counts.

The first 2 columns should represent coordinates and their column names must be 'row' and 'col', respectively. The remaining p columns correspond to the expression levels of p genes. There are n rows, representing n spots.

demo=stc2.dt[,1:1002]
demo[1:6,1:6]
##                    row col SAMD11 NOC2L HES4 ISG15
## AAACAACGAATAGTTC-1   0  16      0     0    1     1
## AAACAAGTATCTCCCA-1  50 102      0     2    0     2
## AAACAATCTACTAGCA-1   3  43      0     1    0     0
## AAACACCAATAACTGC-1  59  19      1     0   11     6
## AAACAGAGCGACTCCT-1  14  94      0     0    1     2
## AAACAGCTTTCAGAAG-1  43   9      0     0    0     2
## find SVG
result=heartsvg(demo,scale=T)
head(result)
##       gene pval p_adj rank
## 729  RPS27    0     0    1
## 604   PIGR    0     0    2
## 746 S100A6    0     0    3
## 185  CSRP1    0     0    4
## 725  RPL11    0     0    5
## 732   RPS8    0     0    6

Regarding the parameter 'scale', the default value is T. This parameter is used to mitigate the impact of extreme values caused by technical factors.

If you are using technologies such as 10X Visium, which produce data with a moderate level of sparsity and some noise, we recommend setting scale=T.

Conversely, if you are using technologies such as HDST, which generate highly sparse data with very low expression levels, we recommend setting scale=F.

Additionally, if you have already performed noise reduction, normalization, or other preprocessing steps on your ST data, we also recommend setting scale=F.

2 Visualization

Different visualizations of top SVGs.

2.1 2D plot

svg.2Dplot(data = demo,gene=result$gene[1:3],method = 'raw')

image image image

2.2 Marginal expression plot of SVG

The marginal expression of genes are obtained by the semi-pooling process.

svg.Mplot(data=demo,gene=result$gene[1:3],method = 'raw')

image image image

2.3 3D plot

2.3.1 3D scatter plot

svg.3Dplot(data=demo,gene = c('RPS8'),type = 'scatter')

newplot

2.3.1 3D surface plot

svg.3Dplot(data=demo,gene=c('RPS27'),type = 'surface')

image

3 Prediction of spatial domains

We offers an auto-clustering module for SVGs that facilitates further analyses, including predicting spatial domains, conducting functional analyses, and visualization based on the set of SVG files.
We categorize SVGs based on their similarity in spatial expression and adopt a data-driven approach to estimate the number of spatially coherent regions in both SVG expression and histology.

3.1 clustering for SVGs

clust_demo=svg.clust(data=demo,svg=result$gene[1:500],method = 'h')
table(clust_demo$cluster)
## 
##  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
## 403  16  59   6   1   1   2   1   3   1   1   1   1   1   1   1   1 

km_demo=svg.clust(data=demo,svg=result$gene[1:500],method = 'k',n.c=6)
table(km_demo$cluster)
## 
##   1   2   3   4   5   6 
## 114 139 123  27  68  29

3.2 visualization of spatial domains

svg.2Dplot(data=demo,gene=clust_demogene[clustdemogene[clust_democluster==1],method = 'mean',title = 'domain-1')

image

svg.2Dplot(data=demo,gene=clust_demogene[clustdemogene[clust_democluster==2],method = 'mean',title = 'domain-2')

cl1.en=svg.enrichment(svg=clust_demo$gene[clust_demo$cluster==1])

head(cl1.en)

4 Enrichment

We perform enrichment analysis of SVG sets based on the package gprofiler2.

cl1.en=svg.enrichment(svg=clust_demogene[clustdemogene[clust_democluster==1])

head(cl1.en)
##     query significant      p_value term_size query_size intersection_size
## 1 query_1        TRUE 5.007490e-08      5861        405               174
## 2 query_1        TRUE 2.599282e-07      4088        405               132
## 3 query_1        TRUE 5.064712e-07      6348        405               181
## 4 query_1        TRUE 5.795109e-07      5657        405               166
## 5 query_1        TRUE 1.015270e-06      6446        405               182
## 6 query_1        TRUE 2.018340e-06      2664        405                95
##   precision     recall    term_id source
## 1 0.4296296 0.02968777 GO:0048856  GO:BP
## 2 0.3259259 0.03228963 GO:0042221  GO:BP
## 3 0.4469136 0.02851292 GO:0048518  GO:BP
## 4 0.4098765 0.02934418 GO:0048522  GO:BP
## 5 0.4493827 0.02823456 GO:0032502  GO:BP
## 6 0.2345679 0.03566066 GO:0010033  GO:BP
##                                   term_name effective_domain_size source_order
## 1          anatomical structure development                 21128        13923
## 2                      response to chemical                 21128        10676
## 3 positive regulation of biological process                 21128        13622
## 4   positive regulation of cellular process                 21128        13626
## 5                     developmental process                 21128         8073
## 6             response to organic substance                 21128         3939
##                              parents
## 1                         GO:0032502
## 2                         GO:0050896
## 3             GO:0008150, GO:0050789
## 4 GO:0009987, GO:0048518, GO:0050794
## 5                         GO:0008150
## 6                         GO:0042221

Rplot

Citation

Yuan, X., Ma, Y., Gao, R. et al. HEARTSVG: a fast and accurate method for identifying spatially variable genes in large-scale spatial transcriptomics. Nat Commun 15, 5700 (2024). https://doi.org/10.1038/s41467-024-49846-1

doi: https://doi.org/10.1038/s41467-024-49846-1

About

HEARTSVG: a high-efficiency and robust method for identifying spatial expression patterns in large-scale spatial transcriptomic data

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages