DeepSpot: Leveraging Spatial Context for Enhanced Spatial Transcriptomics Prediction from H&E Images
Authors: Kalin Nonchev, Sebastian Dawo, Karina Silina, Holger Moch, Sonali Andani, Tumor Profiler Consortium, Viktor Hendrik Koelzer, and Gunnar Rätsch
The preprint is available here.
We introduce DeepSpot, a novel deep-learning model that predicts spatial transcriptomics from H&E images. DeepSpot employs a deep-set neural network to model spots as bags of sub-spots and integrates multi-level tissue details and spatial context. This integration, supported by robust pretrained H&E foundation models, significantly enhances the accuracy and granularity of gene expression predictions from H&E images.
Fig.: DeepSpot leverages pathology foundation models and spatial tissue context. Workflow of DeepSpot: H&E slides are first divided into tiles, each corresponding to a spot. For each spot, we create a bag of sub-spots by dividing it into sub-tiles that capture the local morphology, and a bag of neighboring spots to represent the global context. A pretrained pathology model extracts tile features, which are input to the model. The concatenated representations are then fed into the gene head predictor, ρgene, to predict spatial gene expression.
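The bag-of-sub-spots idea above can be illustrated with a toy NumPy sketch. Everything here is an illustrative stand-in, not the actual DeepSpot implementation: the pixel-averaging "embedder" replaces the pretrained pathology model, and the random linear layer replaces the learned gene head ρgene.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pretrained pathology model: in DeepSpot, each tile is
# embedded by a foundation model; here we just average pixels (toy).
def embed_tiles(tiles):
    return tiles.mean(axis=(1, 2))            # (n_tiles, 3)

# One spot = a bag of sub-tiles (local morphology) plus a bag of
# neighboring spots (global context).
sub_tiles = rng.random((4, 16, 16, 3))        # 4 sub-spot tiles
neighbor_tiles = rng.random((6, 16, 16, 3))   # 6 neighboring spots

# Deep-set aggregation: permutation-invariant pooling over each bag.
local = embed_tiles(sub_tiles).mean(axis=0)         # (3,)
context = embed_tiles(neighbor_tiles).mean(axis=0)  # (3,)

# The concatenated representation is fed to the gene head (toy linear layer).
n_genes = 5
W_rho = rng.random((6, n_genes))
spot_representation = np.concatenate([local, context])  # (6,)
predicted_expression = spot_representation @ W_rho      # (n_genes,)
print(predicted_expression.shape)  # (5,)
```

Because the pooling is a mean over each bag, the prediction is invariant to the order of sub-spots and neighbors, which is the defining property of a deep-set architecture.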
git clone https://github.com/ratschlab/DeepSpot
cd DeepSpot
conda env create --file=environment.yaml
conda activate deepspot
python setup.py install
NB: Please ensure pyvips is installed in a way appropriate for your machine. We suggest installing it through conda:
conda install conda-forge::pyvips
Install the Jupyter kernel:
python -m ipykernel install --user --name deepspot --display-name "deepspot"
Please take a look at our notebook collection to get started with DeepSpot. We provide a small toy example.
- Spatial transcriptomics data preprocessing
- DeepSpot training
- DeepSpot inference
- DeepSpot inference with pretrained model
Moreover, we provide pretrained DeepSpot weights, generated during model training for our publication and used, for example, to generate spatial transcriptomics data for TCGA skin melanoma and kidney cancer slides. Download DeepSpot weights here.
Please ensure that you download the weights for the pathology foundation models and update their file paths in deepspot/utils/utils_image.py. You may need to agree to specific terms and conditions before downloading.
We publicly provide the predicted spatial transcriptomics data, comprising over 37 million spots from ~1,792 TCGA patients with melanoma or kidney cancer. You can find the data here. Please navigate to the Hugging Face dataset card for more information.
pip install datasets
from huggingface_hub import login, hf_hub_download, snapshot_download
import squidpy as sq
import pandas as pd
import scanpy as sc
login(token="YOUR HUGGINGFACE TOKEN")
# Define dataset details
repo_id = "nonchev/TCGA_digital_spatial_transcriptomics"
filename = "metadata_2025-01-11.csv"
# Create path
file_path = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
# Load metadata
metadata = pd.read_csv(file_path)
metadata.head()
dataset slide_type sample_id n_spots file_path
0 TCGA_SKCM FFPE TCGA-BF-AAP6-01Z-00-DX1.EFF1D6E1-CDBC-4401-A10... 5860 TCGA_SKCM/FFPE/TCGA-BF-AAP6-01Z-00-DX1.EFF1D6E...
1 TCGA_SKCM FFPE TCGA-FS-A1ZU-06Z-00-DX3.0C477EE6-C085-42BE-8BA... 2856 TCGA_SKCM/FFPE/TCGA-FS-A1ZU-06Z-00-DX3.0C477EE...
2 TCGA_SKCM FFPE TCGA-D9-A1X3-06Z-00-DX1.17AC16CC-5B22-46B3-B9C... 6236 TCGA_SKCM/FFPE/TCGA-D9-A1X3-06Z-00-DX1.17AC16C...
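Once loaded, the metadata can be sliced like any pandas DataFrame, e.g. to select a cohort and count its spots. The toy frame below only mimics the columns shown above with illustrative values:

```python
import pandas as pd

# Toy metadata mimicking the columns above (values are illustrative)
metadata = pd.DataFrame({
    "dataset": ["TCGA_SKCM", "TCGA_SKCM", "TCGA_KIRC"],
    "slide_type": ["FFPE", "FFPE", "FFPE"],
    "sample_id": ["sample_a", "sample_b", "sample_c"],
    "n_spots": [5860, 2856, 6236],
    "file_path": ["TCGA_SKCM/FFPE/sample_a.h5ad.gz",
                  "TCGA_SKCM/FFPE/sample_b.h5ad.gz",
                  "TCGA_KIRC/FFPE/sample_c.h5ad.gz"],
})

# Select melanoma slides and count their spots
skcm = metadata[metadata["dataset"] == "TCGA_SKCM"]
total_spots = skcm["n_spots"].sum()
print(len(skcm), total_spots)  # 2 8716
```

The resulting `file_path` values can be passed to `allow_patterns` in `snapshot_download` to fetch only the slides you need.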
local_dir = 'TCGA_data' # Change the folder path as needed
snapshot_download("nonchev/TCGA_digital_spatial_transcriptomics",
local_dir=local_dir,
allow_patterns="TCGA_SKCM/FFPE/TCGA-D9-A3Z3-06Z-00-DX1.C4820632-C64D-4661-94DD-9F27F75519C3.h5ad.gz",
repo_type="dataset")
adata = sc.read_h5ad("path/to/h5ad.gz")
sq.pl.spatial_scatter(adata,
color=["SOX10", "CD37", "COL1A1", "predicted_label"],
size=20, img_alpha=0.8, ncols=2)
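The released files are gzip-compressed AnnData objects (`.h5ad.gz`). Depending on your scanpy/anndata version, you may need to decompress them before reading. A minimal sketch using only the standard library (the file names below are hypothetical):

```python
import gzip
import shutil

# Decompress a gzip-compressed file (e.g. slide.h5ad.gz -> slide.h5ad),
# after which it can be opened with sc.read_h5ad.
def decompress_gz(src, dst):
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

# Small round trip to demonstrate the helper on dummy bytes
with gzip.open("example.h5ad.gz", "wb") as f:
    f.write(b"example bytes")
decompress_gz("example.h5ad.gz", "example.h5ad")
with open("example.h5ad", "rb") as f:
    print(f.read())  # b'example bytes'
```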
local_dir = 'TCGA_data' # Change the folder path as needed
# Note that the full dataset is around 2TB
snapshot_download("nonchev/TCGA_digital_spatial_transcriptomics",
local_dir=local_dir,
repo_type="dataset")
NB: To distinguish in-tissue spots from the background, tiles with a mean RGB value above 200 (near white) were discarded. Additional preprocessing can remove potential image artifacts.
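The threshold described above can be sketched as a simple tile filter (the function name and tile shapes are illustrative):

```python
import numpy as np

# Keep a tile only if its mean RGB value is at most 200; near-white tiles
# (mean above 200) are treated as background, as described above.
def is_tissue_tile(tile, threshold=200):
    return tile.mean() <= threshold

white_tile = np.full((32, 32, 3), 250, dtype=np.uint8)   # background
tissue_tile = np.full((32, 32, 3), 120, dtype=np.uint8)  # stained tissue

print(is_tissue_tile(white_tile), is_tissue_tile(tissue_tile))  # False True
```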
If you find our work useful, please consider citing us:
@article{nonchev2025deepspot,
title={DeepSpot: Leveraging Spatial Context for Enhanced Spatial Transcriptomics Prediction from H\&E Images},
author={Nonchev, Kalin and Dawo, Sebastian and Silina, Karina and Moch, Holger and Andani, Sonali and Tumor Profiler Consortium and Koelzer, Viktor H and Raetsch, Gunnar},
journal={medRxiv},
pages={2025--02},
year={2025},
publisher={Cold Spring Harbor Laboratory Press}
}
The code for reproducing the paper results can be found here.
If you have questions, please get in touch with Kalin Nonchev.