---
title: Getting started with Spatial Transcriptomics
output:
  html_document:
    toc: true
    toc_float:
      toc_collapsed: true
    toc_depth: 3
    number_sections: false
    css: style.css
editor_options:
  markdown:
    wrap: 72
---
```{r echo = FALSE, show = FALSE}
knitr::opts_chunk$set(number_sections = FALSE)
```
## Get workshop files
<input type="checkbox"> Download the files for this activity by clicking
here:
<https://github.com/fhdsl/ITN_Workshops_2024/archive/refs/heads/main.zip>\
<input type="checkbox"> Put this file on your desktop so it is easy to
find.\
<input type="checkbox"> Double click the zip file (or right click and
choose "unzip" or "decompress") to unzip the file.\
<input type="checkbox"> Open the activity files you downloaded so we can
see what's here.
### Get familiar with the data set
Within the folder, we should see a metadata file
(`meylan_etal_2022_tumor_grade.csv`), a PDF of the manuscript that
describes these data, and a `visium_samples` folder that includes four
samples.
Each sample's folder contains several files resulting from the
`spaceranger` pipeline. We will use only the following files:
- sample_a_01
    - GSM5924042_frozen_a_1_filtered_feature_bc_matrix.h5 (the gene
      expression data)
    - GSM5924042_frozen_a_1_tissue_positions_list.csv (spot x,y
      spatial positions)
    - GSM5924042_frozen_a_1_tissue_hires_image.png (the H&E-stained
      tissue image)
    - GSM5924042_frozen_a_1_scalefactors_json.json (to scale the H&E
      image and plots)
- sample_b_01 ...
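
Before uploading, it can help to confirm that every sample folder holds
the four files listed above. The following is a minimal sketch, not part
of spatialGE: the function names are hypothetical, and the file suffixes
are assumptions taken from the example file names in this document (the
`.h5` suffix is kept loose so it matches both `filtered_` and `raw_`
matrices).

```python
from pathlib import Path

# Suffixes of the four files each sample needs (assumed from the list above).
REQUIRED_SUFFIXES = (
    "_feature_bc_matrix.h5",
    "_tissue_positions_list.csv",
    "_tissue_hires_image.png",
    "_scalefactors_json.json",
)


def missing_files(sample_dir):
    """Return the required suffixes that no file in sample_dir ends with."""
    names = [p.name for p in Path(sample_dir).iterdir() if p.is_file()]
    return [s for s in REQUIRED_SUFFIXES
            if not any(n.endswith(s) for n in names)]


def check_samples(visium_dir):
    """Map each sample folder name to its list of missing suffixes."""
    return {d.name: missing_files(d)
            for d in sorted(Path(visium_dir).iterdir()) if d.is_dir()}
```

Calling `check_samples("visium_samples")` from inside the unzipped
workshop folder should return an empty list for every sample that is
ready to upload.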
The README describes these samples:
> These samples result from tumor biopsies collected from a patient
> cohort with clear cell renal cell carcinoma (ccRCC). The researchers
> profiled the samples with 10X Visium to study the gene expression in
> TLS in a spatial context.
The metadata file (`meylan_etal_2022_tumor_grade.csv`) contains two
variables of interest. One is the tumor grade, and the other is
positivity for a tertiary lymphoid structure (TLS) as ascertained using
immunohistochemistry staining. The file looks like this:
| samplename | cohort | ID | pT | tls |
|:------------|:---------|:-----|:-----|:----|
| sample_b_18 | IMM | b_18 | pT2 | pos |
| sample_b_07 | IMM | b_7 | pT3 | neg |
| sample_b_01 | IMM | b_1 | pT1 | neg |
| sample_a_01 | ExhauCRF | a_1 | pT1b | pos |
## Create an account with SpatialGE
<input type="checkbox"> Go to <https://spatialge.moffitt.org/>\
<input type="checkbox"> Click on "Sign Up" in the upper right corner.
## Starting a new project
<input type="checkbox"> Click the blue "New Project" button.\
<input type="checkbox"> For
`What spatial transcriptomics platform are you using for this project?`
choose `Visium`, the platform that produced our example data.\
<input type="checkbox"> Enter a sensible project name and description,
for example something related to the workshop such as "ITN-Moffitt
workshop".\
<input type="checkbox"> Then click "Create".
## Uploading the dataset
*For each sample we will repeat the following steps to upload each
sample's set of files.*
### Uploading one sample's data {#uploading-one-samples-data}
<input type="checkbox"> For `Sample Name` enter the ID indicated on the
folder, e.g. `sample_b_18`. This is very important: sample names must
match the sample IDs in the metadata file
(`meylan_etal_2022_tumor_grade.csv`) exactly. Otherwise, no metadata is
imported.\
<input type="checkbox"> For the `Gene expression` box upload the `.h5`
file e.g. `GSM5924049_frozen_b_18_raw_feature_bc_matrix.h5`. You can
upload files by dragging and dropping them or by clicking to browse for
them.\
<input type="checkbox"> For the `Coordinates` box upload the `.csv` file
e.g. `GSM5924049_frozen_b_18_tissue_positions_list.csv`.\
<input type="checkbox"> For the `Tissue image` box upload the `.png`
file e.g. `GSM5924049_frozen_b_18_tissue_hires_image.png`.\
<input type="checkbox"> For the `Scale factor` box upload the `.json`
file e.g. `GSM5924049_frozen_b_18_scalefactors_json.json`. The scaling
factor file is output automatically by the `10X Space Ranger` pipeline,
and contains information to approximate the size of the tissue image and
the expression plots.
<input type="checkbox"> Once the above steps are done click the green
`Import Sample`.
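
To illustrate what the scale factor file is for, here is a minimal
sketch of how a spot position in full-resolution pixels could be mapped
onto the high-resolution tissue image. It assumes the JSON contains a
`tissue_hires_scalef` key, as in typical Space Ranger output; the
function name and the numbers are hypothetical, and spatialGE performs
this scaling for you.

```python
import json


def to_hires_pixels(pxl_row_fullres, pxl_col_fullres, scalefactors):
    """Scale full-resolution pixel coordinates to the hires image."""
    s = scalefactors["tissue_hires_scalef"]
    return pxl_row_fullres * s, pxl_col_fullres * s


# Hypothetical scale factor; a real file holds sample-specific values.
scalefactors = json.loads('{"tissue_hires_scalef": 0.25}')
print(to_hires_pixels(5000, 12000, scalefactors))  # (1250.0, 3000.0)
```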
Now [return to the beginning of these
steps](#uploading-one-samples-data) and repeat them for each of the
remaining samples.
*You can use this checklist to keep track as you upload and follow the
steps for each sample.*
<input type="checkbox"> `sample_b_18` data entered\
<input type="checkbox"> `sample_b_07` data entered\
<input type="checkbox"> `sample_b_01` data entered\
<input type="checkbox"> `sample_a_01` data entered
### Adding metadata
<input type="checkbox"> Now click `Option 1: Upload metadata file`.\
<input type="checkbox"> Upload the `meylan_etal_2022_tumor_grade.csv`
file. You can drag and drop the file or click on the (+) button to
browse for it.\
#### {.click_to_expand_block}
<details>
Metadata can also be added manually. To do so, follow these steps:
<input type="checkbox"> Click `Option 2: Add metadata manually`.\
<input type="checkbox"> Click `Add new metadata column`. Add a column
named `pT`.\
<input type="checkbox"> Click `Add new metadata column` again. Add a
column named `tls`.
You can reference the `meylan_etal_2022_tumor_grade.csv` file's contents
to add these data for each sample:
| samplename | pT | tls |
|:------------|:-----|:----|
| sample_b_18 | pT2 | pos |
| sample_b_07 | pT3 | neg |
| sample_b_01 | pT1 | neg |
| sample_a_01 | pT1b | pos |
<input type="checkbox"> Add the `pT` and `tls` values for
`sample_b_18`.\
<input type="checkbox"> Add the `pT` and `tls` values for
`sample_b_07`.\
<input type="checkbox"> Add the `pT` and `tls` values for
`sample_b_01`.\
<input type="checkbox"> Add the `pT` and `tls` values for
`sample_a_01`.
</details>

####
**Remember:** The sample IDs in the metadata should match exactly the
sample names used during file import.
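
The exact-match requirement can be checked before import. This is a
minimal sketch under two assumptions: that the metadata file is
comma-separated, and that its header row matches the table shown
earlier. The function name is hypothetical.

```python
import csv
import io


def unmatched_samples(metadata_text, entered_names):
    """Compare entered sample names with the `samplename` column.

    Returns (entered names absent from the metadata,
             metadata names that were never entered).
    Matching is exact, so case and zero padding count as differences.
    """
    rows = csv.DictReader(io.StringIO(metadata_text))
    in_metadata = {row["samplename"] for row in rows}
    entered = set(entered_names)
    return sorted(entered - in_metadata), sorted(in_metadata - entered)


# Metadata rows as shown in the table earlier in this document.
metadata = """samplename,cohort,ID,pT,tls
sample_b_18,IMM,b_18,pT2,pos
sample_b_07,IMM,b_7,pT3,neg
sample_b_01,IMM,b_1,pT1,neg
sample_a_01,ExhauCRF,a_1,pT1b,pos
"""

# "sample_b_7" is a deliberate typo for "sample_b_07".
not_in_metadata, not_entered = unmatched_samples(
    metadata, ["sample_b_18", "sample_b_7", "sample_b_01", "sample_a_01"])
print(not_in_metadata)  # ['sample_b_7']
print(not_entered)      # ['sample_b_07']
```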
### After you've entered the data and metadata:
**NOTE**: Make sure to upload all the samples before clicking the
`Import Data` button.

<input type="checkbox"> After you click `Import Data` you will not be
able to edit the project (you would have to start a completely new
project). Make sure everything is as you intend, then click
`Import Data`.

The import will take a little while. You can choose to be notified by
email instead of waiting on the page.
## Filtering your data
Each ST technology will require different filtering parameters. Compared
to single-cell ST, spot-level ST (e.g., Visium) tends to yield more
counts per spot. Even among spot-level ST projects, these parameters
will need adjustment depending on the sequencing depth and cellularity
(i.e., cells per area unit). For these reasons, the values used here
should not be taken as a "golden rule". Instead, users are encouraged to
try different parameters and see which filtering procedure produces the
most "noise" reduction without losing too much relevant information.
spatialGE provides statistics and plots to help the user assess the
effect of filtering.
<input type="checkbox"> Go to the "Filter data" tab.\
<input type="checkbox"> Click "Filter spots/cells".\
<input type="checkbox"> Enter the minimum number of counts a spot needs
to have to be kept in the data set. For this activity, enter 500.\
<input type="checkbox"> Enter the minimum number of genes a spot needs
to have to be kept in the data set. For this activity, enter 100.\
<input type="checkbox"> Click the "Mitochondrial genes (\^MT-)" box to
filter spots by mitochondrial gene content. Keep in mind that some ST
platforms do not quantify mitochondrial genes.\
<input type="checkbox"> Enter the maximum percentage of mitochondrial
counts. Use 20% in this case.\
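
The three spot-level thresholds above can be read as a single keep/drop
rule per spot. The following is a minimal sketch of that rule, not
spatialGE's implementation: the function name is hypothetical, the toy
counts are invented, and treating the thresholds as inclusive is an
assumption.

```python
import re

# Same pattern as the "Mitochondrial genes (^MT-)" box.
MITO = re.compile(r"^MT-")


def spot_passes(counts, min_counts=500, min_genes=100, max_mito_pct=20.0):
    """counts: dict mapping gene name -> raw count for one spot."""
    total = sum(counts.values())
    genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if MITO.match(g))
    mito_pct = 100.0 * mito / total if total else 0.0
    return (total >= min_counts
            and genes >= min_genes
            and mito_pct <= max_mito_pct)


# Toy spot: 200 genes with 3 counts each, none mitochondrial.
good_spot = {f"GENE{i}": 3 for i in range(200)}
# Same spot plus a large mitochondrial count: 600 of 1200 counts are MT.
mito_spot = dict(good_spot, **{"MT-CO1": 600})

print(spot_passes(good_spot))  # True
print(spot_passes(mito_spot))  # False
```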
#### {.click_to_expand_block}
<details>
Users can also remove genes with a low number of counts. This is
advisable in most cases. However, since ST features high gene dropout
(i.e., genes that the technology fails to quantify), too stringent a
filter might leave very little usable information in the data set.

To perform a gene count filter:

<input type="checkbox"> Click "Filter genes".\
<input type="checkbox"> Filter out genes with fewer than 2000 counts.\
<input type="checkbox"> Filter out genes expressed in fewer than 20
spots.\
</details>

####
<input type="checkbox"> Once all the filter settings are as you'd like,
click the blue "APPLY FILTER" button.\
<input type="checkbox"> For reproducibility, users can also download a
"parameter file" that records the filtering settings used. To do this,
locate the "Download parameter log" link below the "APPLY FILTER"
button.
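
For the optional gene filter, the two thresholds (2000 total counts and
20 spots) can be sketched as follows. This is an illustration only, with
a hypothetical function name and invented toy counts, and it is not how
spatialGE stores or filters the data internally.

```python
def genes_to_keep(matrix, min_counts=2000, min_spots=20):
    """matrix: dict mapping gene name -> list of per-spot counts."""
    keep = []
    for gene, per_spot in matrix.items():
        total = sum(per_spot)
        expressed_in = sum(1 for c in per_spot if c > 0)
        # Drop genes with fewer than min_counts counts overall, or
        # expressed in fewer than min_spots spots.
        if total >= min_counts and expressed_in >= min_spots:
            keep.append(gene)
    return keep


# Toy data for 30 spots (invented values).
toy = {
    "IGKC": [100] * 30,              # 3000 counts in 30 spots: kept
    "MS4A1": [200] * 10 + [0] * 20,  # 2000 counts but only 10 spots
    "COL1A1": [10] * 30,             # 30 spots but only 300 counts
}
print(genes_to_keep(toy))  # ['IGKC']
```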
### Visualize filtering results
#### Count distributions
<input type="checkbox"> Click "Violin plots" to visualize count
distribution after filtering.\
<input type="checkbox"> Currently, "total_counts" and "total_genes" per
spot can be visualized.\
<input type="checkbox"> When changing the variable to plot, click the
blue "GENERATE PLOTS" button to update.
#### {.click_to_expand_block}
<details>
#### Quilt plot
The quilt plot tab within the QC and Data Transformation module shows
the counts or detected genes per spot in their spatial context. This
can help locate necrotic areas or areas with low cellularity.

<input type="checkbox"> Click `Quilt plot` to visualize the total number
of genes or counts per spot and their spatial context.\
<input type="checkbox"> Select `total_counts`.\
<input type="checkbox"> Select one sample under the `First sample`
drop-down menu.\
<input type="checkbox"> Select a second sample to compare against under
the `Second sample` drop-down menu.\
<input type="checkbox"> Click the blue "GENERATE PLOTS" button to create
the plot.
</details>

####
## Normalize Data
<input type="checkbox"> Click the "Normalize data" tab.\
<input type="checkbox"> Click "Use SCTransform" to apply Seurat's
normalization method.\
<input type="checkbox"> Click the blue "NORMALIZE DATA" button to start
normalization.\
#### {.click_to_expand_block}
<details>
<input type="checkbox"> The distribution of counts per spot for a given
gene can also be plotted, for example *MAP2K2*. When querying a gene,
keep in mind that the query is case-sensitive. Since these are human
samples, use all upper-case letters.\
<input type="checkbox"> Click "GENERATE PLOTS" to show the number of
*MAP2K2* counts per spot.
</details>

####
## Visualization
### Gene expression comparative visualization
<input type="checkbox"> Click the `Visualization` module on the left
side menu.\
<input type="checkbox"> You can search for your favorite gene in the
`Search and select genes` menu. For this example query and click
*IGKC*.\
<input type="checkbox"> Also query and click the *MS4A1* gene.\
<input type="checkbox"> Lastly, query and click *COL1A1*.\
<input type="checkbox"> Click the blue "GENERATE PLOTS" button to create
the plot.
Images can be exported in multiple formats (PNG/SVG/PDF).
#### {.click_to_expand_block}
<details>
<summary>Click here for instructions on gene expression
surfaces.</summary>
Expression surface
Alternatively, an "expression surface" can be generated. This is a type
of plot where expression values are inferred for the unquantified
spaces between spots.

<input type="checkbox"> Click the "Expression surface" tab.\
<input type="checkbox"> In the `Search and select genes` menu search for
and select `IGKC`.\
<input type="checkbox"> Click the "ESTIMATE SURFACES" button to create
the plot.
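
To give an intuition for inferring values between spots, here is a
minimal sketch using inverse-distance weighting. This technique is
swapped in purely for illustration: this document does not describe the
interpolation method spatialGE uses, and it may well differ. The
function name and the toy coordinates are hypothetical.

```python
import math


def idw_estimate(x, y, spots, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    spots: list of (x, y, value) tuples for measured spots.
    """
    num = den = 0.0
    for sx, sy, value in spots:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a measured spot
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two toy spots with expression 1.0 and 3.0.
spots = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
print(idw_estimate(1.0, 0.0, spots))  # 2.0, halfway between the spots
```

Closer spots contribute more to the estimate, so a point near the first
spot yields a value near 1.0.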
</details>
####
## Spatial Domain Detection
<input type="checkbox"> Click the "Spatial domain detection" module on
the left side menu.\
<input type="checkbox"> In the `Number of domains` slider, select the
range 3 to 5. This is the range of cluster numbers the method will
attempt to identify in each sample.\
<input type="checkbox"> For `Number of most variable genes to use`
choose 3000. The 3000 genes with the highest variation will be used to
detect the domains.\
<input type="checkbox"> Finally, click "RUN STCLUST" to find clusters.\
<input type="checkbox"> Explore the results by clicking each `K=` tab.
Images can be exported in multiple formats (PNG/SVG/PDF).
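
The idea of restricting clustering to the most variable genes can be
sketched as below. This is an assumption-laden illustration: it ranks
genes by plain population variance, which may not be the variability
measure spatialGE uses, and the function name and toy values are
hypothetical.

```python
from statistics import pvariance


def most_variable_genes(matrix, n):
    """matrix: dict mapping gene name -> list of per-spot expression.

    Returns the n gene names with the highest variance across spots.
    """
    variances = {gene: pvariance(values) for gene, values in matrix.items()}
    return sorted(variances, key=variances.get, reverse=True)[:n]


# Toy expression values across four spots (invented).
toy_expr = {
    "COL1A1": [1, 1, 1, 1],   # variance 0
    "IGKC": [0, 10, 0, 10],   # variance 25
    "MS4A1": [4, 5, 4, 5],    # variance 0.25
}
print(most_variable_genes(toy_expr, 2))  # ['IGKC', 'MS4A1']
```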
## Phenotyping
In spatialGE, inferring cell types on Visium data sets is achieved with
STdeconvolve ([Miller et al.
2022](https://doi.org/10.1038/s41467-022-30033-z)). STdeconvolve runs in
two stages:
1. The method attempts to fit a series of models composed of "latent
topics". Each latent topic represents a cell type, a cell state, or
even a functional niche.
2. The latent topics are assigned a biological identity based on a list
of reference genes. The assignments are obtained via gene set
enrichment analysis (GSEA).
<input type="checkbox"> Click the "Phenotyping" module on the left
side menu.\
<input type="checkbox"> To begin stage 1, select 7 to 10 topics in the
`Fit LDA models with this many topics` slider. One model is fit for
each number of topics in the range: one with 7 topics, another with 8,
another with 9, and one with 10.\
<input type="checkbox"> For `Use this many variable genes` choose 5000.
The 5000 genes with the highest variation will be used to fit the
models.\
<input type="checkbox"> Click "RUN LDA MODELS".\
<input type="checkbox"> To begin stage 2, select the "CellMarker
signatures (v2.0, Human-Cancer)" reference data set from the "Gene
signatures" drop-down.\
<input type="checkbox"> Then, click "ASSIGN IDENTITIES" to begin GSEA.