---
title: Getting started with Spatial Transcriptomics
output:
  html_document:
    toc: true
    toc_float:
      toc_collapsed: true
    toc_depth: 3
    number_sections: false
    css: style.css
editor_options:
  markdown:
    wrap: 72
---
```{r echo = FALSE, show = FALSE}
knitr::opts_chunk$set(number_sections = FALSE)
```
## Get workshop files
<input type="checkbox"> Download the files for this activity by clicking
here:
<https://github.com/fhdsl/Moffitt_ITN_Workshop/archive/refs/heads/main.zip>\
<input type="checkbox"> Put this file on your desktop so it is easily
findable.\
<input type="checkbox"> Double click the zip file (or right click and
choose "unzip" or "decompress") to unzip the file.\
<input type="checkbox"> Open up the activity files you downloaded so we
can see what's there.
### Get familiar with the data set
Within the folder, navigate to `activity-files` and then
`spatial_transcriptomics_activity_files`. There we should see a metadata
file (`meylan_etal_2022_tumor_grade.csv`), a PDF of the manuscript that
describes these data, and a `visium_samples` folder that includes two
samples.
Each sample's folder contains several files resulting from the
`spaceranger` pipeline. However, we will use only the following files:

-   sample_b_01
    -   GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5 (the gene
        expression data)
    -   spatial
        -   GSM5924046_frozen_b_1_tissue_positions_list.csv (spot x,y
            spatial positions)
        -   GSM5924046_frozen_b_1_tissue_hires_image.png (the
            H&E-stained tissue image)
        -   GSM5924046_frozen_b_1_scalefactors_json.json (to scale the
            H&E image and plots)
-   sample_b_07 ...

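
As an optional check, a short base R sketch like the one below can confirm that a sample folder holds the four file types listed above. The helper name `check_visium_files` and the temporary demo folder are hypothetical and exist only for illustration; to check your own data, point the function at your unzipped sample folder instead.

```{r eval=F}
# Hypothetical helper: report which of the expected Visium files are present
check_visium_files <- function(sample_dir) {
  # File name endings taken from the file list in this tutorial
  expected <- c(
    expression  = "filtered_feature_bc_matrix\\.h5$",
    coordinates = "tissue_positions_list\\.csv$",
    image       = "tissue_hires_image\\.png$",
    scalefactor = "scalefactors_json\\.json$"
  )
  # Search the folder and its sub-folders (e.g., `spatial`)
  files <- list.files(sample_dir, recursive = TRUE)
  vapply(expected, function(p) any(grepl(p, files)), logical(1))
}

# Demo on a temporary folder that mimics the layout of `sample_b_01`
# (the files are empty placeholders, not real data)
demo_dir <- file.path(tempdir(), "sample_b_01")
dir.create(file.path(demo_dir, "spatial"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5"))
file.create(file.path(demo_dir, "spatial", c(
  "GSM5924046_frozen_b_1_tissue_positions_list.csv",
  "GSM5924046_frozen_b_1_tissue_hires_image.png",
  "GSM5924046_frozen_b_1_scalefactors_json.json"
)))

found <- check_visium_files(demo_dir)
print(found)  # named logical vector; any FALSE flags a missing file
```

If any entry is `FALSE`, the corresponding file is missing or named differently from what is listed here.
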
The README describes these samples:
> These samples result from tumor biopsies collected from a patient
> cohort with clear cell renal cell carcinoma (ccRCC). The researchers
> profiled the samples with 10X Visium to study the gene expression in
> TLS in a spatial context.
The metadata file (`meylan_etal_2022_tumor_grade.csv`) contains two
variables of interest. One is the tumor grade, and the other is
positivity for a tertiary lymphoid structure (TLS) as ascertained using
immunohistochemistry staining. The file looks like this:
| samplename | cohort | ID | pT | tls |
|:------------|:-------|:----|:----|:----|
| sample_b_01 | IMM | b_1 | pT1 | pos |
| sample_b_07 | IMM | b_7 | pT3 | neg |
## Create an account with spatialGE
<input type="checkbox"> Go to <https://spatialge.moffitt.org/>\
<input type="checkbox"> Click on "Sign Up" in the upper right corner.
## Starting a new project
<input type="checkbox"> Click the blue "New Project" button.\
<input type="checkbox"> For
`What spatial transcriptomics platform are you using for this project?`
choose `Visium`, which is the platform our example data come from.\
<input type="checkbox"> Enter a sensible project name and description of
your own. It could be something related to the workshop, such as
"ITN-Moffitt workshop".\
<input type="checkbox"> Then click "Create".
### Uploading the dataset
**IMPORTANT:** *We will repeat the following steps for each sample to
upload that sample's set of files.*
### Uploading one sample's data {#uploading-one-samples-data}
<input type="checkbox"> For `Sample Name` enter the ID indicated on the
folder, e.g. `sample_b_01`. This is very important, as sample IDs need
to match exactly the sample IDs in the metadata file
(`meylan_etal_2022_tumor_grade.csv`). Otherwise, no metadata is
imported.\
<input type="checkbox"> For the `Gene expression` box upload the `.h5`
file e.g. `GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5`. You can
upload files by dragging and dropping them or by clicking to navigate to
them.\
<input type="checkbox"> For the `Coordinates` box upload the `.csv` file
e.g. `GSM5924046_frozen_b_1_tissue_positions_list.csv`.\
<input type="checkbox"> For the `Tissue image` box upload the `.png`
file e.g. `GSM5924046_frozen_b_1_tissue_hires_image.png`.\
<input type="checkbox"> For the `Scale factor` box upload the `.json`
file e.g. `GSM5924046_frozen_b_1_scalefactors_json.json`. The scaling
factor file is output automatically by the `10X Space Ranger` pipeline,
and contains information to approximate the size of the tissue image and
the expression plots.
<input type="checkbox"> Once the above steps are done click the green
`Import Sample`.
Now [return to the beginning of these
steps](#uploading-one-samples-data) to repeat the same steps for the
other sample.
*You can use this checklist to keep track as you upload and follow the
steps for each sample.*
<input type="checkbox"> `sample_b_01` data entered\
<input type="checkbox"> `sample_b_07` data entered
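
To make the role of the scale factor file more concrete, here is a minimal sketch of how such a factor relates spot positions to the image. It assumes the Space Ranger convention in which the JSON file has a `tissue_hires_scalef` entry that converts full-resolution pixel positions to positions on the high-resolution image. The numeric value and the toy spot positions are made up for illustration and are not read from the workshop files.

```{r eval=F}
# Hypothetical value, for illustration only (not taken from the workshop files)
tissue_hires_scalef <- 0.15

# Toy spot positions in full-resolution pixels
spots <- data.frame(
  barcode = c("spot_1", "spot_2", "spot_3"),
  pxl_row_in_fullres = c(2000, 4000, 6000),
  pxl_col_in_fullres = c(1000, 3000, 5000)
)

# Multiply by the scale factor to place each spot on the high-resolution image
spots$pxl_row_in_hires <- spots$pxl_row_in_fullres * tissue_hires_scalef
spots$pxl_col_in_hires <- spots$pxl_col_in_fullres * tissue_hires_scalef
print(spots)
```

spatialGE performs this kind of rescaling for you once the `.json` file is uploaded, so the sketch is only meant to show why the file is needed.
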
### Adding metadata
<input type="checkbox"> Now click `Option 1: Upload metadata file`.\
<input type="checkbox"> Upload the `meylan_etal_2022_tumor_grade.csv`
file. You can drag and drop the file or click the (+) button to
navigate to it.
#### {.click_to_expand_block}
<details>
<summary>Homework: Manually adding sample metadata</summary>
Metadata can also be added manually. To do so, follow these steps:
<input type="checkbox"> Click `Option 2: Add metadata manually`.\
<input type="checkbox"> Click `Add new metadata column`. Add a column
named `pT`.\
<input type="checkbox"> Click `Add new metadata column` again. Add a
column named `tls`.
You can refer to the `meylan_etal_2022_tumor_grade.csv` file's contents
to add these data for each sample:
| samplename | pT | tls |
|:------------|:-----|:----|
| sample_b_01 | pT1 | pos |
| sample_b_07 | pT3 | neg |
| sample_b_18 | pT2 | pos |
| sample_a_01 | pT1b | neg |
<input type="checkbox"> Add the `pT` and `tls` information corresponding
to `sample_b_01`.\
<input type="checkbox"> Add the `pT` and `tls` information corresponding
to `sample_b_07`.\
<input type="checkbox"> Add the `pT` and `tls` information corresponding
to `sample_b_18`.\
<input type="checkbox"> Add the `pT` and `tls` information corresponding
to `sample_a_01`.
</details>
####
**Remember:** The sample IDs in the metadata should match exactly the
sample names used during file import.
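
Because a single mismatched character is enough for metadata not to be imported, it can help to compare the names before uploading. The base R sketch below shows one way to do that. The helper `check_sample_names` is hypothetical, and the metadata rows are typed in by hand from the metadata table shown earlier. In practice you could load the real file with `read.csv()` instead.

```{r eval=F}
# Hypothetical helper: compare metadata sample names with the names used at upload
check_sample_names <- function(metadata, uploaded_names) {
  list(
    missing_from_metadata = setdiff(uploaded_names, metadata$samplename),
    missing_from_upload   = setdiff(metadata$samplename, uploaded_names)
  )
}

# Metadata rows copied by hand from the metadata table in this tutorial
metadata <- data.frame(
  samplename = c("sample_b_01", "sample_b_07"),
  pT  = c("pT1", "pT3"),
  tls = c("pos", "neg")
)

# Names typed in the `Sample Name` box; the second one has a typo on purpose
uploaded_names <- c("sample_b_01", "sample_b_7")

result <- check_sample_names(metadata, uploaded_names)
print(result)
```

Here `sample_b_7` is reported as missing from the metadata and `sample_b_07` as missing from the upload, which points to the typo. Two empty results mean the names match.
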
#### {.click_to_expand_block}
<details>
<summary>Homework: Additional samples in the data set</summary>
Users are encouraged to try spatialGE with these additional samples
after the workshop. These samples were not used in the workshop to save
time.
The metadata for all four samples, including the two additional ones,
looks like this:
| samplename | cohort | ID | pT | tls |
|:------------|:---------|:-----|:-----|:----|
| sample_b_01 | IMM | b_1 | pT1 | pos |
| sample_b_07 | IMM | b_7 | pT3 | neg |
| sample_b_18 | IMM | b_18 | pT2 | pos |
| sample_a_01 | ExhauCRF | a_1 | pT1b | neg |
The metadata file and additional Visium samples can be found within the
`additional_activity_files_for_home` folder.
*You can use this checklist to keep track as you upload and follow the
steps for each sample.*
<input type="checkbox"> `sample_b_01` data entered\
<input type="checkbox"> `sample_b_07` data entered\
<input type="checkbox"> `sample_b_18` data entered\
<input type="checkbox"> `sample_a_01` data entered
</details>
####
### After you've entered the data and metadata:
**NOTE**: Make sure to upload all the samples before clicking the
`Import Data` button. You will not be able to edit the project (unless
you start a new project completely) after you click `Import Data`.
<input type="checkbox"> Make sure everything is as you intend and then
click `Import Data`.
This may take a little bit of time. Note you can have it send you an
email instead of waiting on the page.
#### {.click_to_expand_block}
<details>
<summary>Advanced task: Loading data into the command-line version of
spatialGE</summary>
Most of the analyses in this step-by-step tutorial can also be performed
with the spatialGE R package. To do so, start by installing the
spatialGE package:
```{r eval=F}
# Install devtools if not already installed:
#install.packages('devtools')
devtools::install_github('FridleyLab/spatialGE')
```
Then load the spatialGE package:
```{r eval=F}
library('spatialGE')
```
Now, specify the directory paths to the folders containing the data and
the metadata file:
```{r eval=F}
# Folders with the Visium outputs for each sample
counts_fp = c('./activity-files/spatial_transcriptomics_activity_files/visium_samples/sample_b_01/',
              './activity-files/spatial_transcriptomics_activity_files/visium_samples/sample_b_07/',
              './activity-files/spatial_transcriptomics_activity_files/additional_activity_files_for_home/sample_a_01/',
              './activity-files/spatial_transcriptomics_activity_files/additional_activity_files_for_home/sample_b_18/')
# Metadata file describing the samples
meta_file = './activity-files/spatial_transcriptomics_activity_files/additional_activity_files_for_home/meylan_etal_2022_tumor_grade.csv'
```
Finally, create an STlist. The STlist is the R object in spatialGE that
holds the data and results:
```{r eval=F}
tls_stlist = STlist(rnacounts=counts_fp, samples=meta_file)
```
</details>
####
## Filtering your data
Each ST technology will require different filtering parameters. Compared
to single-cell ST, spot-level ST (e.g., Visium) tends to yield more
counts per spot. Even among spot-level ST projects, these parameters
will need adjustment depending on the sequencing depth and cellularity
(i.e., cells per area unit). For these reasons, the values used here
should not be taken as a "golden rule". Instead, users are encouraged
to try different parameters and see which filtering procedure produces
the most "noise" reduction without losing too much relevant
information. spatialGE provides statistics and plots to help the user
assess the effect of filtering.
<input type="checkbox"> Go to the "Filter data" tab.\
<input type="checkbox"> Click "Filter spots/cells".\
<input type="checkbox"> Enter the minimum number of counts a spot needs
to have to be kept in the data set. In this case, 500 will be input.\
<input type="checkbox"> Enter the minimum number of genes a spot needs
to have to be kept in the data set. In this case, 100 will be input.\
<input type="checkbox"> Click the "Mitochondrial genes (\^MT-)" box to
filter spots by mitochondrial gene content. Keep in mind that some ST
platforms do not quantify mitochondrial genes.\
<input type="checkbox"> Enter the maximum percentage of mitochondrial
counts. Use 20% in this case.
#### {.click_to_expand_block}
<details>
<summary>Homework: Performing gene-level filtering</summary>
Users can also remove genes with a low number of counts. This is
advisable in most cases. However, since ST features high gene dropout
(i.e., genes that the technology fails to quantify), imposing a filter
that is too stringent might leave very little usable information in the
data set.
To perform a gene count filter:
<input type="checkbox"> Click "Filter genes".\
<input type="checkbox"> Filter out genes with fewer than 100 counts.\
<input type="checkbox"> Filter out genes expressed in fewer than 20
spots.
</details>
####
#### {.click_to_expand_block}
<details>
<summary>Advanced task: Filtering data using the command-line version of
spatialGE</summary>
To achieve the same spot- and gene-level filtering results as those
obtained with the web application, users can run the following command
in the R console.
```{r eval=F}
tls_stlist = filter_data(tls_stlist,
                         spot_minreads=500,
                         spot_mingenes=100,
                         spot_maxpct=0.2,
                         gene_minreads=100,
                         gene_minspots=20)
```
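
To illustrate what the spot-level thresholds in this section mean, here is a minimal base R sketch applied to a made-up count matrix. It is not how spatialGE is implemented, and it does not use the spatialGE package. All gene names, spot names, and counts are invented. Because the toy matrix has only four genes, the minimum number of genes is set to 3 instead of the 100 used in this tutorial.

```{r eval=F}
# Toy count matrix: genes in rows, spots in columns (made-up numbers)
counts <- matrix(
  c(400, 300, 5,
    200, 150, 3,
     50, 400, 2,
      0,  50, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "MT-CO1", "GENE_C"),
                  c("spot_1", "spot_2", "spot_3"))
)

# Per-spot statistics
total_counts <- colSums(counts)       # total counts per spot
total_genes  <- colSums(counts > 0)   # genes detected per spot
is_mito      <- grepl("^MT-", rownames(counts))
pct_mito     <- colSums(counts[is_mito, , drop = FALSE]) / total_counts

# Keep spots with at least 500 counts, at least 3 genes (toy value),
# and at most 20% mitochondrial counts
keep <- total_counts >= 500 & total_genes >= 3 & pct_mito <= 0.2
filtered <- counts[, keep, drop = FALSE]
print(colnames(filtered))
```

In this toy example only `spot_1` is kept: `spot_2` has too many mitochondrial counts and `spot_3` has too few total counts. Gene-level filters follow the same logic with `rowSums()` in place of `colSums()`.
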
</details>
####
<input type="checkbox"> Once you have all the filter settings as you'd
like, click the blue "APPLY FILTER" button.\
<input type="checkbox"> Users can also download a "parameter file",
which contains the filtering settings used for reproducibility. To do
this, locate the "Download parameter log" link below the "APPLY FILTER"
button.
### Visualize filtering results
#### Count distributions
<input type="checkbox"> Click "Violin plots" to visualize count
distribution after filtering.\
<input type="checkbox"> Currently, "total_counts" and "total_genes" per
spot can be visualized.\
<input type="checkbox"> When changing the variable to plot, click the
blue "GENERATE PLOTS" button to update.
#### {.click_to_expand_block}
<details>
<summary>Advanced task: Create count distribution plots with the
command-line version of spatialGE</summary>
```{r eval=F}
# Violin plots of the total counts per spot
spatialGE::distribution_plots(tls_stlist, plot_meta='total_counts', plot_type='violin')
```
</details>
####
#### {.click_to_expand_block}
<details>
<summary>Homework: Visualizing gene counts in spatial context</summary>
**Quilt plot**
The quilt plot tab within the QC and Data Transformation module allows
visualization of the counts or detected genes per spot. This
functionality might be useful to locate areas with low cellularity or
necrosis.
<input type="checkbox"> Click `Quilt plot` to visualize the total number
of genes or counts per spot and their spatial context.\
<input type="checkbox"> Select `total_counts`.\
<input type="checkbox"> Select one sample underneath the `First sample`
dropdown menu.\
<input type="checkbox"> Select a second sample to compare it with
underneath the `Second sample` drop-down menu.\
<input type="checkbox"> Click the blue "GENERATE PLOTS" button to create
the plot.
</details>
####
## Normalize Data
<input type="checkbox"> Click the "Normalize data" tab.\
<input type="checkbox"> Click "Use SCTransform" to apply Seurat's
normalization method.\
<input type="checkbox"> Click the blue "NORMALIZE DATA" button to start
normalization.
#### {.click_to_expand_block}
<details>
<summary>Homework: Assessing data normalization results at the gene
level</summary>
The spatialGE web app allows users to assess the distribution of counts
per spot or cell for specific genes. To do so, follow these steps:
<input type="checkbox"> Query a gene to plot the distribution of its
counts per spot, for example *MAP2K2*. Keep in mind that the query is
case-sensitive. Since these are human samples, use all-uppercase
letters.\
<input type="checkbox"> Click "GENERATE PLOTS" to show the number of
*MAP2K2* counts per spot.
</details>
####
#### {.click_to_expand_block}
<details>
<summary>Advanced task: Data normalization using the command-line
version of spatialGE</summary>
```{r eval=F}
# Perform data normalization using SCTransform
# NOTE: To use log-transformation set `method='log'`
tls_stlist = transform_data(tls_stlist, method='sct')
```
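
For readers curious about what a log-transformation does to the counts, below is a generic library-size normalization sketch in base R. It is only an illustration on made-up numbers. The scale factor of 10000 is a common convention and an assumption here, and the exact computation used by spatialGE may differ.

```{r eval=F}
# Toy count matrix: genes in rows, spots in columns (made-up numbers)
counts <- matrix(
  c(10,  0,  90,    # counts in spot_1
    40, 60, 100),   # counts in spot_2
  nrow = 3,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("spot_1", "spot_2"))
)

# Assumed scale factor (a common convention, not taken from spatialGE)
scale_factor <- 10000

# Divide each spot by its total counts, rescale, and take log(x + 1)
lib_size <- colSums(counts)
log_norm <- log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
print(round(log_norm, 2))
```

Dividing by the total counts per spot makes spots with different sequencing depth comparable, and the `log(x + 1)` step keeps zero counts at zero while compressing large values.
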
</details>
####
## Visualization
### Gene expression comparative visualization
<input type="checkbox"> Click the `Visualization` module on the left
side menu.\
<input type="checkbox"> You can search for your favorite gene in the
`Search and select genes` menu. For this example, query and click
*IGKC*.\
<input type="checkbox"> Also query and click the *MS4A1* gene.\
<input type="checkbox"> Lastly, query and click *COL1A1*.\
<input type="checkbox"> Click the blue "GENERATE PLOTS" button to create
the plot.
Images can be exported in multiple formats (PNG/SVG/PDF).
#### {.click_to_expand_block}
<details>
<summary>Homework: Generation of gene expression surfaces</summary>
**Gene expression surface ("kriging")**
Users can also generate an "expression surface" to visualize expression
of a gene in the spatial context. This is a type of plot where
expression values are inferred for the unquantified spaces between
spots.
<input type="checkbox"> Click the "Expression surface" tab.\
<input type="checkbox"> In the `Search and select genes` menu search and
select `IGKC`.\
<input type="checkbox"> Click the "ESTIMATE SURFACES" button to create
the plot.
</details>
####
#### {.click_to_expand_block}
<details>
<summary>Advanced task: Create spatial gene expression plots using the
command-line version of spatialGE</summary>
```{r eval=F}
STplot(tls_stlist, genes=c('IGKC', 'MS4A1', 'COL1A1'))
```
</details>
####
## Spatial Domain Detection {#detect-spatial-domains}
<input type="checkbox"> Click the "Spatial domain detection" module on
the left side menu.\
<input type="checkbox"> In the `Number of domains` slider, select 3 to
5. This is the range of cluster numbers (k) to try, so solutions with 3,
4, and 5 domains will be detected in the samples.\
<input type="checkbox"> For `Number of most variable genes to use`,
choose 3000. The 3000 genes with the highest variation will be used to
detect the domains.\
<input type="checkbox"> Finally, click "RUN STCLUST" to find clusters.\
<input type="checkbox"> Explore the results by clicking each `K=` tab.
Images can be exported in multiple formats (PNG/SVG/PDF).
#### {.click_to_expand_block}
<details>
<summary>Advanced task: Domain detection using the command-line version
of spatialGE</summary>
```{r eval=F}
tls_stlist = STclust(tls_stlist, ks=c(3:5))
# Plot the detected domains
STplot(tls_stlist, ks=3:5)
```
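
To give an intuition for what "3 to 5 domains" and "most variable genes" mean, the sketch below clusters simulated spots with base R hierarchical clustering and cuts the same tree at k = 3, 4, and 5. It is a generic illustration only and not STclust's implementation: it uses random data, selects genes by plain variance, and ignores spot positions entirely.

```{r eval=F}
set.seed(1)

# Simulated normalized expression: 200 genes (rows) x 30 spots (columns)
expr <- matrix(rnorm(200 * 30), nrow = 200,
               dimnames = list(paste0("gene_", 1:200), paste0("spot_", 1:30)))

# Keep the most variable genes (the tutorial uses 3000; 50 here for toy data)
n_top <- 50
gene_var <- apply(expr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top)]

# Hierarchical clustering of spots based on expression distances only
spot_dist <- dist(t(expr[top_genes, ]))
tree <- hclust(spot_dist, method = "ward.D2")

# Cut the same tree into 3, 4, and 5 clusters (one column per k)
domains <- cutree(tree, k = 3:5)
head(domains)
```

Each column of `domains` holds the cluster assignment of every spot for one value of k, which mirrors the separate `K=` tabs in the web application.
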
</details>
####
#### {.click_to_expand_block}
<details>
<summary>Homework: Other analysis types to explore</summary>
We encourage users to run these more advanced analyses. They were not
included in the workshop because of the time necessary to complete them.
However, these are some of the advanced analysis types that can help in
hypothesis generation using spatial transcriptomics data.
**Phenotyping**
In spatialGE, inferring cell types on Visium data sets is achieved with
STdeconvolve ([Miller et al. 2022](https://doi.org/10.1038/s41467-022-30033-z)).
STdeconvolve is performed in two stages:

1.  The method attempts to fit a series of models composed of "latent
    topics". Each latent topic represents a cell type, a cell state, or
    even a functional niche.
2.  The latent topics are assigned a biological identity based on a list
    of reference genes. The assignments are obtained via gene set
    enrichment analysis (GSEA).

<input type="checkbox"> Click the "Phenotyping" module on the left side
menu.\
<input type="checkbox"> To begin stage 1, select 7 to 10 topics in the
`Fit LDA models with this many topics` slider. This is the number of
topics within each model: One model with 7 topics, another with 8,
another with 9, and one with 10 topics.\
<input type="checkbox"> For `Use this many variable genes`, choose 5000.
The 5000 genes with the highest variation will be used to fit the
models.\
<input type="checkbox"> Finally, click "RUN LDA MODELS".\
<input type="checkbox"> To begin stage 2, select the "CellMarker
signatures (v2.0, Human-Cancer)" reference data set from the "Gene
signatures" drop-down.\
<input type="checkbox"> Then, click "ASSIGN IDENTITIES" to begin GSEA.
**Spatial gradient detection**
In spatialGE, users can test if a gene shows higher (or lower)
expression closer to a specific cluster or spatial domain ("spatial
gradient"). The method is useful to investigate spatial patterns at the
interface of spatial domains.
Using the steps previously outlined to [detect spatial domains](#detect-spatial-domains),
it seems that spatial domain 3 in the plot titled "stclust_spw0_k3" is an immune-infiltrated
area. With the following steps, users can test for genes with spatial expression
gradients with respect to this immune region:
<input type="checkbox"> Click the "Spatial gradients" module on the left
side menu.\
<input type="checkbox"> Using the table at the top, select all samples.\
<input type="checkbox"> Write "1000" in the "Number of most variable
genes to use" textbox.\
<input type="checkbox"> From the "Annotation to test" drop-down, select
"STclust; Domains (k): 03; No spatial weight".\
<input type="checkbox"> From the "reference cluster" drop-down, select
"3".\
<input type="checkbox"> Then, click "RUN STGRADIENT".
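
The idea of a spatial gradient can be sketched in a few lines of base R: measure how far each spot is from the reference domain and check whether expression changes with that distance. The example below is a simplified illustration on made-up data, using the minimum distance to the reference domain and a Spearman correlation. These choices are assumptions made for the sketch, and the actual STgradient computation may differ.

```{r eval=F}
# Toy spots on a line; domain 3 is the reference (e.g., the immune region)
spots <- data.frame(
  x = 0:9,
  y = 0,
  domain = c(3, 3, 1, 1, 1, 1, 2, 2, 2, 2)
)
# Made-up expression of one gene that decreases away from the reference domain
spots$expr <- c(9.5, 9.0, 8.1, 7.4, 6.0, 5.2, 4.4, 3.1, 2.5, 1.0)

ref <- spots$domain == 3

# Distance from every non-reference spot to its nearest reference spot
d <- as.matrix(dist(spots[, c("x", "y")]))
min_dist <- apply(d[!ref, ref, drop = FALSE], 1, min)

# Spearman correlation between distance and expression
rho <- cor(min_dist, spots$expr[!ref], method = "spearman")
print(rho)
```

A negative coefficient, as in this toy example, indicates that expression is higher close to the reference domain and drops with distance. A positive coefficient would indicate the opposite.
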
</details>
####