# SummarizedExperiment review
Instructor: Renee
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r vignetteSetup_SEreview, echo=FALSE, message=FALSE, warning = FALSE}
## For links
library(BiocStyle)
## Bib setup
library(RefManageR)
## Write bibliography information
bib <- c(
smokingMouse = citation("smokingMouse")[1],
SummarizedExperiment = citation("SummarizedExperiment")[1]
)
options(max.print = 50)
```
<iframe width="560" height="315" src="https://www.youtube.com/embed/lqxtgpD-heM" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
[_LIBD rstats club notes_](https://docs.google.com/document/d/1umDODmdQldf5w2lNDoFe-unmezHPonpCiKD270VwkrQ/edit?usp=sharing)
## Overview
The `SummarizedExperiment` class is used to store experimental results in the form of matrices. Objects of this class include observations (features) of the samples, as well as additional metadata. Usually, this type of object is generated automatically as the output of other software (e.g. `SPEAQeasy`), but you can also build one yourself.
One of the main characteristics of `SummarizedExperiment` is that it allows you to handle your data in a "coordinated" way. For example, if you subset a `SummarizedExperiment` object, the assays and the metadata are subset together, so you do not have to worry about them getting out of sync.
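To make the "coordinated" idea concrete, here is a minimal sketch that builds a small object by hand and then subsets it. The names `toy_counts`, `toy_samples`, `toy_se` and `toy_trt`, and all the values in them, are made up for illustration and are not part of the `airway` data.

```{r}
library(SummarizedExperiment)

## Hypothetical toy data: 4 genes x 3 samples
toy_counts <- matrix(
    c(10, 0, 5, 3, 8, 2, 7, 1, 0, 4, 6, 9),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

## Sample metadata: one row per column of the count matrix
toy_samples <- DataFrame(
    dex = c("trt", "untrt", "trt"),
    row.names = paste0("sample", 1:3)
)

toy_se <- SummarizedExperiment(
    assays = list(counts = toy_counts),
    colData = toy_samples
)

## Subsetting the columns subsets the assay and the colData together
toy_trt <- toy_se[, toy_se$dex == "trt"]
dim(toy_trt)
colnames(assay(toy_trt))
rownames(colData(toy_trt))
```

After the subset, the column names of the assay and the row names of `colData` still refer to the same samples, without any extra bookkeeping on our side.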
## Quiz
1. How many classes does the `SummarizedExperiment` class have?
2. What does **features** stand for?
3. What is the structure of the `SummarizedExperiment` class?
<figure>
<img src="Figures/se_structure.png" width="700px" align=center />
</figure>
4. What type of data can we store in an assay?
5. What information does `colData` contain?
## Exercises
We are going to use the same sample data set as yesterday, from the `airway` package.
```{r, echo=FALSE}
suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(data(airway, package = "airway"))
```
```{r}
library(SummarizedExperiment)
library(airway)
data(airway, package = "airway")
se <- airway
```
<style>
p.exercise {
background-color: #E4EDE2;
padding: 9px;
border: 1px solid black;
border-radius: 10px;
font-family: sans-serif;
}
</style>
<p class="exercise">
**Exercise 1**:
**a)** How many genes do we have in this object? And samples?
**b)** How many samples come from donors treated (`trt`) with dexamethasone (`dex`)?
</p>
```{r}
## For a), you could just print the summary of the object, but since the idea is to understand
## how to explore the object, find another function that gives you the answer.
se
## Same thing for b): you could just print the colData and count the samples, but this is not
## efficient when our data consists of hundreds of samples. Find the answer using other tools.
colData(se)
```
<p class="exercise">
**Exercise 2**:
Add another assay that contains the log10 of your original counts.
</p>
```{r}
## In our object, if you look at the part that says assays, we can see that at the moment
## we only have one with the name "counts"
se
## To see the data that's stored in that assay you can use either of the following commands
assay(se)
assays(se)$counts
## Note that assay() does not support the $ operator
# assay(se)$counts
## We would have to do:
assay(se, 1)
assay(se, "counts")
## If you use assays() without specifying the element you want to see it shows you the length
## of the list and the name of each element
assays(se)
## To obtain a list of names as a vector you can do:
assayNames(se)
## Which can also be used to change the name of the assays
assayNames(se)[1] <- "foo"
assayNames(se)
assayNames(se)[1] <- "counts"
```
<p class="exercise">
**Exercise 3**:
Explore the metadata and add a new column that has the library size of each sample.
</p>
```{r}
## To calculate the library size use
apply(assay(se), 2, sum)
```
## Solutions
<style>
p.solution {
background-color: #C093D6;
padding: 9px;
border: 1px solid black;
border-radius: 10px;
font-family: sans-serif;
}
</style>
<p class="solution">
**Solution 1**:
</p>
```{r}
## For a), dim() gives the desired answer
dim(se)
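## For b), subset colData to the samples treated with dexamethasone;
## the number of rows in the result is the answer
colData(se)[colData(se)$dex == "trt", ]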
```
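If you only need the numbers and not the printed tables, the same questions can be answered with counting functions. This is a sketch on a made-up object (`toy_se` and its values are hypothetical); the same calls should work on `se`.

```{r}
library(SummarizedExperiment)

## Hypothetical toy object: 3 genes x 4 samples
toy_se <- SummarizedExperiment(
    assays = list(counts = matrix(
        1:12,
        nrow = 3,
        dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
    )),
    colData = DataFrame(
        dex = c("trt", "untrt", "trt", "untrt"),
        row.names = paste0("sample", 1:4)
    )
)

## Number of features (genes) and number of samples
nrow(toy_se)
ncol(toy_se)

## Number of treated samples, and a tally of every treatment level
n_trt <- sum(toy_se$dex == "trt")
n_trt
table(toy_se$dex)
```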
<p class="solution">
**Solution 2**:
</p>
```{r}
## There are multiple ways to do it
assay(se, "logcounts") <- log10(assay(se, "counts"))
assays(se)$logcounts_v2 <- log10(assays(se)$counts)
```
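One thing to keep in mind: count matrices usually contain zeros, and `log10(0)` is `-Inf`. A common workaround is to add a pseudocount before taking the log. The sketch below is not part of the original solution; the pseudocount of 1, the assay name `log10counts_pseudo` and the toy values are all assumptions made for illustration.

```{r}
library(SummarizedExperiment)

## Hypothetical toy object with one zero count
toy_se <- SummarizedExperiment(
    assays = list(counts = matrix(
        c(0, 9, 99, 999),
        nrow = 2,
        dimnames = list(c("gene1", "gene2"), c("sample1", "sample2"))
    ))
)

## The zero count becomes -Inf
log10(assay(toy_se, "counts"))

## Adding a pseudocount of 1 keeps every value finite
assay(toy_se, "log10counts_pseudo") <- log10(assay(toy_se, "counts") + 1)
assayNames(toy_se)
assay(toy_se, "log10counts_pseudo")
```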
<p class="solution">
**Solution 3**:
</p>
```{r}
## To add the library size we can use:
colData(se)$library_size <- apply(assay(se), 2, sum)
names(colData(se))
```
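As an alternative, `colSums()` gives the same per-sample totals as `apply(..., 2, sum)` and is usually faster on large matrices. The `$` operator on the object is a shortcut for writing to `colData`. A small sketch on a made-up object (`toy_se` and its values are hypothetical):

```{r}
library(SummarizedExperiment)

## Hypothetical toy object: 3 genes x 2 samples
toy_se <- SummarizedExperiment(
    assays = list(counts = matrix(
        1:6,
        nrow = 3,
        dimnames = list(paste0("gene", 1:3), c("sample1", "sample2"))
    ))
)

## Both approaches give the same totals per sample
apply(assay(toy_se), 2, sum)
colSums(assay(toy_se))

## Store the library size as a new colData column
toy_se$library_size <- colSums(assay(toy_se, "counts"))
colData(toy_se)
```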