03_SummarizedExperiment.Rmd

# SummarizedExperiment review

Instructor: Renee

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```


```{r vignetteSetup_SEreview, echo=FALSE, message=FALSE, warning = FALSE}
## For links
library(BiocStyle)

## Bib setup
library(RefManageR)

## Write bibliography information
bib <- c(
    smokingMouse = citation("smokingMouse")[1],
    SummarizedExperiment = citation("SummarizedExperiment")[1]
)

options(max.print = 50)
```

<iframe width="560" height="315" src="https://www.youtube.com/embed/lqxtgpD-heM" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

[_LIBD rstats club notes_](https://docs.google.com/document/d/1umDODmdQldf5w2lNDoFe-unmezHPonpCiKD270VwkrQ/edit?usp=sharing)

## Overview
The `SummarizedExperiment` class is used to store experimental results in the form of matrixes. Objects of this class include observations (features) of the samples, as well as additional metadata. Usually, this type of object is automatically generated as the output of other software (ie. `SPEAQeasy`), but you can also build them. 

One of the main characteristics of `SummarizedExperiment` is that it allows you to handle you data in a "coordinated" way. For example, if you want to subset your data, with `SummarizedExperiment` you can do so without worrying your assays and metadata unsync.


## Quiz

1. How many classes does the `SummarizedExperiment` class has? 

2. What does **features** stand for? 

3. Which is the structure of the `SummarizedExperiment` class? 

<figure>
    <img src="Figures/se_structure.png" width="700px" align=center />
</figure>


3. What type of data can we store on an assay?

4. What information does `colData` has? 


## Exercises

We are gonna use the same sample data set as yesterday from the `airway` library

```{r, echo=FALSE}
suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(data(airway, package = "airway"))
```

```{r}
library(SummarizedExperiment)
library(airway)

data(airway, package = "airway")
se <- airway
```

<style>
p.exercise  {
background-color: #E4EDE2;
padding: 9px;
border: 1px solid black;
border-radius: 10px;
font-family: sans-serif;
}

</style>

<p class="exercise">
**Exercise 1**:
**a)** How many genes do we have in this object? And samples?
**b)** How many samples come from donors treated (`trt`) with dexamethasone (`dex`)? 
</p>


```{r}
## For a) you could only print the summary of the object but since the idea is to understand
## how to explore the object find other function that gives you the answer.
se

## Same thing for b, you could just print the colData and count the samples, but this is not
## efficient when our data consists in hundreds of samples. Find the answer using other tools.

colData(se)
```

<p class="exercise">
**Exercise 2**:
Add another assay that has the log10 of your original counts
</p>

```{r}
## In our object, if you look at the part that says assays, we can see that at the moment
## we only have one with the name "counts"

se

## To see the data that's stored in that assay you can do either one of the next commands
assay(se)
assays(se)$counts

## Note that assay() does not support $ operator
# assay(se)$counts

## We would have to do:
assay(se, 1)
assay(se, "counts")

## If you use assays() without specifying the element you want to see it shows you the length
## of the list and the name of each element
assays(se)

## To obtain a list of names as a vector you can do:
assayNames(se)

## Which can also be use to change the name of the assays
assayNames(se)[1] <- "foo"
assayNames(se)
assayNames(se)[1] <- "counts"
```


<p class="exercise">
**Exercise 3**:
Explore the metadata and add a new column that has the library size of each sample. 
</p>

```{r}
## To calculate the library size use

apply(assay(se), 2, sum)
```


## Solutions

<style>
p.solution  {
background-color: #C093D6;
padding: 9px;
border: 1px solid black;
border-radius: 10px;
font-family: sans-serif;
}
</style>


<p class="solution">
**Solution 1**:
</p>

```{r}
## For a), dim() gives the desired answer

dim(se)

## For b),

colData(se)[colData(se)$dex == "trt", ]
```


<p class="solution">
**Solution 2**:
</p>

```{r}
## There are multiple ways to do it

assay(se, "logcounts") <- log10(assay(se, "counts"))

assays(se)$logcounts_v2 <- log10(assays(se)$counts)
```

<p class="solution">
**Solution 3**:
</p>

```{r}
## To add the library size we an use..

colData(se)$library_size <- apply(assay(se), 2, sum)

names(colData(se))
```