experiment.Rmd

---
title: "The Experiment"
bibliography: methods.bibtex
output:
  html_document:
      toc: true
      toc_depth: 3
      toc_float:
        collapsed: false

---

#Overview

A time series from 1 hour to 9 hours post-fertilization (hpf) was collected for RNA-seq by students of the [MBL Embryology Course](http://www.mbl.edu/education/courses/embryology/) to understand the shifts in gene expression occuring in the early embryo of *M. leidyi*.


#Sample Collection
Animals were collected in the wild at various locations around Woods Hole, MA either by net or by a plastic beaker attached to a pole for gently scooping the animals from the water as pictured:


<img src="methods/Cteno_catch.jpg" style="width:300px;height:400px;">


Multiple biological replicates were collected for timepoints 1, 2, 3, 4, 5, 6, 7, 8, and 9 hpf. 


#RNA Extraction and Assembly

- Total RNA was harvested according to [this protocol](methods/ASA_Trizol-RNA-extraction-Mnemiopsis-V2.pdf)

- mRNA was extracted from the total RNA in two parts: [part 1](methods/Dynabead-protocol-Mnemiopsis-partI.pdf) and [part2](methods/Dynabead-protocol-Mnemiopsis-partII.pdf)


- Library prep and sequencing was completed according to [this protocol](methods/LibraryPrepMethodandSequencing.pdf)


- Transcriptome assembly was performed with [this protocol](transcriptome.html).


#Sequence Analysis

RNA sequence reads from each developmental timepoint were aligned to the assembled transcriptome, followed by quantification and expression analysis.


###FASTQ processing commands


```{r SampleAlignmentCommands, engine='sh', eval = FALSE}

index='/path/to/transcriptome_index'
fastq='/path/to/fastq_file'

bowtie2 -p 8 --norc -x ${index} ${fastq}  2>>alignment_report.txt | \

samtools view -@ 1 -b - | samtools sort -T ${fastq}_temp -o ${fastq}.bam -O bam -@ 1 - ;

samtools index ${fastq}.bam;

samtools idxstats ${fastq}.bam | sort -o ${fasq}.idxstats /dev/stdin

```
[@Langmead2012], [@Li2009]


###R Analyses


Counts for isoforms were summed to create a count per 'gene', which served as input to the R package DESeq2 for differential expression analysis and normalization of the read counts [@Love2014]. The following code block is a sample script for the RNA-seq analysis:


For the differential expression analysis, all timepoints were compared against the eight hour time point (8 hpf). The following samples were omitted for failing quality control tests: 7hpf_2012_N1’, ‘7hpf_2013_17’, ‘2hpf_2012_N6’, ‘1hpf_2012_N7’, ‘6hpf_2012_N2’. Additionally, all timepoints were compared against the two hour time point (2 hpf) omitting the following samples for failing quality control tests: 7hpf_2012_N1’, ‘2hpf_2012_N6’. 


All heatmaps were created using the R package pheatmap [@kolde2015pheatmap].

Gene Ontology terms [@2014; @Ashburner2000] were assigned to each *M. leidyi* gene based on homologous PFAM domains and significant Swissprot blastp best hits [Camacho2009]. These terms were used to impute functional categories of differentially expressed genes.


###GEO submission
Gene expression data has been deposited in the Gene Expression Omnibus (GEO) under the accession GSE93977.


###Scripts {.tabset}

#### DESeq2
```{r, code = readLines('scripts/runDEseq.R'), eval = FALSE}

```


#### GO Enrichment

```{r, code = readLines('scripts/input_mnemiopsis_GO_enrich.R'), eval = FALSE}

```

# References