---
title: "The Experiment"
bibliography: methods.bibtex
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float:
      collapsed: false
---
# Overview
A time series spanning 1 to 9 hours post-fertilization (hpf) was collected for RNA-seq by students of the [MBL Embryology Course](http://www.mbl.edu/education/courses/embryology/) to characterize the shifts in gene expression occurring in the early embryo of *Mnemiopsis leidyi*.
# Sample Collection
Animals were collected in the wild at various locations around Woods Hole, MA either by net or by a plastic beaker attached to a pole for gently scooping the animals from the water as pictured:
<img src="methods/Cteno_catch.jpg" style="width:300px;height:400px;">
Multiple biological replicates were collected for timepoints 1, 2, 3, 4, 5, 6, 7, 8, and 9 hpf.
# RNA Extraction and Assembly
- Total RNA was harvested according to [this protocol](methods/ASA_Trizol-RNA-extraction-Mnemiopsis-V2.pdf).
- mRNA was extracted from the total RNA in two parts: [part 1](methods/Dynabead-protocol-Mnemiopsis-partI.pdf) and [part 2](methods/Dynabead-protocol-Mnemiopsis-partII.pdf).
- Library prep and sequencing were completed according to [this protocol](methods/LibraryPrepMethodandSequencing.pdf).
- Transcriptome assembly was performed with [this protocol](transcriptome.html).
# Sequence Analysis
RNA sequence reads from each developmental timepoint were aligned to the assembled transcriptome, followed by quantification and expression analysis.
### FASTQ processing commands
```{r SampleAlignmentCommands, engine='sh', eval = FALSE}
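# Placeholders: bowtie2 index prefix for the assembled transcriptome and one FASTQ file
index='/path/to/transcriptome_index'
fastq='/path/to/fastq_file'
# Align without trying the reverse-complement reference strand (--norc), then write a sorted BAM
bowtie2 -p 8 --norc -x ${index} ${fastq} 2>>alignment_report.txt | \
samtools view -@ 1 -b - | samtools sort -T ${fastq}_temp -o ${fastq}.bam -O bam -@ 1 - ;
# Index the BAM, then save per-transcript read counts sorted by transcript name
samtools index ${fastq}.bam;
samtools idxstats ${fastq}.bam | sort -o ${fastq}.idxstats /dev/stdin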
```
[@Langmead2012], [@Li2009]
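
The `.idxstats` file written by the final command has four tab-separated columns: reference name, sequence length, mapped reads, and unmapped reads, plus a `*` row for reads with no reference. As a minimal sketch of turning that file into a two-column count table, the helper below keeps only the transcript name and mapped-read count. The function name and the output file name are illustrative assumptions, not part of the original pipeline.

```sh
# Hypothetical helper: print "transcript<TAB>mapped_reads" from an idxstats file,
# skipping the "*" row that holds reads without a reference sequence.
idxstats_to_counts() {
    awk -F '\t' 'BEGIN { OFS = "\t" } $1 != "*" { print $1, $3 }' "$1"
}

# Example usage (placeholder path, matching the commands above):
# idxstats_to_counts /path/to/fastq_file.idxstats > sample.counts.tsv
```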
### R Analyses
Counts for isoforms were summed to create a count per 'gene', which served as input to the R package DESeq2 for differential expression analysis and normalization of the read counts [@Love2014]. Sample scripts for the RNA-seq analysis are provided in the Scripts section below.
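
To illustrate only the isoform-to-gene summing step (this is not the analysis script itself), here is a rough shell sketch. It assumes a two-column, tab-separated file mapping each transcript ID to a gene ID, and a two-column file of per-transcript counts. Both file layouts and the function name are assumptions made for this example, and the summing could equally be done in R before calling DESeq2.

```sh
# Hypothetical helper: sum transcript-level counts into gene-level counts.
#   $1: two-column TSV mapping transcript ID -> gene ID (assumed layout)
#   $2: two-column TSV of transcript ID and read count
sum_isoform_counts() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        NR == FNR { gene[$1] = $2; next }        # first file: load the mapping
        ($1 in gene) { total[gene[$1]] += $2 }   # second file: add to the gene total
        END { for (g in total) print g, total[g] }' "$1" "$2" | sort
}

# Example usage (both file names are placeholders):
# sum_isoform_counts transcript_to_gene.tsv sample.counts.tsv > sample.gene_counts.tsv
```

Transcripts that are missing from the mapping file are silently skipped in this sketch, so a real run would want to check that every transcript is mapped.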
For the differential expression analysis, all timepoints were compared against the eight-hour timepoint (8 hpf). The following samples were omitted for failing quality control tests: '7hpf_2012_N1', '7hpf_2013_17', '2hpf_2012_N6', '1hpf_2012_N7', '6hpf_2012_N2'. Additionally, all timepoints were compared against the two-hour timepoint (2 hpf), omitting the following samples for failing quality control tests: '7hpf_2012_N1', '2hpf_2012_N6'.
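
One way to apply such an omission list before building the count matrix is to filter a file of sample names against it. The sketch below assumes one sample name per line. The helper name and the file names are hypothetical, and the exclusion list shown is the one given above for the comparison against 2 hpf.

```sh
# Hypothetical helper: drop excluded sample names from a list (one name per line).
#   $1: file listing all sample names
#   $2: file listing the sample names to omit
drop_failed_samples() {
    # -F: fixed strings, -x: match whole lines only, -v: keep non-matching lines
    grep -v -x -F -f "$2" "$1"
}

# Samples omitted from the comparison against 2 hpf, as listed above.
printf '%s\n' '7hpf_2012_N1' '2hpf_2012_N6' > omit_vs_2hpf.txt

# all_samples.txt is a placeholder for a file listing every sample name:
# drop_failed_samples all_samples.txt omit_vs_2hpf.txt > keep_vs_2hpf.txt
```

Whole-line matching (`-x`) keeps a name such as '7hpf_2012_N1' from also removing any longer sample name that merely contains it.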
All heatmaps were created using the R package pheatmap [@kolde2015pheatmap].
Gene Ontology terms [@2014; @Ashburner2000] were assigned to each *M. leidyi* gene based on homologous Pfam domains and significant Swiss-Prot blastp best hits [@Camacho2009]. These terms were used to infer functional categories of differentially expressed genes.
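
As a rough illustration of what selecting "significant best hits" can look like, the sketch below keeps, for each query, the highest-bitscore hit whose e-value is at or below a cutoff. It assumes blastp tabular output (`-outfmt 6`, where column 11 is the e-value and column 12 the bitscore). The cutoff value, the file names, and the function name are all assumptions, since the threshold and exact procedure used for this dataset are not stated here.

```sh
# Hypothetical helper: best hit per query from blastp tabular output (-outfmt 6).
#   $1: tabular BLAST file (column 1 = query, 2 = subject, 11 = evalue, 12 = bitscore)
#   $2: e-value cutoff (an assumed parameter; not taken from the study)
best_hits() {
    awk -F '\t' -v cutoff="$2" 'BEGIN { OFS = "\t" }
        ($11 + 0) <= (cutoff + 0) && (!($1 in best) || ($12 + 0) > best[$1]) {
            best[$1] = $12 + 0   # highest bitscore seen so far for this query
            hit[$1] = $2         # subject ID of that hit
        }
        END { for (q in hit) print q, hit[q] }' "$1" | sort
}

# Example usage (placeholder file names and an arbitrary cutoff):
# best_hits blastp_vs_swissprot.tsv 1e-5 > best_hits.tsv
```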
### GEO submission
Gene expression data has been deposited in the Gene Expression Omnibus (GEO) under the accession GSE93977.
### Scripts {.tabset}
#### DESeq2
```{r, code = readLines('scripts/runDEseq.R'), eval = FALSE}
```
#### GO Enrichment
```{r, code = readLines('scripts/input_mnemiopsis_GO_enrich.R'), eval = FALSE}
```
# References