---
title: "Bioinformatics"
author: "Laurent Gatto"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
knit: bookdown::preview_chapter
description: "Course material for the Bioinformatics (WSBIM1322) course at UCLouvain."
output:
  msmbstyle::msmb_html_book:
    toc: TRUE
    toc_depth: 1
    split_by: chapter
    split_bib: no
    css: style.css
    mathjax: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js
link-citations: yes
bibliography: [refs.bib, packages.bib]
---

```{r globalOptions, echo = FALSE}
knitr::opts_chunk$set(dev = "CairoPNG")
```

# Preamble {-}

The [WSBIM1322](https://uclouvain.be/cours-2021-wsbim1322.html) course
teaches the basics of statistical data analysis applied to high
throughput biology. It is aimed at biology and biomedical students
who are already familiar with the R language (see the pre-requisites
section below). The students will familiarise themselves with
statistical learning concepts such as unsupervised and supervised
learning and hypothesis testing, and will extend their understanding and
practice of R data structures and programming and of the Bioconductor
project.

The course will be followed by *Omics data analysis*
([WSBIM2122](https://github.com/UCLouvain-CBIO/WSBIM2122)).

## Motivation {-}

Today, it is difficult to overestimate the very broad importance and
impact of *data*. Given the abundance of data around us, and the
sophistication of tools for their analysis and interpretation that are
readily available, data has become a tool of profound social
change. Research in general, and biomedical research in particular, is
at the centre of this evolution. And while bioinformatics has been
playing a central role in biomedical research for many years now,
bioinformatics skills aren't well integrated in life science
curricula, limiting students in their career prospects and research
horizon [@WilsonSayres:2018]. It is important for young researchers to
acquire quantitative, computational and data skills to address the
challenges that lie
[ahead](https://uclouvain-cbio.github.io/WSBIM1207/#motivation).

This course will focus on the application of data analysis methods and
algorithms, and the interpretation of their outputs. We will be using
the [R](https://www.R-project.org/) language and environment [@R] and
the [RStudio integrated development
environment](https://www.rstudio.com/products/RStudio/) to acquire
these data skills. Other interactive languages such as
[Python](https://www.python.org/) and interactive [Jupyter
notebooks](https://jupyter.org/) would also have been a good fit. One
motivation for this choice is the availability of numerous
R/[Bioconductor](https://www.bioconductor.org/) packages [@Huber:2015]
for the analysis of high throughput biology data.

Below, you can find three short videos (in French, with subtitles in
multiple languages) by PhD students who assist in the teaching of
this course. They will provide you with real-world applications of some
of the concepts taught here.

- [Julie Devis](https://youtu.be/MOa9fCHqKko) is pursuing a PhD in the
Computational Biology and Bioinformatics Unit with Prof Laurent
Gatto. She uses R and Bioconductor to study the methylation in
cancer germ-line genes.
- [Valentine Robaux](https://youtu.be/XUiOQ0LRKxc) is pursuing a PhD
in the Cardiovascular Research Unit with Profs Sandrine Horman and
Christophe Beauloye. She uses R and Bioconductor to investigate
platelet function.
- [Jean Fain](https://youtu.be/XC6HDdPzPac) is pursuing a PhD in the
Epigenetics Unit with Prof Charles De Smet. He uses R and
Bioconductor to study DNA methylation and its role in cancer.

## References and credits {-}

References are provided throughout the course. Several stand out,
however, as they cover large parts of the material or provide
complementary resources.

- **Modern Statistics for Modern Biology**, by Susan Holmes and
Wolfgang Huber [@MSMB]. A free online version of the book is
available [here](https://www.huber.embl.de/msmb/).
- **An Introduction to Statistical Learning with Applications in R**
by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
[@ISLR]. A free PDF of the book is available
[here](http://faculty.marshall.usc.edu/gareth-james/ISL/index.html).

This course is taught by Prof Laurent Gatto with invaluable
assistance from Dr Axelle Loriot at the Faculty of Pharmacy and
Biomedical Sciences (FASB) at UCLouvain, Belgium.

## Pre-requisites {-}

Students taking this course should be familiar with data analysis and
visualisation in R. A formal pre-requisite for students taking the
class is the introductory course
[WSBIM1207](https://UCLouvain-CBIO.github.io/WSBIM1207). The first
chapter provides a refresher of the R skills needed for the rest of
the course.

Software requirements are documented in the *Setup* section below.

## About this course material {-}

This material is written in R Markdown [@R-rmarkdown] and compiled as a
book using `knitr` [@R-knitr] and `bookdown` [@R-bookdown]. The source
code is publicly available in a GitHub repository
[https://github.com/uclouvain-cbio/WSBIM1322](https://github.com/uclouvain-cbio/WSBIM1322)
and the compiled material can be read at http://bit.ly/WSBIM1322.

Contributions to this material are welcome. The best way to contribute
or contact the maintainers is by means of pull requests and
[issues](https://github.com/uclouvain-cbio/WSBIM1322/issues). Please
familiarise yourself with the [code of
conduct](https://github.com/UCLouvain-CBIO/WSBIM1322/blob/master/CONDUCT.md). By
participating in this project you agree to abide by its terms.

## Citation {-}

If you use this course, please cite it as

> Laurent Gatto. *UCLouvain-CBIO/WSBIM1322: Bioinformatics.*
> https://github.com/UCLouvain-CBIO/WSBIM1322.

## License {-}

This material is licensed under the [Creative Commons
Attribution-ShareAlike 4.0
License](https://creativecommons.org/licenses/by-sa/4.0/).

## Setup {-}

We will be using the [R environment for statistical
computing](https://www.r-project.org/) as our main data science language.
We will also use the
[RStudio](https://www.rstudio.com/products/RStudio/) interface to
interact with R and write scripts and reports. Both R and RStudio are
easy to install and work on all major operating systems.

Once R and RStudio are installed, a set of packages will need to be
installed. See section \@ref(sec-setup2) for details.

The `rWSBIM1322` package provides some pre-formatted data used in this
course. It can be installed with

```{r, eval = FALSE}
## Installing from GitHub requires the BiocManager and remotes packages.
BiocManager::install("UCLouvain-CBIO/rWSBIM1322")
```

and then loaded with

```{r rwsbim1322, message = FALSE, warning = FALSE}
library("rWSBIM1322")
```
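
Before loading the package, it can be useful to check which of the
required packages are still missing. The following is only a minimal
sketch: `missing_pkgs()` is a hypothetical helper that is not part of
the course material, and the list of package names passed to it is an
assumption based on the installation command above.

```r
## Hypothetical helper (not part of rWSBIM1322): returns the names of
## the packages in `pkgs` that cannot be found in the current library.
missing_pkgs <- function(pkgs) {
    ## requireNamespace() returns TRUE/FALSE without attaching the package
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

## Assumed set of packages needed for the installation step above
missing_pkgs(c("BiocManager", "remotes", "rWSBIM1322"))
```

An empty result suggests that the packages listed are available; any
names returned still need to be installed.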