---
title: "tidyr"
output: html_document
---
```{r}
library(tidyverse)
```
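## Example datasets
Base R ships with built-in datasets such as `AirPassengers`, a monthly time series of international airline passenger totals from 1949 to 1960.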
```{r}
AirPassengers
```
```{r read_in_data}
## wide format: one column per observation type and year (e.g. pop_1952)
gap_wide <- readr::read_csv('https://raw.githubusercontent.com/OHI-Science/data-science-training/master/data/gapminder_wide.csv')
## yesterday's data format
gapminder <- readr::read_csv('https://raw.githubusercontent.com/OHI-Science/data-science-training/master/data/gapminder.csv')
head(gap_wide)
head(gapminder)
```
## `gather()`
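Use `gather()` to turn `gap_wide` into a long-format dataset. (Since tidyr 1.0.0, `gather()` is superseded by `pivot_longer()`; it still works but receives no new features.)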
```{r}
head(gap_wide)
## without a column selection, gather() stacks every column, including continent and country
gap_long <- gap_wide %>%
  gather(key = obstype_year,
         value = obs_values)
head(gap_long)
```
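A minimal sketch of what the chunk above does, using a hypothetical two-row table `toy_wide` with made-up values (not gapminder data): with no column selection, `gather()` stacks every column, identifiers included, so the names in `continent` and `country` end up mixed into the key/value columns instead of labelling rows.

```{r}
library(tidyr)

# Hypothetical toy table shaped like gap_wide; the values are made up
toy_wide <- data.frame(
  continent = c("Asia", "Europe"),
  country = c("A", "B"),
  pop_1952 = c(100, 200),
  pop_1957 = c(110, 210),
  stringsAsFactors = FALSE
)

# No columns selected: all 4 columns x 2 rows are stacked into 8 key/value rows
all_long <- gather(toy_wide, key = obstype_year, value = obs_values)
print(all_long)
```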
```{r}
gap_long <- gap_wide %>%
  gather(key = obstype_year,
         value = obs_values,
         dplyr::starts_with("pop"),
         dplyr::starts_with("lifeExp"),
         dplyr::starts_with("gdpPercap"))
## `pkg::fun()` calls a function from a specific package; starts_with() is
## defined in tidyselect and re-exported by dplyr (and tidyr)
head(gap_long)
```
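The same idea sketched on a hypothetical toy table (made-up values): selecting only the measurement columns with `starts_with()` leaves `continent` and `country` as identifier columns, repeated once for each gathered value.

```{r}
library(tidyr)
library(dplyr)

# Hypothetical toy table shaped like gap_wide; the values are made up
toy_wide <- data.frame(
  continent = c("Asia", "Europe"),
  country = c("A", "B"),
  lifeExp_1952 = c(40, 60),
  pop_1952 = c(100, 200),
  stringsAsFactors = FALSE
)

# Only columns matching the prefixes are gathered; continent and country are kept
long_sel <- toy_wide %>%
  gather(key = obstype_year,
         value = obs_values,
         starts_with("pop"),
         starts_with("lifeExp"))
print(long_sel)
```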
## Keep identifier columns by excluding them with `-` in `gather()`
```{r}
gap_long <- gap_wide %>%
  gather(key = obstype_year,
         value = obs_values,
         -continent, -country)
head(gap_long)
```
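For comparison, a minimal sketch of the `pivot_longer()` call (tidyr 1.0.0 or later) that corresponds to the `gather()` call above, run on a hypothetical toy table with made-up values. One visible difference: `pivot_longer()` orders the output by input row, whereas `gather()` stacks the gathered columns one after another.

```{r}
library(tidyr)

# Hypothetical toy table shaped like gap_wide; the values are made up
toy_wide <- data.frame(
  continent = c("Asia", "Europe"),
  country = c("A", "B"),
  lifeExp_1952 = c(40, 60),
  lifeExp_1957 = c(42, 62),
  stringsAsFactors = FALSE
)

# Counterpart of gather(key = obstype_year, value = obs_values, -continent, -country)
long_pv <- pivot_longer(toy_wide,
                        cols = -c(continent, country),
                        names_to = "obstype_year",
                        values_to = "obs_values")
print(long_pv)
```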
## Separate out the year
```{r}
gap_long <- gap_wide %>%
  gather(key = obstype_year,
         value = obs_values,
         -continent, -country) %>%
  separate(obstype_year,
           into = c("obs_type", "year"),
           sep = "_",
           convert = TRUE)
head(gap_long)
```
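A minimal sketch of the `separate()` step on a hypothetical two-row long table (made-up values): `sep = "_"` splits each key at the underscore, and `convert = TRUE` runs `type.convert()` on the new columns, so `year` becomes an integer rather than a character string.

```{r}
library(tidyr)

# Hypothetical toy long table; the values are made up
toy_long <- data.frame(
  obstype_year = c("pop_1952", "lifeExp_2007"),
  obs_values = c(100, 60),
  stringsAsFactors = FALSE
)

# Split "pop_1952" into obs_type = "pop" and year = 1952 (integer after convert)
split_df <- separate(toy_long, obstype_year,
                     into = c("obs_type", "year"),
                     sep = "_",
                     convert = TRUE)
str(split_df)
```

In tidyr 1.3.0 and later, `separate()` is superseded by `separate_wider_delim()`, though it continues to work.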
## Plot long-format data
```{r}
life_df <- gap_long %>%
  filter(obs_type == "lifeExp",
         continent == "Americas")
head(life_df)
ggplot(data = life_df, aes(x = year, y = obs_values, color = country)) +
  geom_line()
```
```{r}
meanLE_DF <- gap_long %>%
  filter(obs_type == "lifeExp",
         year >= 1982,
         year <= 2007) %>%
  group_by(continent, year) %>%
  summarize(mean = mean(obs_values), .groups = "drop")
ggplot(data = meanLE_DF, aes(x = year, y = mean, color = continent)) +
  geom_line() +
  labs(title = "Mean life expectancy by continent")
```
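To check the summary step by hand, here is a minimal sketch on a hypothetical five-row table (made-up values): conditions separated by commas in `filter()` are combined with AND, so the 1977 row is dropped, and `summarize()` returns one row per `continent`/`year` group. The `.groups = "drop"` argument (dplyr 1.0.0 or later) returns an ungrouped result.

```{r}
library(dplyr)

# Hypothetical toy long table; the values are made up
toy_long <- data.frame(
  continent = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  year = c(1977L, 1982L, 1982L, 1982L, 2007L),
  obs_type = "lifeExp",
  obs_values = c(30, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Keep lifeExp rows from 1982 to 2007, then average within each continent/year
mean_df <- toy_long %>%
  filter(obs_type == "lifeExp",
         year >= 1982,
         year <= 2007) %>%
  group_by(continent, year) %>%
  summarize(mean = mean(obs_values), .groups = "drop")
print(mean_df)
```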