-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path04_nested_dataframes.Rmd
116 lines (89 loc) · 3.84 KB
/
04_nested_dataframes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
---
title: "Nesting data frames"
author: "John Little"
date: "`r Sys.Date()`"
output: html_notebook
---
One way to leverage what we've learned in previous lessons (functions, purrr/map, regex, ggplot2, dplyr) is to use the `nest` function. Nesting will create group-specific tibbles upon which which you can map more functional analysis.
```{r}
library(tidyverse)
```
## `group_by` and nest
While `group_by` is not specifically required (see next code-chunk) to `nest`, I find that it helps me when I'm trying to leverage ggplot functions as you'll see below.
```{r}
starwars |>
group_by(gender) |>
nest()
```
### `nest` without group_by
```{r}
starwars |>
nest(-gender)
```
## map a custom function over a nested group
Building on lesson three where we created custom functions, we can map those functions onto our embedded tibbles
```{r}
make_me_a_plot <- function(df) {
df %>%
ggplot(aes(height, mass)) +
geom_point()
}
starwars |>
nest(data = -gender) |>
mutate(my_plot = map(data, make_me_a_plot)) |>
pull(my_plot)
```
Adding titles to the plots requires the `group_by()` function
```{r, warning=TRUE}
starwars |>
group_by(gender) |>
nest(my_df = -gender) |>
mutate(my_plot = map(my_df, make_me_a_plot)) |>
mutate(my_plot_with_title = map(my_plot, ~ .x + ggtitle(gender))) |>
pull(my_plot_with_title) # This can be executed more elegantly with map2(). See Below.
```
### Models and mapping other functions
As with other functions, you can map linear regressions -- `lm()` -- and other functions. Once the model is embedded as a list within a data frame, you can map the `broom::tidy()` function onto the model. Then `unnest()` the tidied _model_ and manipulate the data with standard dplyr verbs.
```{r}
starwars |>
group_by(sex) |>
nest(-sex) |>
mutate(my_plot = map(data, make_me_a_plot)) |>
mutate(my_plott = map(my_plot, ~ .x + labs(title = sex))) |>
mutate(my_model = map(data, ~ lm(mass ~ height, data = .x))) |>
mutate(my_tidy_model = map(my_model, broom::tidy)) #|>
# unnest(my_tidy_model) |>
# filter(term != "(Intercept)") |>
# filter(p.value < 0.05) |>
# mutate(my_reg_plot = map(data, ~ .x |>
# ggplot(aes(height, mass)) +
# geom_smooth(method = lm, se = FALSE) +
# geom_point())) |>
# pull(my_reg_plot)
```
## `map2()` - Iterating over more than a single parallel vector
In the section about mapping custom functions, I use the `group_by()`, `nest()` and `map()` functions to generate plots with titles. the `.x` pronoun refers to the object I want to iterate over. That code works fine even if it's a little verbose. It also demonstrates the value of additional {purrr} functions. `map2()` iterates over two vectors in parallel. Similarly, `pmap()` iterates over a list of multiple vectors.
In the case of `map2()` the two vectors of a parent data-frame can be referred to by the pronouns `.x` and `.y`. So, one vector is the title of the plot (`.y`), the other vector (`.x`) is the list-column tibble that contains the x-axis and y-axis vectors generating a scatter plot for each nested tibble.
Essentially, coding with `map2()` creates the scatter plots with a bit less code
```{r}
starwars |>
nest(my_df = -gender) |>
mutate(my_plot = map2(my_df, gender, ~ .x |>
ggplot(aes(height, mass)) +
geom_point() +
ggtitle(.y))) |>
pull(my_plot)
```
Of course, the anonymous function can be written as a named function, making the code even more readable
```{r}
splot_title <- function(my_nested_df, my_title) {
my_nested_df |>
ggplot(aes(height, mass)) +
geom_point() +
ggtitle(my_title)
}
starwars |>
nest(my_df = -gender) |>
mutate(my_plot = map2(my_df, gender, splot_title)) |>
pull(my_plot)
```