forked from ropensci-books/targets
# Static branching {#static}
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, eval = TRUE}
library(targets)
library(tarchetypes)
library(tidyverse)
```
## Branching
Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the `_targets.R` file more concise and easier to read and maintain.
`targets` supports two types of branching: dynamic branching and [static branching](#static). Some projects are better suited to dynamic branching, while others benefit more from [static branching](#static) or a combination of both. Some users understand dynamic branching more easily because it avoids metaprogramming, while others prefer [static branching](#static) because `tar_manifest()` and `tar_visnetwork()` provide immediate feedback. Except for the [section on dynamic-within-static branching](static.html#dynamic-within-static-branching), you can read the two chapters on branching in any order (or skip them) depending on your needs.
## When to use static branching
Static branching is the act of defining a group of targets in bulk before the pipeline starts. Whereas dynamic branching uses last-minute dependency data to define the branches, static branching uses metaprogramming to modify the code of the pipeline up front. And whereas dynamic branching excels at creating a large number of very similar targets, static branching is most useful for a smaller number of heterogeneous targets. Some users find it more convenient because they can use `tar_manifest()` and `tar_visnetwork()` to check the correctness of static branching before launching the pipeline.
## Map
[`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package creates copies of existing target objects, where each new command is a variation on the original. In the example below, we have a data analysis workflow that iterates over datasets and analysis methods. The `values` data frame has the operational parameters of each data analysis, and `tar_map()` creates one new target per row.
```{r, echo = TRUE, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
values <- tibble(
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  values <- tibble(
    method_function = rlang::syms(c("method1", "method2")),
    data_source = c("NIH", "NIAID")
  )
  targets <- tar_map(
    values = values,
    tar_target(analysis, method_function(data_source, reps = 10)),
    tar_target(summary, summarize_analysis(analysis, data_source))
  )
  list(targets)
})
```
```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```
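To make the row-by-row substitution concrete, here is a minimal base R sketch of the idea. It is not how `tarchetypes` is implemented. The helper `fill_in()` and the variable `template` are hypothetical names introduced only for illustration, and the target names are built by simply pasting each row's values onto `analysis` with underscores.

```{r, eval = FALSE, echo = TRUE}
# One row per analysis: a method (stored by name) and a data source.
values <- data.frame(
  method_function = c("method1", "method2"),
  data_source = c("NIH", "NIAID"),
  stringsAsFactors = FALSE
)

# The unevaluated command shared by every row.
template <- quote(method_function(data_source, reps = 10))

# Hypothetical helper: substitute row i of `values` into the template.
fill_in <- function(i) {
  replacements <- list(
    method_function = as.symbol(values$method_function[i]),
    data_source = values$data_source[i]
  )
  do.call(substitute, list(template, replacements))
}

commands <- lapply(seq_len(nrow(values)), fill_in)

# One name per row: analysis_method1_NIH and analysis_method2_NIAID.
target_names <- paste(
  "analysis",
  values$method_function,
  values$data_source,
  sep = "_"
)

commands[[1]]
#> method1("NIH", reps = 10)
commands[[2]]
#> method2("NIAID", reps = 10)
```

Nothing is executed here: `method1()` and `method2()` do not need to exist yet, which is why this kind of code generation can happen before the pipeline starts.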
For shorter target names, use the `names` argument of `tar_map()`. For more combinations of settings, use `tidyr::expand_grid()` to construct `values`.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tidyr)
values <- expand_grid( # Use all possible combinations of input settings.
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  names = "data_source", # Select columns from `values` for target names.
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
```
```{r, eval = TRUE, echo = FALSE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tidyr)
  values <- expand_grid(
    method_function = rlang::syms(c("method1", "method2")),
    data_source = c("NIH", "NIAID")
  )
  targets <- tar_map(
    values = values,
    names = "data_source",
    tar_target(analysis, method_function(data_source, reps = 10)),
    tar_target(summary, summarize_analysis(analysis, data_source))
  )
  list(targets)
})
```
```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
# You may need to zoom out on this interactive graph to see all 8 targets.
tar_visnetwork(targets_only = TRUE)
```
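The count of 8 targets follows from simple arithmetic: two methods crossed with two data sources give four rows of settings, and each row yields one `analysis` target and one `summary` target. The sketch below checks this with base R's `expand.grid()`, used here only as a stand-in for `tidyr::expand_grid()` (the two order their rows differently, but the number of rows is the same). The variable `targets_per_row` is a hypothetical name for illustration.

```{r, eval = FALSE, echo = TRUE}
# Base R stand-in for the grid of settings built with tidyr::expand_grid().
grid <- expand.grid(
  method_function = c("method1", "method2"),
  data_source = c("NIH", "NIAID"),
  stringsAsFactors = FALSE
)

# Each row of settings produces two targets: `analysis` and `summary`.
targets_per_row <- 2L

nrow(grid)
#> [1] 4
nrow(grid) * targets_per_row
#> [1] 8
```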
## Dynamic-within-static branching
You can also combine static and dynamic branching. The static `tar_map()` is an excellent outer layer on top of targets with patterns. The following is a sketch of a pipeline that runs each of two data analysis methods 10 times, once per random seed. Static branching iterates over the method functions, while dynamic branching iterates over the seeds. `tar_map()` creates new patterns as well as new commands. So below, the summary methods map over the analysis methods both statically and dynamically.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed_target <- tar_target(random_seed, seq_len(10))
targets <- tar_map(
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
list(random_seed_target, targets)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  random_seed_target <- tar_target(random_seed, seq_len(10))
  targets <- tar_map(
    values = tibble(method_function = rlang::syms(c("method1", "method2"))),
    tar_target(
      analysis,
      method_function("NIH", seed = random_seed),
      pattern = map(random_seed)
    ),
    tar_target(
      summary,
      summarize_analysis(analysis),
      pattern = map(analysis)
    )
  )
  list(random_seed_target, targets)
})
```
```{r, eval = TRUE, paged.print = FALSE}
tar_manifest()
```
```{r, eval = TRUE, paged.print = FALSE}
tar_visnetwork(targets_only = TRUE)
```
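The shape of this pipeline can be imitated in plain R as two nested loops: an outer loop over the methods, which are known up front (the static layer), and an inner loop over the seeds (standing in for the dynamic layer). This is only an analogy. It has none of the caching or skipping behavior of `targets`, and the bodies of `method1()`, `method2()`, and `summarize_analysis()` below are toy stand-ins, since the chapter does not define them.

```{r, eval = FALSE, echo = TRUE}
# Toy stand-ins for the undefined functions in the pipeline above.
method1 <- function(source, seed) {
  set.seed(seed)
  rnorm(5)
}
method2 <- function(source, seed) {
  set.seed(seed)
  runif(5)
}
summarize_analysis <- function(analysis) mean(analysis)

random_seed <- seq_len(10)
methods <- list(method1 = method1, method2 = method2)

# Outer lapply(): the static layer, one entry per method known in advance.
# Inner vapply(): the dynamic layer, one "branch" per element of random_seed.
summaries <- lapply(methods, function(method_function) {
  vapply(
    random_seed,
    function(seed) summarize_analysis(method_function("NIH", seed = seed)),
    numeric(1)
  )
})

# Two methods, each with ten summaries (one per seed).
lengths(summaries)
```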
## Combine
[`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html) from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package creates a new target to aggregate the results of upstream targets. In the example below, the combined target aggregates the rows returned from two other targets.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
options(crayon.enabled = FALSE)
target1 <- tar_target(head_mtcars, head(mtcars, 1))
target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
target3 <- tar_combine(combined_target, target1, target2)
list(target1, target2, target3)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  options(crayon.enabled = FALSE)
  target1 <- tar_target(head_mtcars, head(mtcars, 1))
  target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
  target3 <- tar_combine(combined_target, target1, target2)
  list(target1, target2, target3)
})
```
```{r, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```
```{r, eval = TRUE}
tar_make()
```
```{r, eval = TRUE}
tar_read(combined_target)
```
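As a sanity check that does not involve `targets` at all, the same two rows can be stacked directly in base R. This assumes the combined target row-binds the two data frames, as described above. Details such as row-name handling may differ from what `tar_combine()` produces.

```{r, eval = FALSE, echo = TRUE}
# First and last rows of mtcars, stacked into one data frame.
combined <- rbind(head(mtcars, 1), tail(mtcars, 1))

nrow(combined)
#> [1] 2
ncol(combined) == ncol(mtcars)
#> [1] TRUE
```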
To use `tar_combine()` and `tar_map()` together in more complicated situations, you may need to supply `unlist = FALSE` to `tar_map()`. That way, `tar_map()` will return a nested list of target objects, and you can combine the ones you want. The pipeline below extends our previous `tar_map()` example by combining just the summaries, omitting the analyses from `tar_combine()`. Also note the use of `bind_rows(!!!.x)` below. This is how you supply custom code to combine the return values of other targets. `.x` is a placeholder for the return values, and `!!!` is the "unquote-splice" operator from the `rlang` package.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed <- tar_target(random_seed, seq_len(10))
mapped <- tar_map(
  unlist = FALSE, # Return a nested list from tar_map()
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
combined <- tar_combine(
  combined_summaries,
  mapped[[2]],
  command = dplyr::bind_rows(!!!.x, .id = "method")
)
list(random_seed, mapped, combined)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  random_seed <- tar_target(random_seed, seq_len(10))
  mapped <- tar_map(
    unlist = FALSE, # Return a nested list from tar_map()
    values = tibble(method_function = rlang::syms(c("method1", "method2"))),
    tar_target(
      analysis,
      method_function("NIH", seed = random_seed),
      pattern = map(random_seed)
    ),
    tar_target(
      summary,
      summarize_analysis(analysis),
      pattern = map(analysis)
    )
  )
  combined <- tar_combine(
    combined_summaries,
    mapped[[2]],
    command = dplyr::bind_rows(!!!.x, .id = "method")
  )
  list(random_seed, mapped, combined)
})
```
```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```
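To see what splicing means, the sketch below builds a similar call by hand in base R, without `rlang` and without evaluating anything. It is an illustration of the idea only, not the code `tar_combine()` runs. The names in `upstream` are an assumption about how the two summary targets are named, and `args` and `command` are hypothetical variable names.

```{r, eval = FALSE, echo = TRUE}
# Assumed names of the upstream summary targets.
upstream <- c("summary_method1", "summary_method2")

# One symbol per upstream target, named so the names carry into the call.
args <- lapply(upstream, as.symbol)
names(args) <- upstream

# Splice the symbols into the argument list of a single call,
# which is roughly the role `!!!.x` plays in the command above.
command <- as.call(
  c(list(quote(dplyr::bind_rows)), args, list(.id = "method"))
)

# Inspect the generated call as text (dplyr is never loaded or called).
paste(deparse(command), collapse = " ")
```

Because each spliced argument is named after its target, `.id = "method"` has labels available to record which summary each row came from.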
## Metaprogramming
Custom metaprogramming is a more flexible alternative to [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) and [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html). [`tar_eval()`](https://docs.ropensci.org/tarchetypes/reference/tar_eval.html) from [`tarchetypes`](https://github.com/ropensci/tarchetypes) accepts an arbitrary expression and iteratively plugs in symbols. Below, we use it to branch over datasets.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(rlang)
library(targets)
library(tarchetypes)
string <- c("gapminder", "who", "imf")
symbol <- syms(string)
tar_eval(
  tar_target(symbol, get_data(string)),
  values = list(string = string, symbol = symbol)
)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(rlang)
  library(tarchetypes)
  string <- c("gapminder", "who", "imf")
  symbol <- syms(string)
  tar_eval(
    tar_target(symbol, get_data(string)),
    values = list(string = string, symbol = symbol)
  )
})
```
[`tar_eval()`](https://docs.ropensci.org/tarchetypes/reference/tar_eval.html) has fewer guardrails than [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) or [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html), so [`tar_manifest()`](https://docs.ropensci.org/targets/reference/tar_manifest.html) is especially important for checking the correctness of your metaprogramming.
```{r, eval = TRUE}
tar_manifest(fields = command)
```
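The substitute-then-evaluate idea behind this kind of metaprogramming can be sketched in base R. The code below is not the `tar_eval()` implementation. `fake_target()` is a hypothetical stand-in for `tar_target()` that records its arguments as text rather than building a real target, and `template` is likewise an illustrative name.

```{r, eval = FALSE, echo = TRUE}
# Hypothetical stand-in for tar_target(): records the name and command
# as text instead of creating a target object.
fake_target <- function(name, command) {
  list(
    name = deparse(substitute(name)),
    command = deparse(substitute(command))
  )
}

string <- c("gapminder", "who", "imf")
symbol <- lapply(string, as.symbol)

# The expression to repeat, with `symbol` and `string` as placeholders.
template <- quote(fake_target(symbol, get_data(string)))

# For each pair of values: substitute into the template, then evaluate.
targets <- Map(
  function(sym, txt) {
    filled <- do.call(
      substitute,
      list(template, list(symbol = sym, string = txt))
    )
    eval(filled)
  },
  symbol,
  string
)

targets[[2]]$name
#> [1] "who"
targets[[2]]$command
#> [1] "get_data(\"who\")"
```

A mistake in the template is copied into every generated target, so it is worth inspecting the generated commands before running anything.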