dynamic.Rmd

# Dynamic branching {#dynamic}

```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```

```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(targets)
library(tidyverse)
```

## Branching

Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the `_targets.R` file more concise and easier to read and maintain. 

`targets` supports two types of branching: dynamic branching and [static branching](#static). Some projects are better suited to dynamic branching, while others benefit more from [static branching](#static) or a combination of both. Some users understand dynamic branching more easily because it avoids metaprogramming, while others prefer [static branching](#static) because `tar_manifest()` and `tar_visnetwork()` provide immediate feedback. Except for the [section on dynamic-within-static branching](static.html#dynamic-within-static-branching), you can read the two chapters on branching in any order (or skip them) depending on your needs.

## About dynamic branching

Dynamic branching is the act of defining new targets (i.e. branches) while the pipeline is running. Prior to launching the pipeline, the user does not necessarily know which branches will spawn or how many branches there will be, and each branch's inputs are determined at the last minute. Relative to [static branching](#static), dynamic branching is better suited to iterating over a larger number of very similar tasks (but can act as an inner layer inside [static branching](#static), as the next chapter demonstrates).

## Patterns

To use dynamic branching, set the `pattern` argument of `tar_target()`. A pattern is a dynamic branching specification expressed in terms of functional programming. The following minimal example explores the mechanics of patterns (and examples of branching in real-world projects are [linked from here](https://docs.ropensci.org/targets/index.html#examples)).

```{r, echo = FALSE, eval = TRUE}
library(targets)
library(tidyverse)
tar_script({
  options(crayon.enabled = FALSE, tidyverse.quiet = TRUE)
  list(
    tar_target(w, c(1, 2)),
    tar_target(x, c(10, 20)),
    tar_target(y, w + x, pattern = map(w, x)),
    tar_target(z, sum(y)),
    tar_target(z2, length(y), pattern = map(y))
  )
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tidyverse)
list(
  tar_target(w, c(1, 2)),
  tar_target(x, c(10, 20)),
  tar_target(y, w + x, pattern = map(w, x)),
  tar_target(z, sum(y)),
  tar_target(z2, length(y), pattern = map(y))
)
```

```{r, eval = TRUE}
tar_visnetwork()
```

```{r, eval = TRUE}
tar_make()
```

Above, targets `w`, `x`, and `z` are called **stems** because they do not create branches themselves. Target `y` is a **pattern** because it creates its own **branches** like `y_4096b6a8`. A branch is a special kind of target whose dependencies are slices of the targets mentioned in `pattern = map(...)` etc. If we read target `y` into memory, all the branches will load and automatically aggregate as a vector.^[To use list aggregation, write `tar_target(y, w + x, pattern = map(w, x), iteration = "list")` in the pipeline.]

```{r, eval = TRUE}
tar_read(y)
```

Target `z` accepts this entire aggregate of `y` and sums it.

```{r, eval = TRUE}
tar_read(z)
```

Target `z2` maps over `y`, so each each branch of `z2` accepts a branch of `y`.

```{r, eval = TRUE}
tar_read(z2)
```

## Pattern construction

`targets` supports the following pattern types.

* `map()`: iterate over one or more targets in sequence.
* `cross()`: iterate over combinations of slices of targets.
* `slice()`: select individual branches slices by numeric index. For example, `pattern = slice(x, index = c(3, 4))` applies the target's command to the third and fourth slices (or branches) of upstream target `x`.
* `head()`: restrict branching to the first few elements.
* `tail()`: restrict branching to the last few elements.
* `sample()`: restrict branching to a random subset of elements.

These patterns are composable. Below, target `z` creates six branches, one for each combination of `w` and (`x`, `y`) pair. The pattern `cross(w, map(x, y))` is equivalent to `tidyr::crossing(w, tidyr::nesting(x, y))`.

```{r, echo = FALSE, eval = TRUE}
tar_script({
  options(crayon.enabled = FALSE)
  list(
    tar_target(w_comp, seq_len(2)),
    tar_target(x_comp, head(letters, 3)),
    tar_target(y_comp, head(LETTERS, 3)),
    tar_target(
      z_comp,
      data.frame(w = w_comp, x = x_comp, y = y_comp),
      pattern = cross(w_comp, map(x_comp, y_comp))
    )
  )
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(w_comp, seq_len(2)),
  tar_target(x_comp, head(letters, 3)),
  tar_target(y_comp, head(LETTERS, 3)),
  tar_target(
    z_comp,
    data.frame(w = w_comp, x = x_comp, y = y_comp),
    pattern = cross(w_comp, map(x_comp, y_comp))
  )
)
```

```{r, eval = TRUE}
# R console
tar_make()
```

```{r, eval = TRUE}
# R console
tar_read(z_comp)
```

With `slice()`, you can select pieces of a pattern or upstream target as follows.

```{r, echo = FALSE, eval = TRUE}
tar_script({
  options(crayon.enabled = FALSE)
  list(
    tar_target(a_data, letters),
    tar_target(b_data, LETTERS),
    tar_target(
      c_data,
      paste(a_data, b_data),
      pattern = slice(cross(a_data, b_data), index = seq(2, 3))
    )
  )
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(a_data, letters),
  tar_target(b_data, LETTERS),
  tar_target(
    c_data,
    paste(a_data, b_data),
    pattern = slice(cross(a_data, b_data), index = seq(2, 3))
  )
)
```

```{r, eval = TRUE}
# R console
tar_make()
```

```{r, eval = TRUE}
# R console
tar_read(c_data)
```

## Branch provenance

The `tar_branches()` function identifies dependency relationships among individual branches. In the example pipeline below, we can find out the branch of `y` that each branch of `z` depends on. 

```{r, echo = FALSE, eval = TRUE}
tar_script({
  options(crayon.enabled = FALSE)
  list(
    tar_target(x, seq_len(3)),
    tar_target(y, x + 1, pattern = map(x)),
    tar_target(z, y + 1, pattern = map(y))
  )
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(x, seq_len(3)),
  tar_target(y, x + 1, pattern = map(x)),
  tar_target(z, y + 1, pattern = map(y))
)
```

```{r, eval = TRUE}
tar_make()
```

```{r, eval = TRUE}
branches <- tar_branches(z, map(y))
branches
```

```{r, eval = TRUE}
tar_read_raw(branches$y[2])
```

However, `tar_branches()` is not always helpful: for example, if we look at how `y` branches over `x`. `x` does not use dynamic branching, so `tar_branches()` does not return meaningful branch names.

```{r, eval = TRUE}
branches <- tar_branches(y, map(x))
branches
```

```{r, error = TRUE, eval = TRUE}
tar_read_raw(branches$x[2])
```

In situations like this, it is best to proactively write targets that keep track of information about their upstream branches. Data frames and `tibble`s are useful for this.

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(tibble)
  options(crayon.enabled = FALSE)
  list(
    tar_target(x, seq_len(3)),
    tar_target(y, tibble(x = x, y = x + 1), pattern = map(x))
  )
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tibble)
list(
  tar_target(x, seq_len(3)),
  tar_target(y, tibble(x = x, y = x + 1), pattern = map(x))
)
```

```{r, eval = TRUE}
tar_make()
```

```{r, eval = TRUE}
tar_read(y)
```

## Testing patterns

To check the correctness of a pattern without running the pipeline, use [`tar_pattern()`](https://docs.ropensci.org/targets/reference/tar_pattern.html). Simply supply the pattern itself and the length of each dependency target. The branch names in the data frames below are made up, but they convey a high-level picture of the branching structure.

```{r, eval = TRUE}
tar_pattern(
  cross(w_comp, map(x_comp, y_comp)),
  w_comp = 2,
  x_comp = 3,
  y_comp = 3
)
```

```{r, eval = TRUE}
tar_pattern(
  head(cross(w_comp, map(x_comp, y_comp)), n = 2),
  w_comp = 2,
  x_comp = 3,
  y_comp = 3
)
```

```{r, eval = TRUE}
tar_pattern(
  cross(w_comp, sample(map(x_comp, y_comp), n = 2)),
  w_comp = 2,
  x_comp = 3,
  y_comp = 3
)
```

## Branching over files

Dynamic branching over files is tricky. A target with `format = "file"` treats the entire set of files as an irreducible bundle. That means in order to branch over files downstream, each file must already have its own branch.

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(paths, c("a.csv", "b.csv")),
  tar_target(files, paths, format = "file", pattern = map(paths)),
  tar_target(data, read_csv(files), pattern = map(files))
)
```

The [`tar_files()`](https://docs.ropensci.org/tarchetypes/reference/tar_files.html) function from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package is shorthand for the first two targets above.

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_files(files, c("a.csv", "b.csv")),
  tar_target(data, read_csv(files), pattern = map(files))
)
```

## Branching over rows

`tarchetypes` >= 0.1.0 has helpers for easy branching over subsets of data frames:

* `tar_group_by()`: define row groups using `dplyr::group_by()` semantics.
* `tar_group_select()`: define row groups using `tidyselect` semantics.
* `tar_group_count()`: define a given number row groups.
* `tar_group_size()`: define row groups of a given size.

If you define a target with one of these functions, all downstream dynamic targets will automatically branch over the row groups.

```{r, echo = FALSE, eval = TRUE}
targets::tar_script({
  produce_data <- function() {
    expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
  }
  list(
    tarchetypes::tar_group_by(data, produce_data(), var1, var2),
    tar_target(group, data, pattern = map(data))
  )
})
```

```{r, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}
list(
  tar_group_by(data, produce_data(), var1, var2),
  tar_target(group, data, pattern = map(data))
)
```

```{r, eval = TRUE}
# R console:
library(targets)
tar_make()

# First row group:
tar_read(group, branches = 1)

# Second row group:
tar_read(group, branches = 2)
```

For more information on how the row grouping mechanism works, see the section on "group" iteration below.

## Iteration

The `iteration` arguments of `tar_target()` and `tar_option_set()` control the slicing of stems and the aggregation of patterns. For example, if you write a stem target with `tar_target(x, fun(), iteration = "list")`, then all downstream targets that branch over `x` will give list-like slices `x[[1]]`, `x[[2]]`, etc. to the individual branches instead of the default vector-like slices `x[1]`, `x[2]`, etc. Likewise, for a pattern defined with `tar_target(y, fun(x), pattern = map(x), iteration = "list")`, `tar_read(y)` will return a list instead of a vector, and a downstream target like `tar_target(y, fun(z))` will treat `z` as a list instead of a vector.^[`iteration` does not control the aggregation of stems because stems are already aggregated, so there is nothing to be done. Likewise, it does not control the splitting of patterns because the `pattern` argument of `tar_target()` already does that.]

### Vector iteration

`targets` uses vector iteration by default, and you can opt into this behavior by setting `iteration = "vector"` in `tar_target()`. In vector iteration, `targets` uses the [`vctrs`](https://vctrs.r-lib.org/) package to split stems and aggregate branches. That means `vctrs::vec_slice()` slices up stems like `x` for mapping, and `vctrs::vec_c()` aggregates patterns like `y` for operations like `tar_read()`.

For atomic vectors like in the example above, this behavior is already intuitive. But if we map over a data frame, each branch will get a row of the data frame due to vector iteration.

```{r, echo = FALSE, eval = TRUE}
tar_script({
  options(crayon.enabled = FALSE, tidyverse.quiet = TRUE)
  print_and_return <- function(x) {
    print(x)
    x
  }
  list(
    tar_target(x, data.frame(a = c(1, 2), b = c("a", "b"))),
    tar_target(y, print_and_return(x), pattern = map(x))
  )
})
```

```{r, eval = FALSE}
library(targets)
print_and_return <- function(x) {
  print(x)
  x
}
list(
  tar_target(x, data.frame(a = c(1, 2), b = c("a", "b"))),
  tar_target(y, print_and_return(x), pattern = map(x))
)
```

```{r, eval = TRUE}
tar_make()
```

And since `y` also has iteration = `"vector"`, the aggregate of `y` is a single data frame of all the rows.

```{r, eval = TRUE}
tar_read(y)
```

### List iteration

List iteration splits and aggregates targets as simple lists. If target `x` has `"list"` iteration, all branches of downstream patterns will get `x[[1]]`, `x[[2]]`, and so on. (`vctrs::vec_slice()` behaves more like `[]` than `[[]]`.)

```{r, echo = FALSE, eval = TRUE}
tar_script({
  options(crayon.enabled = FALSE, tidyverse.quiet = TRUE)
  print_and_return <- function(x) {
    print(x)
    x
  }
  list(
    tar_target(
      x,
      data.frame(a = c(1, 2), b = c("a", "b")),
      iteration = "list"
    ),
    tar_target(y, print_and_return(x), pattern = map(x)),
    tar_target(z, x, pattern = map(x), iteration = "list")
  )
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
print_and_return <- function(x) {
  print(x)
  x
}
list(
  tar_target(
    x,
    data.frame(a = c(1, 2), b = c("a", "b")),
    iteration = "list"
  ),
  tar_target(y, print_and_return(x), pattern = map(x)),
  tar_target(z, x, pattern = map(x), iteration = "list")
)
```

```{r, eval = TRUE}
tar_make()
```

Aggregation also happens differently. In this case, the vector iteration in `y` is not ideal, and the list iteration in `z` gives us more sensible output. 

```{r, error = TRUE, eval = TRUE}
tar_read(y)
```

```{r, eval = TRUE}
tar_read(z)
```

### Group iteration

Group iteration brings `dplyr::group_by()` functionality to patterns. This way, we can map or cross over custom subsets of rows. Consider the following data frame.

```{r}
object <- data.frame(
  x = seq_len(6),
  id = rep(letters[seq_len(3)], each = 2)
)

object
```

To map over the groups of rows defined by the `id` column, we

1. Use `group_by()` and `tar_group()` to define the groups of rows, and
1. Use `iteration = "group"` in `tar_target()` to tell downstream patterns to use the row groups.

Put together, the pipeline looks like this.

```{r, echo = FALSE, eval = TRUE}
tar_script({
options(crayon.enabled = FALSE, tidyverse.quiet = TRUE)
tar_option_set(packages = "tidyverse")
list(
  tar_target(
    data,
    data.frame(
      x = seq_len(6),
      id = rep(letters[seq_len(3)], each = 2)
    ) %>%
      group_by(id) %>%
      tar_group(),
    iteration = "group"
  ),
  tar_target(
    subsets,
    data,
    pattern = map(data),
    iteration = "list"
  )
)
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
tar_option_set(packages = "tidyverse")
list(
  tar_target(
    data,
    data.frame(
      x = seq_len(6),
      id = rep(letters[seq_len(3)], each = 2)
    ) %>%
      group_by(id) %>%
      tar_group(),
    iteration = "group"
  ),
  tar_target(
    subsets,
    data,
    pattern = map(data),
    iteration = "list"
  )
)
```

```{r, eval = TRUE}
tar_make()
```

```{r, eval = TRUE}
lapply(tar_read(subsets), as.data.frame)
```

Row groups are defined in the special `tar_group` column created by `tar_group()`. 

```{r, eval = TRUE}
data.frame(
  x = seq_len(6),
  id = rep(letters[seq_len(3)], each = 2)
) %>%
  dplyr::group_by(id) %>%
  tar_group()
```

`tar_group()` creates this column based on the orderings of the grouping variables supplied to `dplyr::group_by()`, not the order of the rows in the data.

```{r, eval = TRUE}
flip_order <- function(x) {
  ordered(x, levels = sort(unique(x), decreasing = TRUE))
}

data.frame(
  x = seq_len(6),
  id = flip_order(rep(letters[seq_len(3)], each = 2))
) %>%
  dplyr::group_by(id) %>%
  tar_group()
```

The ordering in `tar_group` agrees with the ordering shown by `dplyr::group_keys()`.

```{r, eval = TRUE}
data.frame(
  x = seq_len(6),
  id = flip_order(rep(letters[seq_len(3)], each = 2))
) %>%
  dplyr::group_by(id) %>%
  dplyr::group_keys()
```

Branches are arranged in increasing order with respect to the integers in `tar_group`.

```{r, echo = FALSE, eval = TRUE}
tar_script({
options(crayon.enabled = FALSE, tidyverse.quiet = TRUE)
tar_option_set(packages = "tidyverse")

flip_order <- function(x) {
  ordered(x, levels = sort(unique(x), decreasing = TRUE))
}

list(
  tar_target(
    data,
    data.frame(
      x = seq_len(6),
      id = flip_order(rep(letters[seq_len(3)], each = 2))
    ) %>%
      group_by(id) %>%
      tar_group(),
    iteration = "group"
  ),
  tar_target(
    subsets,
    data,
    pattern = map(data),
    iteration = "list"
  )
)
})
```

```{r, eval = FALSE}
# _targets.R
library(targets)
tar_option_set(packages = "tidyverse")

flip_order <- function(x) {
  ordered(x, levels = sort(unique(x), decreasing = TRUE))
}

list(
  tar_target(
    data,
    data.frame(
      x = seq_len(6),
      id = flip_order(rep(letters[seq_len(3)], each = 2))
    ) %>%
      group_by(id) %>%
      tar_group(),
    iteration = "group"
  ),
  tar_target(
    subsets,
    data,
    pattern = map(data),
    iteration = "list"
  )
)
```

```{r, eval = TRUE}
tar_make()
```

```{r, eval = TRUE}
lapply(tar_read(subsets), as.data.frame)
```

## Batching

With dynamic branching, it is super easy to create an enormous number of targets. But when the number of targets starts to exceed a couple hundred, `tar_make()` slows down, and graphs from `tar_visnetwork()` start to become unmanageable. If that happens to you, consider batching your work into a smaller number of targets.

[Targetopia](https://wlandau.github.io/targetopia.html) packages usually have functions that support batching for various use cases. In [`stantargets`](https://wlandau.github.io/stantargets/), [`tar_stan_mcmc_rep_summary()`](https://wlandau.github.io/stantargets/articles/mcmc_rep.html) and friends automatically use batching behind the scenes. The user simply needs to select the number of batches and number of reps per batch. Each batch is a dynamic branch with multiple reps, and each rep fits the user's model once and computes summary statistics. 

In [`tarchetypes`](https://docs.ropensci.org/tarchetypes/), [`tar_rep()`](https://docs.ropensci.org/tarchetypes/reference/tar_rep.html) is a general-use target factory for dynamic branching. It allows you to repeat arbitrary code over multiple reps split into multiple batches. Each batch gets its own reproducible random number seed generated from the target name (as do all targets) and reps run sequentially within each batch, so the results are reproducible. 
The [`targets-stan`](https://github.com/ropensci/targets-stan) repository has an example of batching implemented from scratch. The goal of the pipeline is to validate a Bayesian model by simulating thousands of dataset, analyzing each with a Bayesian model, and assessing the overall accuracy of the inference. Rather than define a target for each dataset in model, the pipeline breaks up the work into batches, where each batch has multiple datasets or multiple analyses. Here is a version of the pipeline with 40 batches and 25 simulation reps per batch (1000 reps total in a pipeline of 82 targets).

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(model_file, compile_model("stan/model.stan"), format = "file"),
  tar_target(index_batch, seq_len(40)),
  tar_target(index_sim, seq_len(25)),
  tar_target(
    data_continuous,
    purrr::map_dfr(index_sim, ~simulate_data_continuous()),
    pattern = map(index_batch)
  ),
  tar_target(
    fit_continuous,
    map_sims(data_continuous, model_file = model_file),
    pattern = map(data_continuous)
  )
)
```