Commit 9c88e9e

feat(benchmark): Swap to bench::mark() to capture memory also

RasmusSkytte committed Apr 18, 2024
1 parent 2e4c899 commit 9c88e9e

Showing 4 changed files with 18 additions and 17 deletions.
.github/workflows/benchmark.yaml (10 changes: 5 additions & 5 deletions)

@@ -62,7 +62,7 @@ jobs:
       - name: Delete previous benchmark files
         if: always()
-        run: rm -rf inst/extdata/benchmark-*.rds
+        run: rm -rf inst/extdata/benchmark*.rds

@@ -156,8 +156,7 @@ jobs:
           benchmarks <- benchmark_files |>
             purrr::map(readRDS) |>
             purrr::map(tibble::as_tibble) |>
-            purrr::reduce(rbind)
+            purrr::list_rbind()

           benchmarks <- benchmarks |>
             dplyr::mutate(

@@ -180,7 +179,8 @@ jobs:
             dplyr::pull("database")

           benchmarks <- benchmarks |>
-            dplyr::mutate("database" = paste0(database, ifelse(database %in% slow_backends, "*", "")))
+            dplyr::mutate("database" = paste0(database, ifelse(database %in% slow_backends, "*", ""))) |>
+            tidyr::unnest(c(time, gc))

           # Mean and standard deviation (see ggplot2::mean_se())

@@ -192,7 +192,7 @@ jobs:
           g <- ggplot2::ggplot(
             benchmarks,
-            ggplot2::aes(x = version, y = time / 1e9)
+            ggplot2::aes(x = version, y = time)
           ) +
             ggplot2::stat_summary(fun.data = mean_sd, geom = "pointrange", size = 0.5, linewidth = 1) +
             ggplot2::facet_grid(rows = ggplot2::vars(benchmark_function), cols = ggplot2::vars(database)) +
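For context, the `purrr::reduce(rbind)` to `purrr::list_rbind()` change above can be sketched as follows. This is an illustrative example with made-up data, not code from the repository: `list_rbind()` row-binds a whole list of data frames in a single pass instead of repeated pairwise `rbind()` calls.

```r
library(purrr)

# Toy stand-ins for the per-backend benchmark files read with readRDS()
benchmarks <- list(
  data.frame(database = "SQLite", time = 1.2),
  data.frame(database = "DuckDB", time = 0.8)
)

# One-pass row bind; equivalent result to purrr::reduce(benchmarks, rbind)
combined <- purrr::list_rbind(benchmarks)
combined
```

Besides being faster for long lists, `list_rbind()` fails informatively if an element is not a data frame, whereas `reduce(rbind)` can silently produce a matrix.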
DESCRIPTION (2 changes: 1 addition & 1 deletion)

@@ -39,14 +39,14 @@ Imports:
     tidyselect,
     utils
 Suggests:
+    bench,
     callr,
     conflicted,
     duckdb,
     here,
     jsonlite,
     knitr,
     lintr,
-    microbenchmark,
     odbc,
     rmarkdown,
     roxygen2,
data-raw/benchmark.R (4 changes: 2 additions & 2 deletions)

@@ -118,7 +118,7 @@ if (identical(Sys.getenv("CI"), "true") && identical(Sys.getenv("BACKEND"), ""))
 }

 # Construct the list of benchmarks
-update_snapshot_benchmark <- microbenchmark::microbenchmark(scdb_updates(conn, data_on_conn), times = 25) |>
+update_snapshot_benchmark <- bench::mark(scdb_updates(conn, data_on_conn), iterations = 25) |>
   dplyr::mutate(
     "benchmark_function" = "update_snapshot()",
     "database" = names(conns)[[1]],

@@ -145,7 +145,7 @@ if (identical(Sys.getenv("CI"), "true") && identical(Sys.getenv("BACKEND"), ""))
 }

 # Construct the list of benchmarks
-update_snapshot_benchmark <- microbenchmark::microbenchmark(scdb_updates(conn, data), times = 5) |>
+update_snapshot_benchmark <- bench::mark(scdb_updates(conn, data), iterations = 5) |>
   dplyr::mutate(
     "benchmark_function" = "update_snapshot() - complexity",
     "database" = names(conns)[[1]],
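The swap above is the core of the commit: `bench::mark()` records per-iteration run times and memory allocations, while `microbenchmark::microbenchmark()` records only times. A minimal sketch with a toy expression in place of `scdb_updates()` (the `check = FALSE` and the toy workload are illustrative assumptions, not from the repository):

```r
# bench::mark() returns a tibble with one row per expression; the `time`
# column is a list-column holding one timing vector per expression, and
# `mem_alloc` reports total memory allocated during execution.
result <- bench::mark(
  sum(seq_len(1e4)),
  iterations = 25,
  check = FALSE
)

result$time[[1]]  # 25 per-iteration timings (bench_time vector)
result$mem_alloc  # allocation total, which microbenchmark cannot report
```

Note also the argument rename: `microbenchmark()` takes `times = n`, whereas `bench::mark()` takes `iterations = n`.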
vignettes/benchmarks.Rmd (19 changes: 10 additions & 9 deletions)

@@ -31,7 +31,7 @@ This data forms the basis for three "snapshots" used in the benchmarks:
 The benchmark function uses three consecutive calls to `update_snapshot()` to create the table with first snapshot and
 then update it to the second and third snapshot. Finally, the table is deleted.

-The performance of this benchmark function is timed with the `microbenchmark` package using 10 replicates.
+The performance of this benchmark function is timed with the `bench` package using 10 replicates.
 All benchmarks are run on the same machine.

 The results of the benchmark are shown graphically below (mean and standard deviation), where we compare the current

@@ -45,7 +45,8 @@ benchmark_location <- c(
   purrr::discard(~ identical(., "")) |>
   purrr::pluck(1)

-benchmarks <- readRDS(benchmark_location)
+benchmarks <- readRDS(benchmark_location) |>
+  tidyr::unnest(c(time, gc))

 # Determine if the SHA is on main
 sha <- benchmarks |>

@@ -82,7 +83,7 @@ mean_sd <- function(x) {
 ```{r benchmark_1, echo = FALSE, eval = requireNamespace("here")}
 # Use data for benchmark 1
 benchmark_1 <- benchmarks |>
-  dplyr::filter(!stringr::str_ends(.data$benchmark_function, "complexity"))
+  dplyr::filter(!stringr::str_ends(.data$benchmark_function, stringr::fixed("complexity")))

 # Add note slow backends
 slow_backends <- benchmark_1 |>

@@ -96,7 +97,7 @@ benchmark_1 <- benchmark_1 |>
 g <- ggplot2::ggplot(
   benchmark_1,
-  ggplot2::aes(x = version, y = time / 1e9)
+  ggplot2::aes(x = version, y = time)
 ) +
   ggplot2::stat_summary(fun.data = mean_sd, geom = "pointrange", size = 0.5, linewidth = 1) +
   ggplot2::facet_grid(rows = ggplot2::vars(benchmark_function), cols = ggplot2::vars(database)) +

@@ -111,27 +112,27 @@ g
 We include another benchmark to highlight the complexity scaling of the `update_snapshot()` with the size of the input
-data. The datasets are similar to the first benchmark is used, but the number of repeats is varied to see the impact of
+data. The data sets are similar to the first benchmark is used, but the number of repeats is varied to see the impact of
 increasing data size. The benchmarks are run from a "clean" state, where the target_table does not exists. The benchmark
 measures both the time to create the table and to remove it again afterwards (to restore the clean state).

-The performance of this benchmark function is timed with the `microbenchmark` package using 5 replicates.
+The performance of this benchmark function is timed with the `bench` package using 5 replicates.
 All benchmarks are run on the same machine.

 The results of the benchmark are shown graphically below (mean and standard deviation), where we compare the current
 development version of `SCDB` with the current CRAN version.

-NOTE: There are reports of a superlinear complexity for very large data sets. If you experience such problems, consider
+NOTE: There are reports of a super-linear complexity for very large data sets. If you experience such problems, consider
 batching the updates via the `filters` argument.

 ```{r benchmark_2, echo = FALSE, eval = requireNamespace("here")}
 # Use data for benchmark 2
 benchmark_2 <- benchmarks |>
-  dplyr::filter(stringr::str_ends(.data$benchmark_function, "complexity"))
+  dplyr::filter(stringr::str_ends(.data$benchmark_function, stringr::fixed("complexity")))

 ggplot2::ggplot(
   benchmark_2,
-  ggplot2::aes(x = n * nrow(iris) / 1e3, y = time / 1e9, color = version)
+  ggplot2::aes(x = n * nrow(iris) / 1e3, y = time, color = version)
 ) +
   ggplot2::stat_summary(fun.data = mean_sd, geom = "pointrange", size = 0.5, linewidth = 1) +
   ggplot2::facet_grid(rows = ggplot2::vars(benchmark_function), cols = ggplot2::vars(database)) +
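The `tidyr::unnest(c(time, gc))` step that this commit threads through the vignette and workflow can be sketched as follows (a toy expression stands in for the real benchmark; not code from the repository). `bench::mark()` packs per-iteration timings into a `time` list-column with a matching `gc` list-column, so each result row must be expanded to one row per iteration before `ggplot2` can summarise mean and standard deviation across iterations:

```r
library(tidyr)

# One result row per benchmarked expression, with list-columns
raw <- bench::mark(sqrt(1:100), iterations = 5, check = FALSE)

# Unnest time and gc in parallel: one row per iteration
per_iteration <- tidyr::unnest(raw, c(time, gc))

nrow(raw)            # 1 row (one expression)
nrow(per_iteration)  # 5 rows (one per iteration)
```

This also explains the plotting change from `y = time / 1e9` to `y = time`: `microbenchmark` reported raw nanoseconds, while the unnested `bench` timings carry their own units, so no manual rescaling is needed.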

0 comments on commit 9c88e9e
