Use `mutate(.keep = "none")` in `transmute()` #483

markfairbanks · 2024-12-10T14:34:44Z

Closes #470

Note: I removed most of the dedicated transmute() tests as they're now covered by mutate() tests

eutwt · 2024-12-25T14:44:11Z

The code change looks good. There is a tradeoff, though: if you call transmute() immediately after converting to a lazy frame, the data table is now copied, whereas main currently only uses `.()`/`list()` in `j`. This makes the `.keep = "none"` translation take more memory (example below). I'm not sure how often transmute() is used right after conversion versus later in a pipeline, where the data would already have been copied.

library(bench)
library(data.table)
library(magrittr)
n <- 1e6

dt <- replicate(10, sample(n), simplify = FALSE) %>% 
  setNames(., tail(letters, length(.))) %>% 
  as.data.table()

mark(
  add_then_remove = copy(dt)[, `:=`(double_x = x * 2)][, `:=`(names(dt), NULL)],
  list_in_j = dt[, .(double_x = x * 2)],
)
#> # A tibble: 2 × 6
#>   expression           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 add_then_remove    6.8ms   8.96ms      91.7    47.6MB     504.
#> 2 list_in_j         2.19ms   2.86ms     315.     15.3MB     137.

Created on 2024-12-25 with reprex v2.0.2

One thought I had is that transmute() could be a reframe() that uses `vec_recycle_common(..., .size = .N)` instead of `list(...)` in `j`, but that is a bad option because it would prevent GForce optimizations on the `...` expressions.
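To make the recycling idea concrete, here is a minimal sketch using plain data.table and vctrs rather than dtplyr's translation. The table `dt` and the result names are made up for illustration, and this is not code from the PR.

```r
library(data.table)

# Toy data, assumed for illustration only
dt <- data.table(x = 1:2, y = 3:4)

# list() in j: a lone length-1 summary is not recycled, so one row comes back
summarised <- dt[, .(mean_x = mean(x))]

# vec_recycle_common() in j: every result is recycled to .N rows
recycled <- dt[, vctrs::vec_recycle_common(mean_x = mean(x), .size = .N)]

nrow(summarised)
nrow(recycled)
```

The second form returns one row per input row, which is what transmute() needs, at the cost of wrapping the `...` expressions in a function call. For a grouped query, passing `verbose = TRUE` to `[.data.table` is one way to check whether data.table reports a GForce optimization for a given `j`.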

R/step-subset-transmute.R

markfairbanks · 2024-12-26T18:37:43Z

> I think we need a re-order to match dplyr output for `tibble(x = 1, y = 2) %>% transmute(y, x)`

Ah yep I'll make the reorder change.
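For reference, a minimal sketch of the ordering difference in plain dplyr, assuming dplyr 1.1.0 or later, where `mutate(.keep = "none")` leaves pre-existing columns in their original positions while transmute() follows the order of its arguments. `transmute_via_mutate()` is a hypothetical helper for illustration, not the implementation in this PR.

```r
library(dplyr)

# Hypothetical helper: emulate transmute() with mutate(.keep = "none"),
# then restore the order in which the expressions were supplied.
transmute_via_mutate <- function(.data, ...) {
  # .named = TRUE gives unnamed inputs such as `y` the name "y"
  dots <- rlang::enquos(..., .named = TRUE)
  out <- mutate(.data, !!!dots, .keep = "none")
  # any_of() tolerates names that no longer exist, e.g. `z = NULL`
  select(out, any_of(names(dots)))
}

df <- tibble(x = 1, y = 2)

names(transmute_via_mutate(df, y, x))
#> [1] "y" "x"
```

Using `any_of()` rather than `all_of()` in the final step means an expression that evaluates to `NULL` does not cause an error when its column is absent from the result.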

> One thought I had is that transmute could be a reframe() which uses `vec_recycle_common(..., .size = .N)` instead of `list(...)` in `j`, but that is a bad option because it would prevent GForce optimizations on the `...` expressions

We could also do something like this to preserve GForce, but I think it makes the translation look odd.

library(dtplyr)
library(dplyr)

df <- tibble(x = 1:2, y = 3:4) %>%
  lazy_dt()

df %>%
  reframe(y, mean_x = mean(x), .row_preserve = row_number()) %>%
  select(-.row_preserve)
#> Source: local data table [2 x 2]
#> Call:   `_DT1`[, .(y = y, mean_x = mean(x), .row_preserve = seq_len(.N))][, 
#>     `:=`(".row_preserve", NULL)]
#> 
#>       y mean_x
#>   <int>  <dbl>
#> 1     3    1.5
#> 2     4    1.5
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

markfairbanks · 2024-12-26T18:44:45Z

Actually sorry I completely did the column order thing wrong 😅

Let me take another pass at it.

markfairbanks · 2024-12-26T19:04:48Z

Ok good to review now

R/step-subset-transmute.R

eutwt

Looks good, thanks! This will be good to have. Apologies for the delayed review.

markfairbanks added 6 commits December 10, 2024 07:31

Use mutate(.keep = "none")

729f545

Use a snapshot

34f4284

Remove most tests as they're now covered by mutate() tests

65470de

Remove transmute() test from wrong file

61b0261

Remove transmute() test

15c1af8

News bullet

1baf2f4

markfairbanks requested a review from eutwt December 10, 2024 14:42

eutwt reviewed Dec 25, 2024

markfairbanks added 2 commits December 26, 2024 11:41

Preserve column order

8bb7056

Add test about column order

ec1dd71

markfairbanks requested a review from eutwt December 26, 2024 18:43

markfairbanks added 2 commits December 26, 2024 12:04

Get correct column order

bb9a46b

Fix test

dfd6772

eutwt reviewed Jan 2, 2025

markfairbanks added 2 commits January 20, 2025 18:50

Use any_of()

a435561

Add test

fcdd5cd

markfairbanks requested a review from eutwt January 21, 2025 01:52

eutwt approved these changes Jan 24, 2025

markfairbanks merged commit 75310e3 into main Jan 24, 2025
13 checks passed

markfairbanks deleted the simplify-transmute branch January 24, 2025 17:38
