Restore data frames generically #943

Open
hadley opened this issue Apr 24, 2020 · 3 comments
Labels
feature a feature request or enhancement

Comments

@hadley
Member

hadley commented Apr 24, 2020

See the existing work in #812. Below is a list of the functions we need to consider, with some thoughts on what form of genericity each one needs. The goal is to make sure that data frame extensions return reasonable results in the absence of specific methods, and that every function that needs to be generic is generic, so it can be extended when needed.

chop, unchop
pack, unpack
nest, unnest

separate, extract = append_df
hoist = append_df

complete = full_join + replace_na
drop_na = dplyr_row_slice
separate_rows = str_split + unchop
uncount = dplyr_row_slice + optional column removal
replace_na = dplyr_col_modify
expand = dplyr_reconstruct

pivot_longer = dplyr_reconstruct
pivot_wider = dplyr_reconstruct

# don't need to update superseded functions
gather, spread 
nest_legacy, unnest_legacy
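As a rough illustration of two of the mappings above (`drop_na = dplyr_row_slice` and `replace_na = dplyr_col_modify`), here is a minimal sketch built on the dplyr extension generics, assuming dplyr >= 1.0.0. The names `drop_na_sketch()`, `replace_na_sketch()` and the `pop_data` class are hypothetical and used only for illustration; this is not tidyr's implementation.

```r
library(dplyr)

# Hypothetical sketch: drop incomplete rows by delegating to dplyr_row_slice(),
# so that the class of `data` is restored by dplyr's reconstruction machinery.
drop_na_sketch <- function(data) {
  keep <- stats::complete.cases(data)
  dplyr::dplyr_row_slice(data, keep)
}

# Hypothetical sketch: replace missing values column by column, then hand the
# modified columns to dplyr_col_modify() so that the class is restored.
replace_na_sketch <- function(data, replace) {
  cols <- intersect(names(replace), names(data))
  new_cols <- lapply(cols, function(nm) {
    col <- data[[nm]]
    col[is.na(col)] <- replace[[nm]]
    col
  })
  names(new_cols) <- cols
  dplyr::dplyr_col_modify(data, new_cols)
}

# A toy data frame extension with no methods of its own
df <- structure(
  data.frame(x = c(1, NA, 3), y = c("a", "b", NA)),
  class = c("pop_data", "data.frame")
)

class(drop_na_sketch(df))
class(replace_na_sketch(df, list(x = 0)))
```

Because both helpers go through dplyr's generics, an extension package would only need to provide methods for those generics (or rely on the defaults) rather than one method per tidyr verb.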
@hadley hadley added the feature a feature request or enhancement label Apr 24, 2020
@DavisVaughan
Member

DavisVaughan commented Nov 12, 2021

Need to consider the sticky column case, like panelr.

Ideally we'd be like dplyr and simply assume that `[` with a single argument `i` returns a data frame of length `length(i)`, that is, with exactly `length(i)` columns.

I have a feeling that we are going to have to say: if you have sticky columns and a sticky [ method, you'll need to implement an S3 method for this generic specific to your package. Otherwise it should just work.

That would break packages like this (with sticky cols) until they add a method for these operations. But it isn't like it worked right to begin with.

library(tidyr)
library(panelr)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

wages
#> # Panel data:    4,165 × 14
#> # entities:      id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#>    id        t   exp   wks   occ   ind south  smsa    ms   fem union    ed   blk
#>    <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 1         1     3    32     0     0     1     0     1     0     0     9     0
#>  2 1         2     4    43     0     0     1     0     1     0     0     9     0
#>  3 1         3     5    40     0     0     1     0     1     0     0     9     0
#>  4 1         4     6    39     0     0     1     0     1     0     0     9     0
#>  5 1         5     7    42     0     1     1     0     1     0     0     9     0
#>  6 1         6     8    35     0     1     1     0     1     0     0     9     0
#>  7 1         7     9    32     0     1     1     0     1     0     0     9     0
#>  8 2         1    30    34     1     0     0     0     1     0     0    11     0
#>  9 2         2    31    27     1     0     0     0     1     0     0    11     0
#> 10 2         3    32    33     1     1     0     0     1     0     1    11     0
#> # … with 4,155 more rows, and 1 more variable: lwage <dbl>

# Sticky cols
wages <- wages["exp"]
wages
#> # Panel data:    4,165 × 3
#> # entities:      id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#>    id        t   exp
#>    <fct> <dbl> <dbl>
#>  1 1         1     3
#>  2 1         2     4
#>  3 1         3     5
#>  4 1         4     6
#>  5 1         5     7
#>  6 1         6     8
#>  7 1         7     9
#>  8 2         1    30
#>  9 2         2    31
#> 10 2         3    32
#> # … with 4,155 more rows

# Meaning they come along for the ride here
chop(wages, exp)
#> New names:
#> * id -> id...1
#> * t -> t...2
#> * id -> id...3
#> * t -> t...4
#> # A tibble: 4,165 × 5
#>    id...1 t...2      id...3       t...4         exp
#>    <fct>  <dbl> <list<fct>> <list<dbl>> <list<dbl>>
#>  1 1          1         [1]         [1]         [1]
#>  2 1          2         [1]         [1]         [1]
#>  3 1          3         [1]         [1]         [1]
#>  4 1          4         [1]         [1]         [1]
#>  5 1          5         [1]         [1]         [1]
#>  6 1          6         [1]         [1]         [1]
#>  7 1          7         [1]         [1]         [1]
#>  8 2          1         [1]         [1]         [1]
#>  9 2          2         [1]         [1]         [1]
#> 10 2          3         [1]         [1]         [1]
#> # … with 4,155 more rows

# Genericity doesn't really work right
# In theory this should be a panel data frame, but reconstruct_tibble()
# took over since it inherits from grouped_df
tidyr::pack(wages, data = exp)
#> # A tibble: 4,165 × 3
#> # Groups:   id [595]
#>    id        t data$id    $t  $exp
#>    <fct> <dbl> <fct>   <dbl> <dbl>
#>  1 1         1 1           1     3
#>  2 1         2 1           2     4
#>  3 1         3 1           3     5
#>  4 1         4 1           4     6
#>  5 1         5 1           5     7
#>  6 1         6 1           6     8
#>  7 1         7 1           7     9
#>  8 2         1 2           1    30
#>  9 2         2 2           2    31
#> 10 2         3 2           3    32
#> # … with 4,155 more rows

Created on 2021-11-12 by the reprex package (v2.0.1)
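To make the "implement an S3 method specific to your package" idea more concrete, here is a hedged sketch of what such a method might look like if the hook were dplyr's existing `dplyr_reconstruct()` generic. The `sticky_df` class, its `keys` attribute, and `new_sticky_df()` are all hypothetical stand-ins for a package with sticky columns; panelr's real internals are not shown here, and the generic tidyr would eventually dispatch on is an assumption.

```r
library(dplyr)

# Hypothetical constructor for a data frame subclass with "sticky" key columns
new_sticky_df <- function(x, keys) {
  structure(x, keys = keys, class = c("sticky_df", "data.frame"))
}

# Hypothetical method: restore the subclass only when the key columns survived
# the operation; otherwise return the bare data frame that dplyr passes in.
dplyr_reconstruct.sticky_df <- function(data, template) {
  keys <- attr(template, "keys")
  if (!all(keys %in% names(data))) {
    return(data)
  }
  new_sticky_df(data, keys)
}

x <- new_sticky_df(data.frame(id = 1:3, v = c(10, 20, 30)), keys = "id")

# Row slicing keeps the keys, so the class is restored
class(dplyr_row_slice(x, 1:2))

# A result without the key column falls back to a plain data frame
class(dplyr_reconstruct(data.frame(v = 1), x))
```

The design choice sketched here is that the package, not tidyr, decides when its invariants still hold, which matches the suggestion that packages with a sticky `[` method opt in by supplying their own method.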

@hadley
Member Author

hadley commented Nov 12, 2021

Let's kick this down the road again.

@DavisVaughan
Member

See #1556 for an example. reconstruct_tibble() drops the class through as_tibble(), which is currently the expected behavior.

iris |> 
  dplyr::as_tibble() |> 
  structure(class = c("pop_data", "tbl_df", "tbl", "data.frame")) |>  
  tidyr::drop_na()  |>   
  class()
#> [1] "tbl_df"     "tbl"        "data.frame"

2 participants