arrow version: upcoming changes to pull() behavior #47

Open
schuemie opened this issue Mar 15, 2023 · 3 comments

@schuemie
Member

When using pull(), we get this warning:

Warning message:
Default behavior of `pull()` on Arrow data is changing. Current behavior of returning an R vector is deprecated, and in a future release, it will return an Arrow `ChunkedArray`. To control this:
i Specify `as_vector = TRUE` (the current default) or `FALSE` (what it will change to) in `pull()`
i Or, set `options(arrow.pull_as_vector)` globally
This warning is displayed once every 8 hours. 

This warning is annoying, and the advertised new behavior will break many HADES packages when it becomes the default in some future release of arrow.

I don't have a good solution here. Is there an alternative to pull()? Should we set options(arrow.pull_as_vector) in Andromeda's onLoad() function? (Would CRAN allow that?)

@ablack3
Collaborator

ablack3 commented Mar 17, 2023

I'm not sure whether CRAN allows that. Setting options on the user's behalf is generally discouraged, but we could check whether the option is set and print a message if it is not.

An alternative to pull() is to use select() followed by collect():

library(Andromeda)

a <- andromeda(cars = cars)
a$cars %>% pull(speed)
#>  [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
#> [26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
a$cars %>% select(speed) %>% collect() %>% {.[["speed"]]}
#>  [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
#> [26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25

Created on 2023-03-17 with reprex v2.0.2

This example uses the current release, but it should work with the new release as well.

@ablack3
Collaborator

ablack3 commented Mar 20, 2023

I commented on the issue: apache/arrow#32705

I'd propose printing a message or warning when Andromeda is loaded if the arrow.pull_as_vector option is not set.
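A minimal sketch of that check, assuming a hypothetical helper `isPullOptionSet()` (not part of Andromeda or arrow). The message is emitted from `.onAttach()` rather than `.onLoad()`, since that is the hook R's documentation recommends for startup messages; the wording of the message is illustrative only.

```r
# Hypothetical helper: TRUE when the user has explicitly set the option.
isPullOptionSet <- function() {
  !is.null(getOption("arrow.pull_as_vector"))
}

# Sketch of a startup hook. It only informs the user; it does not change
# any option on their behalf.
.onAttach <- function(libname, pkgname) {
  if (!isPullOptionSet()) {
    packageStartupMessage(
      "Option 'arrow.pull_as_vector' is not set. ",
      "Consider options(arrow.pull_as_vector = TRUE) ",
      "so that pull() keeps returning R vectors."
    )
  }
}
```

Because `packageStartupMessage()` is used, users who prefer silence could wrap the `library()` call in `suppressPackageStartupMessages()`.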

Alternatively, I could provide a function with the same behavior as pull() (returning an R vector), and we could switch to that.
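Such a function could be built on the select-then-collect approach from my earlier comment. This is only a sketch: the name `pullAsVector()` and its string-valued `column` argument are hypothetical, and it assumes the table supports dplyr's `select()` and `collect()`.

```r
library(dplyr)

# Hypothetical replacement for pull(): always returns a plain R vector.
# `column` is the column name given as a string.
pullAsVector <- function(data, column) {
  collected <- data %>%
    select(all_of(column)) %>%  # keep only the requested column
    collect()                   # materialize it as an in-memory data frame
  collected[[column]]
}
```

Usage would look like `pullAsVector(a$cars, "speed")`. Only the selected column is brought into memory, so the cost should be comparable to pull().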

Do you think it would be possible or even advantageous to use chunked arrays instead of R vectors? One benefit we have in Andromeda is that

@ablack3
Collaborator

ablack3 commented Mar 20, 2023

Another option is to call withr::local_options() at the beginning of functions that use pull() on Andromeda tables.
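Roughly like this, where `pullSpeed()` is a hypothetical function standing in for any package function that calls pull(). The option is set only for the duration of the call and is restored when the function exits, so the user's global options are left untouched.

```r
library(dplyr)

# Hypothetical example function; `data` is assumed to have a `speed` column.
pullSpeed <- function(data) {
  # Scoped to this function call; the previous value is restored on exit.
  withr::local_options(list(arrow.pull_as_vector = TRUE))
  data %>% pull(speed)
}
```

The downside is that every function that uses pull() needs this line, so a shared helper may be less error-prone.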
