
Feature: support Dask Dataframes for larger than memory returns #569

Open
ihs-nick opened this issue Nov 24, 2020 · 0 comments
Would it be possible to support not only pandas dataframe serialization/deserialization for the return, but also a Dask dataframe for when the v3io return is much larger than memory?

I was thinking that perhaps, since we are already reading chunks of these data sources into in-memory Arrow, we could serialize those chunks to Parquet (I would prefer pure Arrow files, but Dask doesn't support that right now), and finally read them into a Dask dataframe with read_parquet. Would love more thoughts on this, so we could support larger-than-memory dataframe operations.
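A minimal sketch of what I have in mind, not tied to the current API. `iter_record_batches` is a hypothetical stand-in for however the client streams v3io chunks into Arrow today; everything else is standard pyarrow/dask:

```python
import os
import tempfile

import pyarrow as pa
import pyarrow.parquet as pq
import dask.dataframe as dd


def read_as_dask(iter_record_batches):
    """Spill a stream of pyarrow.RecordBatch objects to Parquet files,
    then return a lazy, larger-than-memory dask.dataframe over them."""
    spill_dir = tempfile.mkdtemp(prefix="v3io_spill_")
    for i, batch in enumerate(iter_record_batches):
        # One Parquet file per chunk keeps peak memory bounded to a single batch.
        table = pa.Table.from_batches([batch])
        pq.write_table(table, os.path.join(spill_dir, f"part-{i:05d}.parquet"))
    # Dask reads the directory lazily, one partition per file.
    return dd.read_parquet(spill_dir)
```

The nice part is that the per-chunk Arrow-to-Parquet step is the only thing that has to fit in memory; Dask only materializes partitions as they are needed.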
