Consider sparse data types #8

Open
DanielFEvans opened this issue Aug 27, 2020 · 0 comments

DanielFEvans commented Aug 27, 2020

Pandas' sparse data structures are another handy-looking memory-saving trick that fits with the theme of dtype_diet. It'd be nice if the tool considered them as an option.

The simple case would be to try a sparse column with NaN as the "omitted" value (or perhaps zero for dtypes that lack NaNs).
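As a rough sketch of that simple case (the example columns and variable names below are made up for illustration, and none of this is existing dtype_diet behaviour), a candidate column could be converted with `pd.SparseDtype` and its memory compared against the dense version:

```python
import numpy as np
import pandas as pd

# Hypothetical float column: 90% missing values.
dense_float = pd.Series([np.nan] * 9_000 + [1.5] * 1_000)
# Sparse version that omits NaN.
sparse_float = dense_float.astype(pd.SparseDtype("float64", np.nan))

# Hypothetical integer column: int64 has no NaN, so omit zero instead.
dense_int = pd.Series([0] * 9_000 + [7] * 1_000, dtype="int64")
sparse_int = dense_int.astype(pd.SparseDtype("int64", 0))

# Compare memory footprints in bytes.
float_saving = dense_float.memory_usage(deep=True) - sparse_float.memory_usage(deep=True)
int_saving = dense_int.memory_usage(deep=True) - sparse_int.memory_usage(deep=True)

# Fraction of values that are actually stored (not the omitted value).
float_density = sparse_float.sparse.density
int_density = sparse_int.sparse.density
```

A sparse column stores both the kept values and their positions, so the saving only shows up when the density is low; a mostly populated column could end up larger than its dense form.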

To get a bit more complex, Pandas lets you choose any value, and a slightly better trick might be to use the most common value in the column as the "omitted" value. However, that might result in some silly suggestions. For example, suggesting that a column with values [1, 2, 2, 3] be made sparse by omitting '2' isn't really a great suggestion if '2' is only the most common value in the particular piece of example data being analysed.
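One possible guard against the silly case, sketched below, is to only suggest a fill value when it dominates the column. The helper name `suggest_sparse_fill` and the `min_share` threshold of 0.9 are assumptions made up for this example, not anything dtype_diet provides, and the threshold would need tuning:

```python
import pandas as pd


def suggest_sparse_fill(col: pd.Series, min_share: float = 0.9):
    """Hypothetical helper: return the most common value of `col` if it
    makes up at least `min_share` of the column, otherwise None."""
    if col.empty:
        return None
    # value_counts sorts by frequency, most common first; keep NaN as a candidate.
    counts = col.value_counts(dropna=False)
    top_value = counts.index[0]
    top_share = counts.iloc[0] / len(col)
    if top_share < min_share:
        return None
    return top_value


# The example from above: '2' is only half the column, so nothing is suggested.
small = pd.Series([1, 2, 2, 3])
small_fill = suggest_sparse_fill(small)

# A heavily skewed column: 0 makes up 95% of the values.
skewed = pd.Series([0] * 95 + [1] * 5)
skewed_fill = suggest_sparse_fill(skewed)
if skewed_fill is not None:
    skewed_sparse = skewed.astype(pd.SparseDtype(skewed.dtype, skewed_fill))
```

This doesn't fully solve the sampling problem, since a value can dominate the example data without dominating the real data, but it would at least rule out suggestions like the [1, 2, 2, 3] one.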
