Categorical DataType #1440
-
In pandas we usually do data.col.astype('category'), but this does not work in vaex. How do I change a string column into a categorical column? It would also be nice if astype were implemented on the DataFrame rather than only on the Series, like data.astype({col1: datatype, col2: datatype, ...}).
Replies: 2 comments 5 replies
-
I think data.viz.categorize(col_name) should work, right?
-
What are you trying to achieve by turning a string column into a category? This is supported in vaex, but it might have a different meaning compared to pandas.
Since vaex works fully out of core, if your data is on disk (hdf5, arrow), you don't have to worry about memory issues, which in large part is the reason for using categories in pandas.
In vaex, turning strings into categories can speed up certain operations (like groupby, binby, etc.). In any case, it is done like this:
This turns (encodes) that column as ints, so operations are faster. You can see the mapping dictionary in `df._categories`. Also, for now, printing out `df` will give you the encoded…
We took this approach to avoid breaking changes (for now). I hope this helps.
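For anyone wondering what "encodes that column as ints" means in practice, here is a minimal plain-Python sketch of the idea. It is not vaex code and not how vaex implements it: the `ordinal_encode` function and the `colors` data below are hypothetical and only show strings being replaced by integer codes plus a mapping dictionary. In vaex itself I believe the corresponding method is `DataFrame.ordinal_encode`, but please check the API docs for the exact signature, and note that vaex may assign the codes in a different order.

```python
def ordinal_encode(values):
    """Hypothetical helper: map each distinct string to an integer code.

    Codes are assigned in order of first appearance. This ordering is an
    assumption of the sketch; a real library may sort the labels instead.
    """
    mapping = {}  # label -> integer code
    codes = []    # encoded column
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


# Made-up example column of strings
colors = ["red", "green", "red", "blue", "green"]
codes, mapping = ordinal_encode(colors)

print(codes)    # the encoded column, ints instead of strings
print(mapping)  # the mapping dictionary from label to code
```

Operations such as groupby can then work on the small integer codes instead of comparing strings, which is where the speedup comes from.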