This repository has been archived by the owner on Nov 1, 2024. It is now read-only.
To reproduce:
Now df looks like:
An attempt to change `df["dense_features"]["int_1"]` fails.
For now, the workaround is to first get the nested DataFrame out, apply the transformation, and then put it back:
https://github.com/facebookresearch/torcharrow/blob/6d2bca82e65f74193360bd06c5ab4f8c761c5342/torcharrow/test/integration/test_criteo.py#L149-L157
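The get-out, transform, put-back pattern can be illustrated with a toy model. This is only a sketch under stated assumptions: `RowVector` and `Frame` below are hypothetical stand-ins written for this example, not torcharrow or Velox classes, and the toy loses the nested update silently, which may not match the exact failure mode in torcharrow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVector:
    """Hypothetical stand-in for an immutable struct vector."""
    names: tuple
    children: tuple

    def child(self, name):
        return self.children[self.names.index(name)]

    def with_child(self, name, child):
        # Build a new vector; the original is left untouched.
        if name in self.names:
            i = self.names.index(name)
            kids = self.children[:i] + (child,) + self.children[i + 1:]
            return RowVector(self.names, kids)
        return RowVector(self.names + (name,), self.children + (child,))


class Frame:
    """Hypothetical thin wrapper around a RowVector (assumption, not DataFrameCpu)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, name):
        child = self._data.child(name)
        # A nested struct gets its own wrapper around the child vector.
        return Frame(child) if isinstance(child, RowVector) else child

    def __setitem__(self, name, value):
        if isinstance(value, Frame):
            value = value._data
        # Only this wrapper is repointed; no parent learns about the new vector.
        self._data = self._data.with_child(name, value)


dense = RowVector(("int_1",), ((1, 2, 3),))
df = Frame(RowVector(("label", "dense_features"), ((0, 1, 0), dense)))

# Direct nested assignment: the update lands on a temporary wrapper and is lost.
df["dense_features"]["int_1"] = (10, 20, 30)
stale = df["dense_features"]["int_1"]

# Workaround: take the nested frame out, change it, and put it back.
nested = df["dense_features"]
nested["int_1"] = (10, 20, 30)
df["dense_features"] = nested
fresh = df["dense_features"]["int_1"]
```

In this model the direct assignment leaves `stale` at the original values, while the put-back step rebuilds the top-level vector so that `fresh` holds the new ones.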
The problem is that `DataFrameCpu._set_field_data` generates a new `RowVector` and copies the column vector pointer. For a nested `RowVector`, it only updates the leaf-level struct and doesn't propagate the change upwards: https://github.com/facebookresearch/torcharrow/blob/6d2bca82e65f74193360bd06c5ab4f8c761c5342/torcharrow/velox_rt/dataframe_cpu.py#L310-L329
Creating a new `RowVector` seems necessary, since assigning a column to a DataFrame may change the children column type. One idea would be to allow the wrapped `RowColumn` to change the delegated `RowVector` (e.g. something like `self._data._reset_data(new_delegate)`). Basically, `DataFrame` is a thin wrapper and everything is in `RowColumn`.
For this to work, `DataFrame.dtype` should always use the underlying Velox Vector's type as ground truth.
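A rough sketch of the delegate-reset idea, under loud assumptions: the `RowColumn` and `DataFrame` classes below are hypothetical toys, a plain dict stands in for the delegated `RowVector`, and `_reset_data` is only the name suggested above, not an existing torcharrow method. The toy also assumes a parent references its nested child as a shared `RowColumn` object, which sidesteps the question of how a real parent vector would pick up the new child vector.

```python
class RowColumn:
    """Hypothetical column that owns a replaceable delegate."""

    def __init__(self, delegate):
        # delegate: dict of field name -> RowColumn (nested) or list of values
        self._delegate = delegate

    def _reset_data(self, new_delegate):
        # Swap the delegate in place so every wrapper sharing this column sees it.
        self._delegate = new_delegate

    @property
    def dtype(self):
        # Always derived from the current delegate, never cached.
        return {
            name: child.dtype if isinstance(child, RowColumn)
            else type(child[0]).__name__
            for name, child in self._delegate.items()
        }


class DataFrame:
    """Hypothetical thin wrapper; all state lives in the RowColumn."""

    def __init__(self, column):
        self._data = column

    @property
    def dtype(self):
        return self._data.dtype

    def __getitem__(self, name):
        child = self._data._delegate[name]
        return DataFrame(child) if isinstance(child, RowColumn) else child

    def __setitem__(self, name, values):
        if isinstance(values, DataFrame):
            values = values._data
        # Build a new delegate (the child type may change), then reset in place.
        new_delegate = {**self._data._delegate, name: values}
        self._data._reset_data(new_delegate)


dense = RowColumn({"int_1": [1, 2, 3]})
df = DataFrame(RowColumn({"label": [0, 1, 0], "dense_features": dense}))

# Nested assignment that also changes the child type from int to float.
df["dense_features"]["int_1"] = [0.5, 1.5, 2.5]
updated = df["dense_features"]["int_1"]
nested_dtype = df.dtype["dense_features"]
```

Because the wrapper holds no state of its own and `dtype` is recomputed from the delegate each time, the nested assignment is visible from the top-level frame and the reported type follows the new child.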