Make Image cast storage faster #6786
Hi! Thanks for diving into this, the conversion to Python lists is indeed quite slow. Array2DExtensionType and Array3DExtensionType currently rely on pyarrow lists, but we will soon modify them to use FixedShapeTensorArray instead, which is more efficient (e.g. it doesn't need to store an offset for each value). So ideally it would be cool to speed this code up without using those extension types, otherwise it will block the improvements to Array2DExtensionType and Array3DExtensionType. If I understand correctly, you just need the logic from ArrayExtensionArray.to_numpy? If so, feel free to make a separate function that ArrayExtensionArray.to_numpy can call.
Hey! I didn't have time to look into this but I just stumbled upon another problem. I think actually making the Array3DExtensionType faster would probably resolve both issues as you mentioned.
No one is working on this atm afaik (and actually we don't have any ETA unfortunately). To do this change I think we need to:
Thanks, I have looked into this and have a working solution at least for my specific case. Hopefully, I can create a separate PR with these changes soon.
Nice, thanks @Modexus!
I have run into some issues, notably I don't think I have tried to somehow cast the |
Can we start using FixedShapeTensor or FixedSizeList even if pandas/polars don't fully support them yet? We would still get the benefit of optimized conversion to numpy.
PR for issue #6782.
Makes `cast_storage` of the `Image` class faster by removing the slow call to `.to_pylist()`.
Instead, directly convert each `ListArray` item to either `Array2DExtensionType` or `Array3DExtensionType`.
This also preserves the `dtype`, removing the warning if the array is already `uint8`.