Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image -> Arrow support #8330

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Image -> Arrow support #8330

wants to merge 17 commits into from

Conversation

wiredfool
Copy link
Member

@wiredfool wiredfool commented Aug 25, 2024

Partial implementation of #8329.

This is for discussion -- there are a number of issues.

Changes proposed in this pull request:

  • Implements basic __arrow_c_array__ and __arrow_c_schema__ support for reading Pillow images in Arrow capable libraries.
  • Implements basic arrow array exporting object -> Pillow image. This supports reading arrays in exactly the same formats that we export from Pillow.

Issues:

  • Currently fails on large images.
  • There's extraneous stuff in Arrow.c that should be cleaned up.
  • Some changes in Storage.c aren't required.
  • Unclear if we're exporting the most useful version of the data.

Todo:

  • Alternate binary installable arrow lib for testing, or loopback only testing on troublesome plaftorms if we can't find one that works.
  • Fix Windows Segfault. I don't have a windows env here, so either I need to figure out a CI based version or I'll need an assist. @nulano?
  • Python accessible version (r/w, likely) of the schema capsule and possibly array metadata. Partially for testing, partially for sanity checking at the python layer before passing into the C portion.
  • Fix the UNDONEs in the code, especially around error handling.
    • make sure we're raising an error if we don't accept the array format.
  • Large image support or error out.
  • Wire up enforcement of c level read_only flag

For Review:

  • What CVE is in here that I'm not seeing?
  • There's some pretty serious mucking with the object lifetimes, specifically there's now effectively a refcount on the core C image, so it's lifetime is not limited to the python object's lifetime. I think this is required, but it needs review.
  • I'm not 100% on how so much of the checking is in Storage.c, vs somewhere closer to the surface.

src/PIL/Image.py Outdated Show resolved Hide resolved
* Fixed format, only for 4 channel images
* structs can't be encoded this way, they need to have one child array
per struct member.
* i.e. struct of arrays, rather than an array of structs.
* basic safety included
* only respecting the types we emit
@wiredfool wiredfool force-pushed the arrow branch 3 times, most recently from c10beff to 97eb7c0 Compare January 21, 2025 21:41
@wiredfool
Copy link
Member Author

Added basic fromarrow support, covering exactly the formats that we emit, and testing using round trip support.

Windows is failing, so that's something that's going to need digging. And somehow codecov is failing before most of the runs have gone, so not sure what's up there.

@wiredfool
Copy link
Member Author

Windows AMD64 is failing on pillow arrow tests as well as the python version windows matrix:

Tests/test_arrow.py::test_to_array[L-dtype0-None] DEBUG:   25+ if ( >>>> !$?) { exit $LASTEXITCODE }

Windows x86, Windows pypy, manylinux wheels and MacOS Arm64 and 10.15 X86_64 are failing on pyarrow install/compilation, instead of installing the prebuilt wheel. (I'm not sure why MacOS Arm64 is failing, as that's my dev platform). We're going to have to look at either compile time prerequisites, a lighter library with lots of binaries, or potentially skipping tests if pyarrow doesn't compile.

src/PIL/Image.py Outdated Show resolved Hide resolved
src/PIL/Image.py Outdated Show resolved Hide resolved
src/PIL/Image.py Outdated Show resolved Hide resolved
wiredfool and others added 4 commits January 25, 2025 13:42
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
("HSV", ["L", "F"]),
),
)
def test_invalid_array_type(mode: str, dest_modes: List[str]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_invalid_array_type(mode: str, dest_modes: List[str]) -> None:
def test_invalid_array_type(mode: str, dest_modes: list[str]) -> None:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants