feat: Implement Arrow PyCapsule Interface #9140

Closed
1 task done
kylebarron opened this issue May 7, 2024 · 4 comments · Fixed by #9143
Labels
feature Features or general enhancements

Comments

@kylebarron

Is your feature request related to a problem?

Currently Ibis integrates with Arrow via the to_pyarrow method. The downside of this is that library consumers have to:

  1. Be aware of ibis
  2. Look for specific ibis data types
  3. Know that they can call this to_pyarrow method, which tends to be named differently in different libraries. E.g. DuckDB calls it .arrow() and Polars calls it .to_arrow().

What is the motivation behind your request?

The Arrow PyCapsule Interface is a new standard for exchanging Arrow data in Python. Among other benefits, it defines a single method name (__arrow_c_stream__) that is public and standardized.

This means that other libraries don't have to build specific connectors to Polars, DuckDB, Ibis, pyarrow, etc., but can instead support any input object with an __arrow_c_stream__ method. For my particular use case, Lonboard, a geospatial visualization library I develop, is Arrow-based and looks for this method.
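To make the consumer-side difference concrete, here is a minimal sketch of the dispatch a library would otherwise need. The function name `find_arrow_exporter` and the two stub classes are hypothetical and exist only for illustration; the legacy method names are the ones mentioned in this issue. The stubs return placeholder values rather than real Arrow data so that the sketch runs without any Arrow library installed.

```python
from typing import Any

# Library-specific export method names mentioned in this issue.
LEGACY_EXPORT_METHODS = ("to_pyarrow", "arrow", "to_arrow")


def find_arrow_exporter(obj: Any) -> str:
    """Return the name of the method a consumer would call to get Arrow data.

    Hypothetical helper: prefers the standardized PyCapsule method and only
    falls back to per-library names when it is absent.
    """
    # Standard path: works for any producer, with no library-specific knowledge.
    if hasattr(obj, "__arrow_c_stream__"):
        return "__arrow_c_stream__"
    # Fallback path: the consumer has to know each library's naming.
    for name in LEGACY_EXPORT_METHODS:
        if callable(getattr(obj, name, None)):
            return name
    raise TypeError(f"{type(obj).__name__} does not export Arrow data")


class StandardProducer:
    """Stub producer that implements the standardized method."""

    def __arrow_c_stream__(self, requested_schema=None):
        return "placeholder-capsule"  # a real producer returns a PyCapsule


class LegacyProducer:
    """Stub producer that only offers a library-specific method."""

    def arrow(self):
        return "placeholder-table"  # a real producer returns Arrow data


standard_path = find_arrow_exporter(StandardProducer())
legacy_path = find_arrow_exporter(LegacyProducer())
```

With the standardized method in place, the fallback loop and the list of per-library names become unnecessary for any producer that adopts the interface.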

Describe the solution you'd like

Implement an __arrow_c_stream__ method wherever there's currently a to_pyarrow method. This could be as simple as

    def __arrow_c_stream__(self, requested_schema=None):
        return self.to_pyarrow().__arrow_c_stream__(requested_schema)

This relies on the pyarrow Table class implementing the method as of v14 or so (not 100% sure which version it was added in).
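One way to apply this "wherever there's currently a to_pyarrow method" would be a small mixin. This is only a sketch, not how ibis implements it: `ArrowStreamMixin`, `HypotheticalExpr`, and `FakeArrowTable` are made-up names, and `FakeArrowTable` stands in for a pyarrow Table so the example runs without pyarrow installed.

```python
class ArrowStreamMixin:
    """Hypothetical mixin: adds __arrow_c_stream__ to any class with to_pyarrow()."""

    def __arrow_c_stream__(self, requested_schema=None):
        # Delegate to the object returned by to_pyarrow(), which is assumed
        # to implement the protocol itself (as a pyarrow Table does).
        return self.to_pyarrow().__arrow_c_stream__(requested_schema)


class FakeArrowTable:
    """Stand-in for a pyarrow Table; returns a tuple instead of a real PyCapsule."""

    def __arrow_c_stream__(self, requested_schema=None):
        return ("fake-capsule", requested_schema)


class HypotheticalExpr(ArrowStreamMixin):
    """Made-up expression type that already exposes to_pyarrow()."""

    def to_pyarrow(self):
        return FakeArrowTable()


# The mixin forwards the call, including the optional requested_schema.
capsule_default = HypotheticalExpr().__arrow_c_stream__()
capsule_with_schema = HypotheticalExpr().__arrow_c_stream__("some-schema")
```

Defaulting `requested_schema` to `None` lets consumers call the method with no arguments, which is how the interface is typically invoked.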

What version of ibis are you running?

I haven't run ibis yet but __arrow_c_stream__ is not found in a code search of the repo.

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@kylebarron kylebarron added the feature Features or general enhancements label May 7, 2024
@jcrist
Member

jcrist commented May 7, 2024

Makes sense to me! Would we also want to implement __arrow_c_schema__ (as per the docs you linked)?

@kylebarron
Author

There's some ongoing discussion about this in apache/arrow#39689, but my own understanding is that if you're exporting a "table" or "stream of batches", you'd implement only the __arrow_c_stream__ method. If you're defining your own Arrow-compatible data types (which maybe ibis is), then you could implement __arrow_c_schema__ on that data type or field object.

@cpcloud
Member

cpcloud commented May 7, 2024

We have a single extension type used when exporting complex-type data from the snowflake backend to pyarrow. It's really only for internal use at the moment. Unclear how it would interact with the effort here, but I don't think it should be a blocker as it's the only backend that has this issue and I'm pretty sure we can solve the problem that type is solving in some other way if we need to.

@jcrist
Member

jcrist commented May 7, 2024

Sounds good to me - I pushed up a quick PR to support this for ibis.Table types in #9143.
