Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add list.index_of_in() to Expr and Series #21192

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

itamarst
Copy link
Contributor

Fixes #20626.

The prototype in the draft PR (#20733) was 10× faster than using non-obvious hacky Python implementations based on list.eval(), so I would guess this is too.

Just as a reminder, the original requested feature in #5503 that led to the addition of Expr.index_of() was actually what's covered in this PR. And this is also what G-Research internal users had requested, so it seems like there are multiple potential users.

Copy link

codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 97.50000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.90%. Comparing base (635a215) to head (d6fa1d4).
Report is 63 commits behind head on main.

Files with missing lines Patch % Lines
...ars-plan/src/plans/conversion/type_coercion/mod.rs 80.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21192      +/-   ##
==========================================
+ Coverage   79.20%   79.90%   +0.70%     
==========================================
  Files        1585     1594       +9     
  Lines      225619   227899    +2280     
  Branches     2588     2600      +12     
==========================================
+ Hits       178697   182101    +3404     
+ Misses      46332    45201    -1131     
- Partials      590      597       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member

We still haven't decided if we want this. I want to think of better ways than implementing every expression again over arrays and lists. Iterators over anyvalues, are things we should avoid. I want something more generic that allows us to reuse expressions on lists in way that doesn't impact binary size and reuses existing logic such as the row-encoding.

I first want to thing about this one. Being faster than list.eval isn't a metric we think is important in this, as list.eval is very very slow. This is of course perfectly implementable as plugin, if needed upstream.

@itamarst itamarst changed the title feat: list.index_of_in() implementation feat: Add list.index_of_in() to Expr and Series Feb 11, 2025
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars and removed title needs formatting labels Feb 11, 2025
@itamarst
Copy link
Contributor Author

Insofar as possible this uses ChunkedArray.iter(), rather than Series.iter().

And the fact that (if you want type coercion in IR, anyway) it required an additional kind of type coercion is at least something to consider for a more generic implementation strategy.

If you decide not to accept this I will likely refactor into a plugin, yes.

@itamarst itamarst marked this pull request as ready for review February 11, 2025 17:51
@itamarst
Copy link
Contributor Author

Not sure what's up with the Python docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search for an item in a list (or array), part 2
3 participants