Allow to use repeat_by with lists #21151

Open
borchero opened this issue Feb 9, 2025 · 1 comment · May be fixed by #21206
Labels
enhancement New feature or an improvement of an existing feature

Comments

borchero (Contributor) commented Feb 9, 2025

Description

Currently, trying to use repeat_by with a list dtype results in the following error:

InvalidOperationError: `repeat_by` operation not supported for dtype `list[i64]`

MWE:

import polars as pl

df = pl.DataFrame({
    "a": [[1], [2, 3], [4, 5, 6]],
    "n": [0, 3, 1],
})
df.select(pl.col("a").repeat_by("n"))

Conceptually, I see no reason why this cannot be supported: lists can be repeated in the same way as, e.g., strings.
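
To make the requested behavior concrete, here is a pure-Python model of the semantics I have in mind. It does not use polars at all; repeat_by_reference is a hypothetical helper written only for illustration, and it assumes non-negative integer counts without nulls.

```python
from typing import Any, List, Sequence


def repeat_by_reference(values: Sequence[Any], counts: Sequence[int]) -> List[List[Any]]:
    """Model of the desired semantics (hypothetical helper, not polars code)."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    # Row i of the output is a list holding counts[i] copies of values[i].
    return [[v] * n for v, n in zip(values, counts)]


# Strings: the kind of input that is already supported today.
strings = repeat_by_reference(["x", "y", "z"], [0, 3, 1])
# -> [[], ["y", "y", "y"], ["z"]]

# Lists: the same rule applied to the MWE, giving one more level of nesting.
lists = repeat_by_reference([[1], [2, 3], [4, 5, 6]], [0, 3, 1])
# -> [[], [[2, 3], [2, 3], [2, 3]], [[4, 5, 6]]]
```

For the MWE, the result would therefore be a column of lists of lists, with an empty outer list for the row where n is 0.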


Happy to open a PR for this, but I might need some guidance on where to make the changes:

  • I found crates/polars-ops/src/chunked_array/repeat_by.rs where List needs to be allowed as another dtype
  • It seems like one should extend ListFromIter in crates/polars-arrow/src/legacy/array/mod.rs (e.g. with a from_iter_list_trusted_len method)?
  • Once I'm there, I can't really figure out how to turn an iterator into a list again; MutableListArray seems to require concrete types and this is impossible as lists can be arbitrarily nested
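
One possible way around the typed-builder problem, sketched in pure Python: if the column is stored Arrow-style (an offsets buffer plus a child array), repeating rows can be expressed as a gather of row indices plus a new outer offsets buffer, neither of which depends on the inner dtype. The names repeat_by_plan and materialize are hypothetical, and whether this maps cleanly onto the polars internals is an assumption I have not verified.

```python
from itertools import accumulate
from typing import Any, List, Sequence, Tuple


def repeat_by_plan(counts: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (gather_indices, outer_offsets) for repeating row i counts[i] times."""
    # Row index i appears counts[i] times, in row order.
    gather_indices = [i for i, n in enumerate(counts) for _ in range(n)]
    # Outer list i spans child[outer_offsets[i]:outer_offsets[i + 1]].
    outer_offsets = [0, *accumulate(counts)]
    return gather_indices, outer_offsets


def materialize(values: Sequence[Any], gather_indices: Sequence[int],
                outer_offsets: Sequence[int]) -> List[List[Any]]:
    """Apply the plan to plain Python values, only to check the result."""
    child = [values[i] for i in gather_indices]  # dtype-agnostic "take"
    return [child[start:end] for start, end in zip(outer_offsets, outer_offsets[1:])]


indices, offsets = repeat_by_plan([0, 3, 1])
result = materialize([[1], [2, 3], [4, 5, 6]], indices, offsets)
```

The point of this formulation is that the only type-specific step is the gather, so arbitrarily nested lists would need no concrete builder type.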

That being said, the functionality to do this should already exist in polars: group_by.agg, for example, can already collect arbitrary list values into nested lists.

borchero added the enhancement label Feb 9, 2025

borchero (Contributor, Author) commented

I managed to get a trimmed-down version working by extending ListFromIter with a generic from_iter_dynamic_trusted_len method and using a derivative of the DynMutableListArray struct that was previously defined in polars-arrow/io/avro/read. This required plenty of dynamic type casting but it works at least. Unless there is a much simpler approach, I'll work on finalizing a PR in the next few days :)
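
For readers unfamiliar with the idea, here is a rough Python analogue of a dynamically typed list builder. It is not a port of DynMutableListArray or of my Rust changes; LeafBuilder, DynListBuilder and builder_for_depth are hypothetical names, and the sketch only shows how the nesting depth can be chosen at runtime instead of being fixed by a concrete type.

```python
class LeafBuilder:
    """Collects flat values (stands in for a primitive array builder)."""

    def __init__(self):
        self.values = []

    def push(self, value):
        self.values.append(value)

    def __len__(self):
        return len(self.values)


class DynListBuilder:
    """List builder whose child is chosen at runtime (leaf or another list builder)."""

    def __init__(self, child):
        self.child = child
        self.offsets = [0]

    def push(self, items):
        # Forward every element to the child, then record where this list ends.
        for item in items:
            self.child.push(item)
        self.offsets.append(len(self.child))

    def __len__(self):
        return len(self.offsets) - 1


def builder_for_depth(depth):
    """Wrap a leaf builder in `depth` levels of list builders."""
    builder = LeafBuilder()
    for _ in range(depth):
        builder = DynListBuilder(builder)
    return builder


# Build the expected MWE output (a list of lists of integers, depth 2).
out = builder_for_depth(2)
for row in [[], [[2, 3], [2, 3], [2, 3]], [[4, 5, 6]]]:
    out.push(row)
```

After the loop, out.offsets is [0, 0, 3, 4], the inner offsets are [0, 2, 4, 6, 9], and the leaf holds the nine flattened integers.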
