Allow group_by on lists of strings #12636

Closed
ByteNybbler opened this issue Nov 22, 2023 · 3 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@ByteNybbler
Contributor

Description

The following code currently panics. The verbose output is shown directly after it:

use polars::prelude::*;

fn main() {
    let num_rows = 1_100;

    let df = df![
        "lists" => vec![Series::new("inner", ["str"; 1]); num_rows],
    ].unwrap();

    df.lazy()
        .group_by([col("lists")])
        .agg([col("lists").alias("agg")])
        .collect()
        .unwrap();
}

run GroupbyExec
keys/aggregates are not partitionable: running default HASH AGGREGATION
thread 'main' panicked at src/issuex.rs:14:10:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("grouping on list type is only allowed if the inner type is numeric"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It would be nice if lists of strings could be used as group_by keys.

ByteNybbler added the enhancement label on Nov 22, 2023
@cmdlineluser
Contributor

If it helps: You can cast to categorical.

import polars as pl

df = pl.DataFrame({"lists": [["A"], ["B"], ["A"]], "val": [1, 2, 3]})

df.group_by(pl.col("lists").cast(pl.List(pl.Categorical))).all()
# shape: (2, 2)
# ┌───────────┬───────────┐
# │ lists     ┆ val       │
# │ ---       ┆ ---       │
# │ list[cat] ┆ list[i64] │
# ╞═══════════╪═══════════╡
# │ ["A"]     ┆ [1, 3]    │
# │ ["B"]     ┆ [2]       │
# └───────────┴───────────┘
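
A rough way to see why the cast helps: a categorical column represents each distinct string by an integer code, so a list of categoricals behaves like a list of integers, which the error message says is groupable. The pure-Python sketch below mimics that idea without Polars. `encode_lists` and `group_rows` are hypothetical helper names used only for illustration, and this is not a description of how Polars implements it internally.

```python
from collections import defaultdict


def encode_lists(lists):
    """Replace every string with a small integer code (dictionary encoding)."""
    codes = {}  # string -> integer code, assigned in order of first appearance
    encoded = []
    for row in lists:
        encoded.append(tuple(codes.setdefault(s, len(codes)) for s in row))
    return encoded, codes


def group_rows(lists, vals):
    """Group `vals` by the integer-encoded form of each list key."""
    encoded, codes = encode_lists(lists)
    decode = {code: s for s, code in codes.items()}
    groups = defaultdict(list)
    for key, val in zip(encoded, vals):
        groups[key].append(val)
    # Translate the integer keys back to strings for display.
    return {tuple(decode[c] for c in key): v for key, v in groups.items()}


result = group_rows([["A"], ["B"], ["A"]], [1, 2, 3])
print(result)  # {('A',): [1, 3], ('B',): [2]}
```

The grouping itself only ever sees tuples of integers; the strings are restored afterwards, much as the categorical column still displays `["A"]` and `["B"]` in the output above.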

@nardi

nardi commented Jan 1, 2024

I'm looking for this feature as well (and similarly, joins on list columns). Is there a reason it is not implemented? It seems that there is hashing functionality implemented, so it should be possible to simply hash the list column and join on that, right?
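
On the hashing question: a hash value alone is generally not a safe key, because two different lists can share the same hash, so hash-based grouping or joining usually also compares the actual values. A minimal pure-Python sketch of that idea follows. It is an illustration only, not Polars' implementation, and `hash_group_by` and `default_hash` are hypothetical names.

```python
def default_hash(key):
    """Hash a list by converting it to a tuple (lists themselves are unhashable)."""
    return hash(tuple(key))


def hash_group_by(keys, vals, hash_fn=default_hash):
    """Group `vals` by list-valued `keys` using hash buckets plus an equality check."""
    buckets = {}  # hash value -> list of (key, values) pairs sharing that hash
    for key, val in zip(keys, vals):
        bucket = buckets.setdefault(hash_fn(key), [])
        for existing_key, group in bucket:
            if existing_key == key:  # guard against hash collisions
                group.append(val)
                break
        else:
            # No equal key in this bucket yet: start a new group.
            bucket.append((list(key), [val]))
    return [pair for bucket in buckets.values() for pair in bucket]


groups = hash_group_by([["A"], ["B"], ["A"]], [1, 2, 3])

# Even with a deliberately bad hash, distinct keys stay in separate groups.
collided = hash_group_by([["A"], ["B"], ["A"]], [1, 2, 3], hash_fn=lambda key: 0)
```

Both calls return the same two groups, one for `["A"]` and one for `["B"]`. The equality check inside each bucket is what keeps the second call correct even though every key hashes to the same value.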

@cmdlineluser
Contributor

Closed by #15540

reswqa closed this as completed on Apr 9, 2024