Regression after 0.20.19: vec_hash_combine operation not supported for dtype list[u8] when joining with secondary column on list[numeric] #15555

Closed
2 tasks done
Object905 opened this issue Apr 9, 2024 · 1 comment
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@Object905
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

In [1]: import polars as pl

In [2]: pl.__version__
Out[2]: '0.20.19'

In [3]: df = pl.DataFrame([(1, [1, 2],)], schema={"x": pl.UInt32, "y": pl.List(pl.UInt8)})

In [4]: df2 = pl.DataFrame([(1, [1, 2],)], schema={"x": pl.UInt32, "y": pl.List(pl.UInt8)})

In [5]: df.join(df2, on=("x", "y"), how="inner")
thread 'ipython' panicked at crates/polars-ops/src/frame/join/hash_join/multiple_keys.rs:185:91:
called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`vec_hash_combine` operation not supported for dtype `list[u8]`"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[5], line 1
----> 1 df.join(df2, on=("x", "y"), how="inner")

File ~/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py:5952, in DataFrame.join(self, other, on, how, left_on, right_on, suffix, validate, join_nulls)
   5937     msg = f"expected `other` join table to be a DataFrame, got {type(other).__name__!r}"
   5938     raise TypeError(msg)
   5940 return (
   5941     self.lazy()
   5942     .join(
   5943         other=other.lazy(),
   5944         left_on=left_on,
   5945         right_on=right_on,
   5946         on=on,
   5947         how=how,
   5948         suffix=suffix,
   5949         validate=validate,
   5950         join_nulls=join_nulls,
   5951     )
-> 5952     .collect(_eager=True)
   5953 )

File ~/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/polars/lazyframe/frame.py:1683, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
   1680 if background:
   1681     return InProcessQuery(ldf.collect_concurrently())
-> 1683 return wrap_df(ldf.collect())

PanicException: called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`vec_hash_combine` operation not supported for dtype `list[u8]`"))

Log output

No response

Issue description

After upgrading to 0.20.19, a join whose secondary key column is pl.List(pl.UInt8) fails, while the same join works on 0.20.18.

Also, joining on only a pl.List(pl.UInt8) column raises PanicException: not implemented on all previous versions, but when a second, non-list column is added to on= in the join, it works. Is that expected?

0.20.18 raises "grouping on list type is only allowed if the inner type is numeric" when trying to join on a list whose inner type is pl.String, which suggests that group-by on numeric lists is supported.

Also, since group-by on non-numeric lists is implemented, perhaps joins should support this too, but that's a separate issue.

Expected behavior

The join (at least with a secondary non-list join key) should work for list columns, as it did in 0.20.18.

Installed versions

--------Version info---------
Polars:               0.20.19
Index type:           UInt32
Platform:             Linux-6.8.2-arch2-1-x86_64-with-glibc2.39
Python:               3.12.2 (main, Feb 18 2024, 18:15:06) [GCC 13.2.1 20230801]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               24.2.1
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.2
pydantic:             2.6.4
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.29
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0

@Object905 added the bug, needs triage, and python labels on Apr 9, 2024
@ritchie46
Member

fixed by #15559


2 participants