Coalesce with null dtype column #21506

Open
2 tasks done
jesusestevez opened this issue Feb 27, 2025 · 0 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

jesusestevez commented Feb 27, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Minimal reproducible example by cmdlineuser:

import polars as pl

(pl.DataFrame({"x": [[None, None]], "y": [[1, 2]]})
   #.cast(pl.List(pl.Int64))
   .group_by(1)
   .agg(pl.coalesce(pl.all().flatten()))
)

# shape: (1, 2)
# ┌─────────┬────────────┐
# │ literal ┆ x          │
# │ ---     ┆ ---        │
# │ i32     ┆ list[null] │
# ╞═════════╪════════════╡
# │ 1       ┆ null       │  # [1, 2] is the output with .cast()
# └─────────┴────────────┘

Full example:

import polars as pl

df = pl.DataFrame(
    {
        "missing": [[None, None], [None, None], [None, None]],
        "values1": [[1, None], [3, 4], [5, 6]],
        "values2": [[1, 2], [3, 4], [5, 6]],
    }
)

def comparison(x, y):
    return x.eq_missing(y)


def coalesce(exprs):
    return pl.coalesce(exprs)


# Each row forms its own window, so explode/implode operates element-wise per row.
row_id = pl.int_range(pl.len())

df.select(
    compare=comparison(pl.col("values1").explode(), pl.col("values2").explode())
    .implode()
    .over(row_id),
    coalesce_values=coalesce((pl.col("values1").explode(), pl.col("values2").explode()))
    .implode()
    .over(row_id),
    # "missing" has dtype list[null]; this is the case that misbehaves.
    coalesce_missing=coalesce((pl.col("missing").explode(), pl.col("values2").explode()))
    .implode()
    .over(row_id),
)

Log output

Issue description

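Following the discussion at https://discord.com/channels/908022250106667068/957930511999832064/1344616117204680744, we noticed what appears to be a bug in how pl.coalesce treats columns of Null dtype. When the first column passed to coalesce has dtype list[null], the result stays null instead of falling through to the values of the next column. Casting the column to a concrete dtype first (for example pl.List(pl.Int64)) gives the expected output.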

Expected behavior

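I would expect pl.coalesce to accept a Null dtype column and return the first non-null value from the remaining columns, i.e. the same result obtained after casting the column to a concrete dtype.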

Installed versions

--------Version info---------
Polars:              1.22.0
Index type:          UInt32
Platform:            Windows-11-10.0.22621-SP0
Python:              3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:20:11) [MSC v.1938 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               5.4.1
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.10.4
fsspec               2024.10.0
gevent               <not installed>
google.auth          <not installed>
great_tables         0.16.1
matplotlib           3.9.1
numpy                2.0.1
openpyxl             3.1.3
pandas               2.2.2
pyarrow              16.1.0
pydantic             2.10.6
pyiceberg            <not installed>
sqlalchemy           1.4.52
torch                <not installed>
xlsx2csv             0.8.2
xlsxwriter           3.1.9
@jesusestevez jesusestevez added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Feb 27, 2025