Filter and datetime do not work together #19696

Open
2 tasks done
EthanSteinberg opened this issue Nov 8, 2024 · 1 comment
Labels
bug (Something isn't working), P-low (Priority: low), python (Related to Python Polars)

Comments

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

d = pl.DataFrame({
    'year': [1970],
    'value': [None]
}, schema={'year': pl.Int32(), 'value': pl.Int32()})

# Uncommenting this line (lazy mode) does not change the outcome
# d = d.lazy()

d = d.filter(pl.col('value').is_not_null())

final = d.select(time=pl.datetime(pl.col('year'), 1, 1), value=pl.col('value'))

print(final)

# Only reached in lazy mode; in eager mode the select above already raises
print(final.collect())

Log output

---------------------------------------------------------------------------
ShapeError                                Traceback (most recent call last)
<ipython-input-24-2a2004cbf8f4> in <cell line: 16>()
     14 
     15 
---> 16 final = d.select(time=pl.datetime(pl.col('year'), 1, 1), value=pl.col('value'))
     17 
     18 

1 frames
/usr/local/lib/python3.10/dist-packages/polars/lazyframe/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, streaming, engine, background, _eager, **_kwargs)
   2053         # Only for testing purposes
   2054         callback = _kwargs.get("post_opt_callback", callback)
-> 2055         return wrap_df(ldf.collect(callback))
   2056 
   2057     @overload

ShapeError: Series time, length 1 doesn't match the DataFrame height of 0

If you want expression: col("year").dt.datetime([dyn int: 1, dyn int: 1, dyn int: 0, dyn int: 0, dyn int: 0, dyn int: 0, String(raise)]) to be broadcasted, ensure it is a scalar (for instance by adding '.first()').

Issue description

Polars filtering and pl.datetime don't seem to work together at all. When you filter a DataFrame and then use pl.datetime to construct a result, you get the error shown above.

This error seems to happen in both lazy and eager mode; the main requirement seems to be the combination of filter and datetime.

https://colab.research.google.com/drive/1N6ByBcHuOKYuE5YNJwtPCzAlv5N4U8qF?usp=sharing is an interactive example of the bug.

Expected behavior

The expected result is an empty DataFrame.

Installed versions

--------Version info---------
Polars: 1.12.0
Index type: UInt32
Platform: Linux-6.1.85+-x86_64-with-glibc2.35
Python: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
LTS CPU: False

----Optional dependencies----
adbc_driver_manager
altair 4.2.2
cloudpickle 3.1.0
connectorx
deltalake
fastexcel
fsspec 2024.10.0
gevent
great_tables
matplotlib 3.8.0
nest_asyncio 1.6.0
numpy 1.26.4
openpyxl 3.1.5
pandas 2.2.2
pyarrow 17.0.0
pydantic 2.9.2
pyiceberg
sqlalchemy 2.0.36
torch 2.5.0+cu121
xlsx2csv
xlsxwriter

EthanSteinberg added the bug, needs triage, and python labels on Nov 8, 2024
MarcoGorelli (Collaborator) commented Nov 8, 2024

Thanks for the report. I don't think this is related to filter; the following reproduces the same error:

import polars as pl

# Empty frame with an explicit schema; no filter involved
d = pl.DataFrame({
    'year': [],
    'value': []
}, schema={'year': pl.Int32(), 'value': pl.Int32()})
d.select(time=pl.datetime(pl.col('year'), 1, 1), value=pl.col('value'))

Rather, the problem is that you can't use pl.datetime in select on an empty DataFrame.

MarcoGorelli added the P-low label and removed the needs triage label on Nov 8, 2024