Performance Issue - Polars DataFrame Behaves Differently Depending on Data Source (Which It Shouldn't) #19703
Open
Labels: bug, needs triage, python
Checks
Reproducible example
Polars DataFrame Behaves Differently Depending on Data Source (Which It Shouldn't)
Workarounds found, but this needs deeper investigation.
This behavior is triggered when I am dealing with a really large DataFrame (in my case 28 million rows) and tons of .csv files, so hopefully I can describe the problem clearly without having to upload my dataset.
What happened
How to reproduce this problematic .parquet file:
Workaround:
Log output
No response
Issue description
See code section for details.
Tl;dr,
Expected behavior
A DataFrame's behavior should be independent of its data source once it is loaded into Python.
Installed versions