1.3.0 regression when reading all-null DECIMAL(19,0) column @ parquet file exported by AWS Redshift #17929
Closed
TinoSM added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels on Jul 29, 2024
This is how the file was generated (Redshift SQL query).
TinoSM changed the title from "1.3.0 regression when reading DECIMAL(19,0) parquet files exported by Redshift" to "1.3.0 regression when reading all-null DECIMAL(19,0) column @ parquet file exported by Redshift" on Jul 29, 2024
TinoSM changed the title to "1.3.0 regression when reading all-null DECIMAL(19,0) column @ parquet file exported by AWS Redshift" on Jul 29, 2024
coastalwhite added the accepted (Ready for implementation), P-medium (Priority: medium), A-io-parquet (Area: reading/writing Parquet files), and A-panic (Area: code that results in panic exceptions) labels and removed the needs triage (Awaiting prioritization by a maintainer) label on Jul 29, 2024
Can you share a small file showing the issue?
@ritchie46 it is attached to the original ticket; it is the .zip in the reproducible example (Ctrl-F for broken_example.parquet.zip will find it). Adding it to this comment as well.
Check. Thanks!
Minimal repro for this issue:

```python
import io

import polars as pl
from polars.testing import assert_frame_equal

# A single all-null column with a Decimal dtype, written via the pyarrow writer.
df = pl.DataFrame(
    {"a": [None]},
    schema={"a": pl.Decimal(precision=18, scale=0)},
)

f = io.BytesIO()
df.write_parquet(f, use_pyarrow=True)
f.seek(0)

# Reading this back fails on Polars 1.3.0 (it works on 1.2.1 and via pyarrow,
# per the issue description), so the round-trip assertion is never reached.
assert_frame_equal(pl.read_parquet(f), df)
```
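As a follow-up sanity check, here is a minimal sketch (my addition, not from the thread; it assumes pyarrow is installed) that inspects what the repro above actually writes to the buffer, bypassing Polars' own Parquet reader:

```python
import io

import polars as pl
import pyarrow.parquet as pq

# Same all-null Decimal column as in the repro above.
df = pl.DataFrame({"a": [None]}, schema={"a": pl.Decimal(precision=18, scale=0)})

buf = io.BytesIO()
df.write_parquet(buf, use_pyarrow=True)
buf.seek(0)

# Read the buffer back with pyarrow directly.
table = pq.read_table(buf)
print(table.schema)                  # a single decimal128 column
print(table.column("a").null_count)  # 1, i.e. the column is entirely null
```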
coastalwhite added commits to coastalwhite/polars that referenced this issue on Jul 30, 2024
Thanks @coastalwhite @ritchie46!
Checks
Reproducible example
broken_example.parquet.zip
Log output
Issue description
Since 1.3.0 (we come from 1.2.1, where the same files work fine) we are unable to read any of our Parquet exports.
It turns out the issue happens when we read a DECIMAL(19,0) column with (many? all?) values set to null.
It works fine with use_pyarrow=True (but that forces read_parquet instead of scan_parquet as well...).
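For reference, the workaround mentioned above looks roughly like this (a sketch, not from the thread; the path assumes the attached zip has been extracted to broken_example.parquet):

```python
import polars as pl

# Workaround from the description: route the read through pyarrow.
# Per the description, this forces eager read_parquet instead of scan_parquet.
df = pl.read_parquet("broken_example.parquet", use_pyarrow=True)
print(df.schema)
```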
Expected behavior
The Parquet file is read without errors being thrown.
I can read it correctly with Polars 1.2.1, with DuckDB, or with the "pyarrow" engine.
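A sketch of the DuckDB cross-check mentioned above (my addition; it assumes the duckdb Python package is installed and the attachment has been extracted to broken_example.parquet):

```python
import duckdb

# DuckDB reads the same file without error, per the description above.
rel = duckdb.sql("SELECT * FROM read_parquet('broken_example.parquet')")
print(rel.types)       # expect a DECIMAL(19,0) column
print(rel.fetchall())  # all-null rows
```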
Installed versions