Big strings cause AssertionError: found X raw bytes (expected Y) #375
If you also write the file with fastparquet, it reads back just fine. Finding out why will be tricky.
I've found that changing the row group size changes the error message:
Changing the compression to
Packages:
Python version:
This happens with old files that we wrote using pyarrow. See dask/fastparquet#375
I wonder, can someone check with pdb in
Writing really long strings with pyarrow causes an exception when fastparquet reads the file.
If written with compression, it reports compression errors instead:
SNAPPY:
snappy.UncompressError: Error while decompressing: invalid input
GZIP:
zlib.error: Error -3 while decompressing data: incorrect header check
Minimal code to reproduce:
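The reporter's snippet is not preserved here. As a rough sketch of the round trip described above (write long strings with pyarrow, read them back with fastparquet), something like the following could be used. The helper names `make_long_strings` and `roundtrip`, the column name `text`, and the default row count and string length are all assumptions made for illustration, not values from the report; the length that actually triggers the error is not stated.

```python
import os
import tempfile


def make_long_strings(n_rows, length):
    """Return n_rows strings of exactly `length` characters, each prefixed with its row index."""
    # Hypothetical helper: the real data in the report is unknown.
    return [(str(i) + "x" * length)[:length] for i in range(n_rows)]


def roundtrip(path, n_rows=10, length=1000000, compression="NONE", row_group_size=None):
    """Write long strings with pyarrow, then read the file back with fastparquet."""
    # Third-party imports are local so make_long_strings works without them installed.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq
    import fastparquet

    df = pd.DataFrame({"text": make_long_strings(n_rows, length)})
    table = pa.Table.from_pandas(df)
    # compression and row_group_size are the two knobs the thread says change the error.
    pq.write_table(table, path, compression=compression, row_group_size=row_group_size)
    return fastparquet.ParquetFile(path).to_pandas()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "long_strings.parquet")
    for codec in ("NONE", "SNAPPY", "GZIP"):
        try:
            roundtrip(path, compression=codec)
            print("{}: read OK".format(codec))
        except Exception as exc:
            print("{}: failed with {}: {}".format(codec, type(exc).__name__, exc))
```

Looping over the codecs makes it easy to compare the uncompressed failure with the SNAPPY and GZIP ones; whether any of them fail will depend on the installed versions and on the string length chosen.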
Versions:
fastparquet==0.1.6
pyarrow==0.10.0
pandas==0.22.0
sys.version '2.7.15 |Anaconda custom (64-bit)| (default, May 1 2018, 18:37:05) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'
Also opened issue here: apache/arrow#2562