[Python] Using unify_schema() during schema evolution fails #37898
Describe the usage question you have. Please include as many useful details as possible.

I am trying to merge multiple parquet files using pyarrow's unify_schemas(). During schema evolution, the same field ends up with two different data type structures. What is the best way to handle schema evolution in such cases while making sure there is no loss of data?

Component(s)

Parquet, Python
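To make the failure concrete, here is a minimal sketch (the id/value/comment fields and the int64-to-float64 change are made-up examples, not taken from the issue): unify_schemas() merges schemas that merely add or remove fields, but raises as soon as the same field carries two incompatible types, so the conflicting column has to be reconciled by hand first.

```python
import pyarrow as pa

# Two hypothetical versions of an evolving schema: "value" changed
# type between files, and "comment" was added later.
old = pa.schema([("id", pa.int64()), ("value", pa.int64())])
new = pa.schema([("id", pa.int64()), ("value", pa.float64()),
                 ("comment", pa.string())])

try:
    pa.unify_schemas([old, new])  # raises: "value" has two types
except (pa.ArrowInvalid, pa.ArrowTypeError) as exc:
    print(exc)

# One workaround: settle on a common type for the conflicting field
# up front (int64 widens to float64 here; whether that is lossless
# enough depends on the data), then unify the adjusted schemas.
old_fixed = pa.schema(
    [pa.field("value", pa.float64()) if f.name == "value" else f
     for f in old])
print(pa.unify_schemas([old_fixed, new]))
```

On recent pyarrow releases (14 and later, if I recall correctly), unify_schemas() also accepts a promote_options argument; passing promote_options="permissive" should perform this kind of compatible-type promotion automatically, which is worth checking against your installed version.

Comments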
As per the documentation: "Note that two fields with different types will fail merging." What are you using to merge the parquet files once you get the unified schema?
I'm reading the tables using pyarrow.parquet.read_table() and storing them in a list. After this, I loop through the whole list of tables and use pyarrow.parquet.ParquetWriter.write_table() to write the merged parquet file, which is then uploaded to an S3 bucket.
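A minimal sketch of that pipeline, assuming the per-file schemas can be unified and using placeholder file names (the S3 upload step is omitted):

```python
import pyarrow as pa
import pyarrow.parquet as pq

paths = ["1.parquet", "2.parquet"]  # placeholder input files

# Unify the per-file schemas first; this raises if any field has
# conflicting types across the files.
unified = pa.unify_schemas([pq.read_schema(p) for p in paths])

# Re-read each file against the unified schema, so columns missing
# from an individual file come back as all-null, and append every
# table to a single output file.
with pq.ParquetWriter("merged.parquet", unified) as writer:
    for p in paths:
        writer.write_table(pq.read_table(p, schema=unified))
```

Whether read_table(schema=...) actually pads missing columns with nulls appears to vary by pyarrow version (see the comment further down), so it is worth verifying on the version you run.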
I see. There are some already-open issues connected to this; I suggest reading through:
Let me know if you get the information you are looking for.
Re. 2: it doesn't work for me in the case of adding / removing columns (pyarrow==15.0.0). E.g., 1.parquet has schema:

2.parquet has schema:

When I read them with the manually merged schema:

I get:
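One way around that, sketched under the assumption that padding missing columns with nulls is acceptable (conform is a hypothetical helper, not a pyarrow API):

```python
import pyarrow as pa

def conform(table: pa.Table, target: pa.Schema) -> pa.Table:
    """Make `table` match `target`: add all-null columns for fields
    the table lacks, drop extras, fix the order, and cast types."""
    for field in target:
        if field.name not in table.schema.names:
            table = table.append_column(
                field, pa.nulls(table.num_rows, type=field.type))
    # select() drops extra columns and fixes the column order;
    # cast() then aligns the types and raises on invalid casts.
    return table.select(target.names).cast(target)
```

With such a helper, each table can be passed through conform(pq.read_table(p), unified) before write_table(), instead of relying on read_table(schema=...).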