Describe the bug
Before we concatenate the partial tables generated from each batch, we error out if the schemas of the tables don't match. But the column ordering of the partial tables can change depending on nulls in the columns. We should not error out in this case.
Proposed solution: in all later batches, enforce the column ordering of the partial table from the first batch.
From offline discussions with @karthikeyann, pitfalls with the proposed solution:
Data type mismatch for the same column between partial tables. For example, a column may be inferred as int8 in one partial table but as int16 or float in the next chunk.
Column present in the second batch but not in the first batch. In this case, we will prune that column out and the final table will be missing that column. Note that the converse case - if a column present in the first batch is missing from some following batch - is handled by the JSON tree algorithms. The missing column is included and filled with nulls.
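The proposed reordering and both pitfalls can be illustrated with a minimal Python sketch. This is not libcudf code: the `Table` stand-in (an ordered dict of column name to a `(dtype, values)` pair) and the `align_to_first` helper are hypothetical, introduced only for illustration, and raising on a dtype mismatch is just one possible policy.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a partial table: an ordered mapping of
# column name -> (dtype name, values). This is NOT a libcudf type.
Column = Tuple[str, list]
Table = Dict[str, Column]


def align_to_first(first: Table, later: Table) -> Tuple[Table, List[str]]:
    """Reorder `later` to follow the column order of `first`.

    Returns the aligned table and the names of the columns that were
    pruned because they do not appear in `first`.
    """
    num_rows = len(next(iter(later.values()))[1]) if later else 0
    aligned: Table = {}
    for name, (dtype, _) in first.items():
        if name not in later:
            # Converse case from the issue: the real reader already fills
            # such columns with nulls; the sketch only mirrors that.
            aligned[name] = (dtype, [None] * num_rows)
            continue
        later_dtype, values = later[name]
        if later_dtype != dtype:
            # Pitfall 1: same column, different inferred type.
            # Raising is an assumption; a real fix might promote types.
            raise TypeError(
                f"column {name!r}: {dtype} in first batch, "
                f"{later_dtype} in later batch"
            )
        aligned[name] = (later_dtype, values)
    # Pitfall 2: columns seen only in the later batch are pruned.
    dropped = [name for name in later if name not in first]
    return aligned, dropped


first = {"a": ("int8", [1, 2]), "b": ("str", ["x", None])}
later = {"b": ("str", ["y", "z"]), "a": ("int8", [3, 4]), "c": ("float64", [1.0, 2.0])}
aligned, dropped = align_to_first(first, later)
print(list(aligned), dropped)
```

Here the later batch has the same columns in a different order plus an extra column `c`: the reordering succeeds, but `c` is missing from the result, which is the second pitfall described above.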
Steps/Code to reproduce bug
Draft PR #17688
./build/latest/benchmarks/JSON_READER_NVBENCH --benchmark json_read_compressed_io --axis compression_type GZIP --axis data_size[pow2]=28 --axis num_sources=4 --device 0
Expected behavior
Enforce column ordering based on the partial table in the first batch in all later batches.