Commit
apacheGH-37170: [C++] Support schema rewriting of RecordBatch. (apache#37171)

### Rationale for this change

We have the following scenario. A plan in pg looks like the one below. Under the Append node there are two scans running in parallel. Each scan outputs a single column of data, but the column names differ, so when the columns are mapped to an Arrow schema they become different fields.

The Append node therefore receives two batches: the first from the first scan and the second from the second scan. Because each batch's schema is built from its own scan, the two batches end up with different schemas. When we construct the slot returned by the Append node, we use the schema of the first batch. When we then put the data of the second batch into it, validation fails because the schemas are inconsistent.

The problem therefore reduces to this: for a node with n child nodes, the batches from all children must share one schema. If they do not, the batches from the other n-1 children must be rewritten to use the first child's schema, which requires logic to rewrite the schema of a batch.

```
->  Vec Append
      ->  Vec Seq Scan on public.tenk1
            Output: tenk1.unique1
      ->  Vec Seq Scan on public.tenk1 tenk1_1
            Output: tenk1_1.fivethous
```

However, the record batch code only exposes the read-only accessor schema(), so this PR adds an interface for rewriting the schema. The rewrite is allowed only when the column types are the same; if they differ, the rewrite is rejected as invalid.

Background: apache#37170

### What changes are included in this PR?

- record_batch.h
- record_batch.cc
- record_batch_test.cc

### Are these changes tested?

Yes, see record_batch_test.cc. The gtest filter is:

```
TestRecordBatch.RewriteSchema
```

### Are there any user-facing changes?

Yes, see the background in the issue.

* Closes: apache#37170

Authored-by: light-city <[email protected]>
Signed-off-by: David Li <[email protected]>
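
The rewrite rule described in the rationale (swap the schema only when the column types match, otherwise reject) can be illustrated with a minimal standalone sketch. This is not the Arrow API and not the code added by this PR: `Type`, `Field`, `Schema`, `Batch`, `WithSchema`, and `AlignToFirst` are hypothetical, simplified stand-ins chosen for illustration, and `std::nullopt` stands in for the invalid result.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for a field, a schema and a record batch.
enum class Type { Int32, Int64, Utf8 };

struct Field {
  std::string name;
  Type type;
};

using Schema = std::vector<Field>;
using Column = std::shared_ptr<const std::vector<int>>;

struct Batch {
  std::shared_ptr<const Schema> schema;
  std::vector<Column> columns;  // column data is shared, never copied
};

// Returns a batch that shares the input's column data but carries `target`
// as its schema. Returns std::nullopt when either schema is missing, the
// column counts differ, or any column type differs. Names may differ freely.
std::optional<Batch> WithSchema(const Batch& batch,
                                std::shared_ptr<const Schema> target) {
  if (!batch.schema || !target) return std::nullopt;
  if (target->size() != batch.schema->size()) return std::nullopt;
  for (std::size_t i = 0; i < target->size(); ++i) {
    if ((*target)[i].type != (*batch.schema)[i].type) return std::nullopt;
  }
  Batch out = batch;  // copies only the shared pointers to the columns
  out.schema = std::move(target);
  return out;
}

// Rewrites every child batch to use the first child's schema, which is what
// the Append node in the plan needs. Fails if any child cannot be rewritten.
std::optional<std::vector<Batch>> AlignToFirst(
    const std::vector<Batch>& children) {
  std::vector<Batch> aligned;
  for (const Batch& child : children) {
    std::optional<Batch> rewritten = WithSchema(child, children.front().schema);
    if (!rewritten) return std::nullopt;
    aligned.push_back(std::move(*rewritten));
  }
  return aligned;
}
```

In this sketch, a batch whose only column is `fivethous` (32-bit integer) can be aligned to a first batch whose only column is `unique1` (32-bit integer), while a batch with a 64-bit integer column would be rejected. The column data itself is untouched in both cases.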