[SPARK-48898][SQL] Fix Variant shredding bug
### What changes were proposed in this pull request?

In VariantShreddingWriter, there are two calls to `variantBuilder.appendVariant` that were left over from an earlier version of the shredding spec where we constructed new metadata for every shredded value. This method rebuilds the Variant value to refer to the new metadata dictionary in the builder, so we should not be using it in shredding, where all dictionary IDs now refer to the original Variant metadata.

1) When writing a Variant value that does not match the shredding type. The code was doing the right thing, but unnecessarily calling `variantBuilder.appendVariant` and then discarding the result. The PR removes that dead code.

2) When reconstructing a Variant object that contains only the fields of the original object that don't appear in the shredding schema. This is a correctness bug, since we would modify the value to use new dictionary IDs that do not correspond to the ones in the original metadata.

### Why are the changes needed?

Variant shredding correctness.

### Does this PR introduce _any_ user-facing change?

No, shredding has not yet been released.

### How was this patch tested?

Added a unit test that fails without the fix.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49031 from cashmand/SPARK-48898-bugfix.

Authored-by: cashmand <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
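
The dictionary ID mismatch described in case 2 can be illustrated with a toy model. This is only a sketch, not Spark code: it assumes metadata is a plain list of key strings and an object is a list of `(key_id, value)` pairs, whereas real Variant values are binary-encoded. The function names `decode`, `residual_keep_ids`, and `residual_rebuild` are hypothetical and do not exist in Spark.

```python
# Toy model only: metadata is a list of keys, an object is a list of
# (dictionary_id, value) pairs. All names here are hypothetical.

def decode(obj, metadata):
    """Resolve each field's dictionary ID against a metadata dictionary."""
    return {metadata[key_id]: value for key_id, value in obj}

def residual_keep_ids(obj, metadata, shredded_keys):
    """Drop the shredded fields and leave the remaining IDs untouched."""
    return [(key_id, value) for key_id, value in obj
            if metadata[key_id] not in shredded_keys]

def residual_rebuild(obj, metadata, shredded_keys):
    """Re-encode the remaining fields against a fresh dictionary.

    This mimics the problematic pattern: the returned IDs index into
    new_metadata, not into the original metadata.
    """
    new_metadata = []
    rebuilt = []
    for key_id, value in obj:
        key = metadata[key_id]
        if key in shredded_keys:
            continue
        if key not in new_metadata:
            new_metadata.append(key)
        rebuilt.append((new_metadata.index(key), value))
    return rebuilt, new_metadata

metadata = ["a", "b", "c"]
obj = [(0, 1), (1, 2), (2, 3)]   # represents {"a": 1, "b": 2, "c": 3}
shredded = {"a"}                 # field "a" is covered by the shredding schema

kept = residual_keep_ids(obj, metadata, shredded)
rebuilt, new_metadata = residual_rebuild(obj, metadata, shredded)

print(decode(kept, metadata))      # residual read with the original metadata
print(decode(rebuilt, metadata))   # rebuilt IDs read with the original metadata
```

In this model the residual that keeps its original IDs still decodes to fields `b` and `c`, while the rebuilt residual, read against the original metadata, resolves to the wrong keys. It only decodes correctly against `new_metadata`, which shredding no longer stores.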