forked from apache/spark
[SPARK-47483][SQL] Add support for aggregation and join operations on arrays of collated strings

### What changes were proposed in this pull request?

Example of an aggregation sequence:

```
create table t(a array<string collate utf8_binary_lcase>) using parquet;
insert into t(a) values(array('a' collate utf8_binary_lcase));
insert into t(a) values(array('A' collate utf8_binary_lcase));
select distinct a from t;
```

Example of a join sequence:

```
create table l(a array<string collate utf8_binary_lcase>) using parquet;
create table r(a array<string collate utf8_binary_lcase>) using parquet;
insert into l(a) values(array('a' collate utf8_binary_lcase));
insert into r(a) values(array('A' collate utf8_binary_lcase));
select * from l join r where l.a = r.a;
```

Both queries should return one row, since the arrays are considered equal under the `utf8_binary_lcase` collation. The problem is in the `isBinaryStable` function, which should return false if **any** of its subtypes is a string with a non-binary collation.

### Why are the changes needed?

To properly support aggregates and joins on arrays of collated strings.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the scenarios described above.

### How was this patch tested?

Added new checks to the collation suite.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45611 from nikolamand-db/SPARK-47483.

Authored-by: Nikola Mandic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
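The rule that `isBinaryStable` must look at **every** nested subtype can be illustrated with a minimal Scala sketch. The type model below (`SqlType`, `IntType`, `StringType`, `ArrayType`, `StructType`), the `BinaryStability` object, and the helper `lcaseArrayEquals` are all hypothetical simplifications. They are not Spark's `DataType` API or the code from this commit; they only show why the check has to recurse into element and field types.

```scala
import java.util.Locale

// Hypothetical, simplified type model for illustration only (NOT Spark's DataType API).
sealed trait SqlType
case object IntType extends SqlType
// A string type tagged with the name of its collation.
final case class StringType(collation: String) extends SqlType
final case class ArrayType(element: SqlType) extends SqlType
final case class StructType(fields: Seq[SqlType]) extends SqlType

object BinaryStability {
  // Assumption: UTF8_BINARY is the only collation that compares strings byte by byte.
  private val BinaryCollation = "UTF8_BINARY"

  // A type is binary stable if two values are equal exactly when their binary
  // encodings are equal. This breaks as soon as ANY nested component is a
  // string with a non-binary collation, so the check must recurse.
  def isBinaryStable(t: SqlType): Boolean = t match {
    case IntType               => true
    case StringType(collation) => collation.equalsIgnoreCase(BinaryCollation)
    case ArrayType(element)    => isBinaryStable(element)
    case StructType(fields)    => fields.forall(isBinaryStable)
  }

  // Models equality of string arrays under a case-insensitive collation such
  // as utf8_binary_lcase by comparing lowercased elements (a simplification).
  def lcaseArrayEquals(a: Seq[String], b: Seq[String]): Boolean =
    a.map(_.toLowerCase(Locale.ROOT)) == b.map(_.toLowerCase(Locale.ROOT))
}
```

In this model, `Seq("a") == Seq("A")` is false while `lcaseArrayEquals(Seq("a"), Seq("A"))` is true. Grouping or joining by raw bytes would therefore keep `array('a')` and `array('A')` apart, which is roughly why an array of `utf8_binary_lcase` strings must not be reported as binary stable.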
1 parent ca44489, commit 8cba15e
Showing 3 changed files with 156 additions and 5 deletions.