forked from apache/spark
[SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers

### What changes were proposed in this pull request?

Languages and localization for collations are supported by the ICU library. The collation naming format is as follows:

```
<2-letter language code>[_<4-letter script>][_<3-letter country code>][_specifier_specifier...]
```

The locale specifier is the first part of the collation name (language + script + country). Locale specifiers need to be stable across ICU versions. To keep existing ids and names invariant, we introduce a golden file with the locale table, which should cause a CI failure on any silent change.

Currently supported optional specifiers:
- `CS`/`CI` - case sensitivity, default is case-sensitive; supported by configuring ICU collation levels
- `AS`/`AI` - accent sensitivity, default is accent-sensitive; supported by configuring ICU collation levels

Users can give collation specifiers in any order, except for the locale, which is mandatory and must come first. There is a one-to-one mapping between collation ids and collation names, defined in `CollationFactory`.

### Why are the changes needed?

To add language and localization support for collations.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new predefined collations.

### How was this patch tested?

Added checks to `CollationFactorySuite` and an ICU locale map golden file.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46180 from nikolamand-db/SPARK-46841.

Authored-by: Nikola Mandic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
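
As an illustration of the naming format described in the commit message, here is a minimal Java sketch of how such a name could be split into a locale part and optional specifiers. It is not the code from `CollationFactory`: the class `CollationNameSketch`, its fields, and the choices to reject repeated or conflicting specifiers and to match specifiers case-insensitively are all assumptions made for this example. The sample names used below are illustrative only.

```java
import java.util.Locale;

// Hypothetical sketch (not Spark's CollationFactory). Parses names shaped like
// <language>[_<script>][_<country>][_specifier_specifier...].
final class CollationNameSketch {
    final String locale;           // language + optional script + optional country
    final boolean caseSensitive;   // CS (default) vs. CI
    final boolean accentSensitive; // AS (default) vs. AI

    private CollationNameSketch(String locale, boolean cs, boolean as) {
        this.locale = locale;
        this.caseSensitive = cs;
        this.accentSensitive = as;
    }

    static CollationNameSketch parse(String name) {
        // Limit -1 keeps trailing empty tokens so "sr_" is rejected below.
        String[] parts = name.split("_", -1);
        int i = 0;
        // The locale is mandatory and must come first: 2-letter language code.
        if (!isLetters(parts[i], 2)) {
            throw new IllegalArgumentException("Missing language code: " + name);
        }
        StringBuilder locale = new StringBuilder(parts[i++]);
        // Optional 4-letter script, then optional 3-letter country code.
        if (i < parts.length && isLetters(parts[i], 4)) {
            locale.append('_').append(parts[i++]);
        }
        if (i < parts.length && isLetters(parts[i], 3)) {
            locale.append('_').append(parts[i++]);
        }
        // Remaining tokens are specifiers and may appear in any order.
        Boolean cs = null;
        Boolean as = null;
        for (; i < parts.length; i++) {
            switch (parts[i].toUpperCase(Locale.ROOT)) {
                case "CS": cs = setOnce(cs, true, name); break;
                case "CI": cs = setOnce(cs, false, name); break;
                case "AS": as = setOnce(as, true, name); break;
                case "AI": as = setOnce(as, false, name); break;
                default:
                    throw new IllegalArgumentException("Unknown specifier in: " + name);
            }
        }
        // Defaults from the commit message: case-sensitive and accent-sensitive.
        return new CollationNameSketch(locale.toString(), cs == null || cs, as == null || as);
    }

    // Assumption of this sketch: each sensitivity may be specified at most once.
    private static Boolean setOnce(Boolean current, boolean value, String name) {
        if (current != null) {
            throw new IllegalArgumentException("Repeated or conflicting specifier in: " + name);
        }
        return value;
    }

    private static boolean isLetters(String s, int length) {
        return s.length() == length && s.chars().allMatch(Character::isLetter);
    }
}
```

With this sketch, `CollationNameSketch.parse("sr_Cyrl_SRB_AI_CI")` and `CollationNameSketch.parse("sr_Cyrl_SRB_CI_AI")` yield the same locale and sensitivities, which mirrors the "any order" rule for specifiers, while a name that combines `CS` and `CI` is rejected.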
1 parent a78ef73, commit 7fe1b93
Showing 27 changed files with 1,388 additions and 236 deletions.
678 changes: 558 additions & 120 deletions
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Large diffs are not rendered by default.