ICU-22707 Unicode 16 script metadata
markusicu committed Apr 9, 2024
1 parent c3014df commit 1a85d48
Showing 3 changed files with 26 additions and 11 deletions.
8 changes: 8 additions & 0 deletions icu4c/source/common/uscript_props.cpp
@@ -244,6 +244,14 @@ const int32_t SCRIPT_PROPS[] = {
 0x10582 | EXCLUSION | CASED, // Vith
 0x11F1B | EXCLUSION | LB_LETTERS, // Kawi
 0x1E4E6 | EXCLUSION, // Nagm
+0,
+0x10D5D | EXCLUSION | RTL | CASED, // Gara
+0x1611C | EXCLUSION, // Gukh
+0x16D45 | EXCLUSION, // Krai
+0x1E5D0 | EXCLUSION, // Onao
+0x11BC4 | EXCLUSION, // Sunu
+0x105C2 | EXCLUSION, // Todr
+0x11392 | EXCLUSION, // Tutg
 // End copy-paste from parsescriptmetadata.py
 };

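Each new SCRIPT_PROPS entry packs a sample code point together with flag bits. The following is a minimal sketch of how such an entry can be decoded. The bit layout is an assumption made for illustration (sample code point in the low 21 bits, a two-bit usage field and property flags above it); the real constants are defined near the top of uscript_props.cpp, outside this diff. In ICU4C the same information is exposed through public functions such as uscript_getSampleString(), uscript_getUsage(), uscript_isRightToLeft(), uscript_breaksBetweenLetters() and uscript_isCased().

```cpp
#include <cstdint>

// ASSUMED bit layout, for illustration only; check uscript_props.cpp
// for the real constant definitions.
constexpr int32_t SAMPLE_MASK = 0x1fffff;  // low 21 bits: sample code point
constexpr int32_t LIMITED_USE = 1 << 21;
constexpr int32_t EXCLUSION   = 2 << 21;
constexpr int32_t RECOMMENDED = 3 << 21;
constexpr int32_t USAGE_MASK  = 3 << 21;   // two-bit usage field
constexpr int32_t RTL         = 1 << 23;
constexpr int32_t LB_LETTERS  = 1 << 24;
constexpr int32_t CASED       = 1 << 25;

// Unpacked view of one table entry (hypothetical helper type).
struct ScriptProps {
    int32_t sample;  // sample code point, 0 if the entry has none
    int32_t usage;   // LIMITED_USE, EXCLUSION, RECOMMENDED, or 0
    bool rtl;        // right-to-left script
    bool lbLetters;  // line breaks allowed between letters
    bool cased;      // script has case distinctions
};

// Splits a packed entry into its fields.
// Written as a single return statement so it is also valid C++11 constexpr.
constexpr ScriptProps decode(int32_t entry) {
    return ScriptProps{entry & SAMPLE_MASK, entry & USAGE_MASK,
                       (entry & RTL) != 0, (entry & LB_LETTERS) != 0,
                       (entry & CASED) != 0};
}
```

Under this assumed layout, decoding the new Gara entry yields sample U+10D5D, usage EXCLUSION, rtl and cased set, and lbLetters clear; the bare 0 entry decodes to no sample and no usage.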
21 changes: 10 additions & 11 deletions icu4c/source/data/unidata/changes.txt
@@ -50,7 +50,6 @@ and see the change logs below.
 Unicode 16.0 update for ICU 76
 
 TODO
-- In the Unicode Tools repo: Delete the org.unicode.text.tools.RecommendedSetGenerator.
 - In corepropsbuilder.cpp, remove the isA9CF hack.
 
 https://www.unicode.org/versions/Unicode16.0.0/
@@ -62,6 +61,8 @@ https://www.unicode.org/reports/tr44/tr44-33.html
 https://unicode-org.atlassian.net/browse/ICU-22707 Unicode 16
 https://unicode-org.atlassian.net/browse/CLDR-17226 BRS Unicode 16
 
+https://github.com/unicode-org/unicodetools/pull/774 delete the RecommendedSetGenerator
+
 https://github.com/unicode-org/unicodetools/issues/492 adjust cldr/*BreakTest generation for Unicode 15.1
 
 * Command-line environment setup
@@ -198,8 +199,6 @@ export UNICODE_TOOLS=~/oss/unicodetools/mine/src
 + Indic_Syllabic_Category: uchar.h & UCharacter.IndicSyllabicCategory
 + after adding new API constants, run preparseucd.py again
 
-TODO: need to update CLDR script metadata first
-
 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 (not strictly necessary for NOT_ENCODED scripts)
 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
@@ -219,8 +218,6 @@ to find out the latest `bazel` version, and
 copying that version number into the $ICU_SRC/.bazeliskrc config file.
 (Revert if you find incompatibilities, or, better, update our build & config files.)
 
-TODO
-
 * generate data files
 
 - remember to define the environment variables
@@ -233,6 +230,8 @@ TODO
 - build/bootstrap/generate new files:
 icu4c/source/data/unidata/generate.sh
 
+TODO
+
 * run & fix ICU4C tests
 - Note: Some of the collation data and test data will be updated below,
 so at this time we might get some collation test failures.
@@ -328,14 +327,14 @@ TODO
 output:
 ...
 make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
-mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt74b
-mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt74b
-LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt74l.dat ./out/icu4j/icudt74b.dat -s ./out/build/icudt74l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt74b
-mv ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt74b"
-jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt74b/
+mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt75b
+mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt75b
+LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt75l.dat ./out/icu4j/icudt75b.dat -s ./out/build/icudt75l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt75b
+mv ./out/icu4j/"com/ibm/icu/impl/data/icudt75b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt75b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt75b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt75b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt75b"
+jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt75b/
 mkdir -p /tmp/icu4j/main/shared/data
 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
-jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt74b/
+jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt75b/
 mkdir -p /tmp/icu4j/main/shared/data
 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
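
The sample output in this hunk reflects the data version bump: icupkg converts the little-endian build output icudt75l into the big-endian (-tb) icudt75b files used by ICU4J, replacing the icudt74b names. A minimal sketch of that naming convention follows, assuming only what the log shows plus the standard ICU data-format suffix letters (l = little-endian ASCII, b = big-endian ASCII, e = big-endian EBCDIC); the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds an ICU common-data package name such as
// "icudt75l" from the ICU major version and a one-letter format suffix.
std::string icuDataPackageName(int majorVersion, char formatSuffix) {
    // Only the three standard suffixes are accepted.
    assert(formatSuffix == 'l' || formatSuffix == 'b' || formatSuffix == 'e');
    return "icudt" + std::to_string(majorVersion) + formatSuffix;
}
```

For example, icuDataPackageName(75, 'l') gives the build input icudt75l seen above, and icuDataPackageName(75, 'b') the ICU4J output icudt75b.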
8 changes: 8 additions & 0 deletions icu4j/main/core/src/main/java/com/ibm/icu/lang/UScript.java
@@ -1585,6 +1585,14 @@ private static final class ScriptMetadata {
 0x10582 | EXCLUSION | CASED, // Vith
 0x11F1B | EXCLUSION | LB_LETTERS, // Kawi
 0x1E4E6 | EXCLUSION, // Nagm
+0,
+0x10D5D | EXCLUSION | RTL | CASED, // Gara
+0x1611C | EXCLUSION, // Gukh
+0x16D45 | EXCLUSION, // Krai
+0x1E5D0 | EXCLUSION, // Onao
+0x11BC4 | EXCLUSION, // Sunu
+0x105C2 | EXCLUSION, // Todr
+0x11392 | EXCLUSION, // Tutg
 // End copy-paste from parsescriptmetadata.py
 };

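Both the C++ and Java tables end with the comment "End copy-paste from parsescriptmetadata.py", so the added lines are generator output pasted into each file rather than hand-written entries. The hypothetical C++ sketch below shows how one row of script metadata could be rendered into that line format. The ScriptRow struct, the flag order (usage, RTL, LB_LETTERS, CASED) and the hex formatting are inferred from the lines in this diff, not taken from parsescriptmetadata.py, and the sketch does not reproduce the bare 0 placeholder entry.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical input record: the fields needed to emit one table line.
struct ScriptRow {
    std::string code;   // four-letter ISO 15924 code, e.g. "Gara"
    uint32_t sample;    // sample code point
    std::string usage;  // usage flag name, e.g. "EXCLUSION" (may be empty)
    bool rtl;
    bool lbLetters;
    bool cased;
};

// Renders a row as "0x<hex> | <flags>, // <code>", matching the diff lines.
std::string formatEntry(const ScriptRow& row) {
    assert(row.sample <= 0x10FFFF);  // must be a valid code point
    char hex[16];
    std::snprintf(hex, sizeof hex, "0x%04X", static_cast<unsigned>(row.sample));
    std::string line = hex;
    if (!row.usage.empty()) line += " | " + row.usage;
    if (row.rtl) line += " | RTL";
    if (row.lbLetters) line += " | LB_LETTERS";
    if (row.cased) line += " | CASED";
    line += ", // " + row.code;
    return line;
}
```

For example, formatEntry({"Gara", 0x10D5D, "EXCLUSION", true, false, true}) produces the Gara line added in both files.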