2.0.0-beta1 script properties not up to date for unicode 16 #6041

cmyr · 2025-01-27T21:06:15Z

Per the Unicode 16 version of ScriptExtensions.txt, the following should pass:

    #[test]
    fn expected_script_thing() {
        let scripts = ScriptWithExtensions::new()
            .get_script_extensions_val('\u{2bc}')
            .iter()
            .collect::<Vec<_>>();
        assert_eq!(
            scripts,
            [
                Script::Bengali,
                Script::Cyrillic,
                Script::Devanagari,
                Script::Latin,
                Script::Lisu,
                Script::Thai,
                Script::Toto
            ]
        );
    }

but we end up with just Script::Common, which is what would have been expected for Unicode 15 and earlier.

To make this more confusing, when I look at the raw data files in the release-76-1 tag, they do appear up to date. I haven't dug much past that.
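
For anyone who wants to check the raw data independently of ICU4X, here is a minimal sketch of reading script extensions out of a UCD-style file such as ScriptExtensions.txt. It assumes the usual `range ; codes # comment` line layout; the function names `parse_scx_line` and `scripts_for` are made up for illustration and are not part of ICU4X or ICU4C.

```rust
/// Parses one data line of a UCD-style file such as ScriptExtensions.txt.
/// Returns the inclusive code point range and the script codes on that line,
/// or `None` for blank lines, comment-only lines, and malformed lines.
fn parse_scx_line(line: &str) -> Option<(u32, u32, Vec<String>)> {
    // Drop the trailing `# ...` comment, if any.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    // Assumed layout: `<range> ; <space-separated script codes>`.
    let (range, scripts) = data.split_once(';')?;
    let range = range.trim();
    let (start, end) = match range.split_once("..") {
        Some((a, b)) => (
            u32::from_str_radix(a, 16).ok()?,
            u32::from_str_radix(b, 16).ok()?,
        ),
        None => {
            let cp = u32::from_str_radix(range, 16).ok()?;
            (cp, cp)
        }
    };
    let scripts = scripts.split_whitespace().map(str::to_owned).collect();
    Some((start, end, scripts))
}

/// Returns the script codes of the first line whose range contains `cp`.
fn scripts_for(data: &str, cp: u32) -> Option<Vec<String>> {
    data.lines()
        .filter_map(parse_scx_line)
        .find(|(start, end, _)| (*start..=*end).contains(&cp))
        .map(|(_, _, scripts)| scripts)
}
```

Passing the contents of the data file and `0x2BC` to `scripts_for` gives the script codes to compare against what `get_script_extensions_val` returns. Note that the file uses four-letter ISO 15924 codes (Beng, Cyrl, ...), so they still need to be mapped to `Script` values by hand.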

Manishearth · 2025-01-27T21:43:47Z

cc @robertbastian what's the status of icuexportdata being updated? I thought we were already on Unicode 16.

robertbastian · 2025-01-27T23:13:06Z

All I can tell you is that 2.0.0-beta1 is on ICU release-76-1. Whether that correctly exports Unicode 16 requires me to debug ICU4C, which I'm not familiar with.

Manishearth · 2025-01-27T23:28:59Z

It's supposed to, per the release notes: https://github.com/unicode-org/icu/releases/tag/release-76-1

Confirmed that this reproduces on ICU4X main, and confirmed that the Unicode 16 data has a whole bunch of scx values for low code points that are not present in Unicode 15.

Manishearth · 2025-01-27T23:51:06Z

Trying to build ICU4C to see what's up

Manishearth · 2025-01-28T00:02:31Z

Found the culprit: https://unicode-org.atlassian.net/browse/ICU-21821

That hardcoded table in icuexportdata needs to be updated

cc @sffc @echeran

Manishearth · 2025-01-28T19:27:32Z

New data in #6044

Confirmed that it passes the following test:

    #[test]
    fn expected_script_thing() {
        use crate::props::Script;
        use crate::script::ScriptWithExtensions;
        let scripts = ScriptWithExtensions::new()
            .get_script_extensions_val('\u{2bc}')
            .iter()
            .collect::<Vec<_>>();
        assert_eq!(
            scripts,
            [
                Script::Bengali,
                Script::Cyrillic,
                Script::Devanagari,
                Script::Latin,
                Script::Thai,
                Script::Lisu,
                Script::Toto
            ]
        );
    }
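
Note that this test lists Thai before Lisu, while the test in the original report lists Lisu before Thai; the thread does not say what determines the returned order. If the order is not what is being tested, comparing as sets avoids depending on it. A minimal sketch, where the `Script` enum is a local stand-in for illustration (not the ICU4X type) and `same_scripts` is a hypothetical helper:

```rust
use std::collections::BTreeSet;

// Local stand-in for illustration only; not the ICU4X `Script` type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Script {
    Bengali,
    Cyrillic,
    Devanagari,
    Latin,
    Lisu,
    Thai,
    Toto,
}

/// Returns true if both slices contain the same scripts,
/// ignoring order and duplicates.
fn same_scripts(actual: &[Script], expected: &[Script]) -> bool {
    let a: BTreeSet<Script> = actual.iter().copied().collect();
    let b: BTreeSet<Script> = expected.iter().copied().collect();
    a == b
}
```

With a helper like this, `assert!(same_scripts(&scripts, &expected))` would accept either ordering of the seven scripts.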

robertbastian · 2025-01-28T19:32:01Z

Linking #4602

Manishearth · 2025-02-02T11:34:46Z

It seems you have a workaround for this for now. We have fixed the ICU4C data export for this and could do the work for a patch release, but @sffc and I would prefer to wait until ICU4X 2.0.0-beta2, which should happen in the next few weeks, rather than doing a transient patch release.

cmyr · 2025-02-03T15:30:09Z

A few weeks is fine, thanks for tracking this down!

rsheeter mentioned this issue Jan 27, 2025

How to determine version of baked script extensions data? #6034

Closed

This was referenced Jan 28, 2025

ICU-23033 Regenerate scx value array unicode-org/icu#3355

Merged

Update scx data #6044

Draft

cmyr mentioned this issue Jan 29, 2025

Remove script extension workaround when icu4x is updated googlefonts/fontc#1222

Open
