Missing tags for script identifiers #35

Closed
fschutt opened this issue Dec 7, 2020 · 2 comments

fschutt (Contributor) commented Dec 7, 2020

I'm currently trying to shape simple text as in the allsorts-tools/examples/shape file. Since I have to determine the script of the text block at runtime, I'm using whatlang to detect the language and script from the text itself. However, several of the detected scripts have no corresponding tag in allsorts::tag:

use allsorts::tag;
use whatlang::{Lang, Script};

// auto-detect script + language from text (todo: performance!)
let (lang, script) = whatlang::detect(text)
    .map(|info| (info.lang(), info.script()))
    .unwrap_or((Lang::Eng, Script::Latin));

// ISO 639-3 code in upper case, e.g. "ENG"
let lang = lang.code().to_uppercase();

// `todo!()` marks the scripts that have no constant in allsorts::tag
let script_id = match script {
    Script::Arabic => tag::ARAB,
    Script::Bengali => tag::BENG,
    Script::Cyrillic => tag::CYRL,
    Script::Devanagari => tag::DEVA,
    Script::Ethiopic => todo!(), // ??
    Script::Georgian => todo!(), // ??
    Script::Greek => tag::GREK,
    Script::Gujarati => tag::GUJR,
    Script::Gurmukhi => tag::GURU, // can also be GUR2
    Script::Hangul => todo!(), // ??
    Script::Hebrew => todo!(), // ??
    Script::Hiragana => todo!(), // ??
    Script::Kannada => tag::KNDA,
    Script::Katakana => todo!(), // ??
    Script::Khmer => todo!(), // TODO?? - unsupported?
    Script::Latin => tag::LATN,
    Script::Malayalam => tag::MLYM,
    Script::Mandarin => todo!(), // ??
    Script::Myanmar => todo!(), // ??
    Script::Oriya => tag::ORYA,
    Script::Sinhala => tag::SINH,
    Script::Tamil => tag::TAML,
    Script::Telugu => tag::TELU,
    Script::Thai => tag::THAI,
};

Is it possible to add these tags to the API, even if they are unsupported? Or is this by design? Thanks.
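The snippet above turns the detected language into an uppercase ISO 639-3 string such as "ENG", while allsorts works with `u32` tags. A minimal sketch, assuming the goal is an OpenType language system tag: uppercase the three letters, pad with a space to four bytes and pack them big-endian. The helper name `naive_language_tag` is hypothetical, and the approach is only a heuristic: OpenType language tags do not always match the ISO code (Japanese is `JAN `, not `JPN `), so a lookup table is needed for correct results.

```rust
/// Hypothetical helper: convert a 3-letter ISO 639-3 code (e.g. "eng") into a
/// 4-byte OpenType-style language tag ("ENG ") packed as a big-endian u32.
/// Heuristic only: several OpenType language tags differ from the ISO code.
fn naive_language_tag(iso639_3: &str) -> Option<u32> {
    let bytes = iso639_3.as_bytes();
    // Reject anything that is not exactly three ASCII letters.
    if bytes.len() != 3 || !bytes.iter().all(|b| b.is_ascii_alphabetic()) {
        return None;
    }
    // Start from four spaces so the fourth byte stays as padding.
    let mut tag = [b' '; 4];
    for (dst, src) in tag.iter_mut().zip(bytes) {
        *dst = src.to_ascii_uppercase();
    }
    Some(u32::from_be_bytes(tag))
}

fn main() {
    assert_eq!(naive_language_tag("eng"), Some(u32::from_be_bytes(*b"ENG ")));
    assert_eq!(naive_language_tag("english"), None);
}
```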

wezm (Contributor) commented Dec 7, 2020

It's not really feasible to maintain a list of all possible tags. Ultimately they are just u32 values. The tag module has a macro that makes it a little more pleasant to construct them from a byte string. I'll make that public in the next release. In the meantime you might like to copy the macro and its supporting function into your own code:

/// Generate a 4-byte font table tag from byte string
///
/// Example:
///
/// ```
/// assert_eq!(tag!(b"glyf"), 0x676C7966);
/// ```
macro_rules! tag {
    ($w:expr) => {
        tag(*$w)
    };
}

// Pack the four bytes big-endian: chars[0] becomes the most significant byte.
const fn tag(chars: [u8; 4]) -> u32 {
    ((chars[3] as u32) << 0)
        | ((chars[2] as u32) << 8)
        | ((chars[1] as u32) << 16)
        | ((chars[0] as u32) << 24)
}
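As a quick check of what the macro produces, here is a standalone copy of it with a few assertions. It shows that the packing is plain big-endian, so `u32::from_be_bytes` gives the same value, and that because `tag` is a `const fn` the result can also be used to define constants. The constant name `KHMR` is chosen here for illustration.

```rust
// Standalone copy of the `tag!` macro and its helper function.
macro_rules! tag {
    ($w:expr) => {
        tag(*$w)
    };
}

const fn tag(chars: [u8; 4]) -> u32 {
    ((chars[3] as u32) << 0)
        | ((chars[2] as u32) << 8)
        | ((chars[1] as u32) << 16)
        | ((chars[0] as u32) << 24)
}

// `tag` is a const fn, so a missing tag can be declared as a constant.
const KHMR: u32 = tag!(b"khmr");

fn main() {
    // 'g' = 0x67, 'l' = 0x6C, 'y' = 0x79, 'f' = 0x66
    assert_eq!(tag!(b"glyf"), 0x676C7966);
    // Same result as the standard library's big-endian conversion.
    assert_eq!(tag!(b"kana"), u32::from_be_bytes(*b"kana"));
    // 'k' = 0x6B, 'h' = 0x68, 'm' = 0x6D, 'r' = 0x72
    assert_eq!(KHMR, 0x6B686D72);
}
```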

You would use this as follows for any tags missing from the tag module (see the OpenType script tag registry for the values):

Script::Kannada => tag::KNDA,
Script::Katakana => tag!(b"kana"),
Script::Khmer => tag!(b"khmr"),
Script::Latin => tag::LATN,
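Applying the same idea to every script in the question gives a complete mapping. This is a sketch under a few assumptions: the `Script` enum below is a hypothetical local stand-in for `whatlang::Script` (same variant names as in the question) so the example compiles without external crates, and it uses the original OpenType tags throughout, like `tag::GURU` above, even where a newer "v2" tag exists (for example `gur2` or `mym2`). OpenType has a single `kana` tag covering both Hiragana and Katakana, and Han ideographs (the `Mandarin` variant) use `hani`. Whether allsorts can actually shape each of these scripts is a separate question.

```rust
// Hypothetical stand-in for `whatlang::Script`, mirroring the question's variants.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Script {
    Arabic, Bengali, Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati,
    Gurmukhi, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer, Latin,
    Malayalam, Mandarin, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thai,
}

macro_rules! tag {
    ($w:expr) => {
        tag(*$w)
    };
}

const fn tag(chars: [u8; 4]) -> u32 {
    u32::from_be_bytes(chars)
}

/// Map a detected script to its (version 1) OpenType script tag.
fn opentype_script_tag(script: Script) -> u32 {
    match script {
        Script::Arabic => tag!(b"arab"),
        Script::Bengali => tag!(b"beng"),
        Script::Cyrillic => tag!(b"cyrl"),
        Script::Devanagari => tag!(b"deva"),
        Script::Ethiopic => tag!(b"ethi"),
        Script::Georgian => tag!(b"geor"),
        Script::Greek => tag!(b"grek"),
        Script::Gujarati => tag!(b"gujr"),
        Script::Gurmukhi => tag!(b"guru"),
        Script::Hangul => tag!(b"hang"),
        Script::Hebrew => tag!(b"hebr"),
        // One OpenType tag covers both kana syllabaries.
        Script::Hiragana | Script::Katakana => tag!(b"kana"),
        Script::Kannada => tag!(b"knda"),
        Script::Khmer => tag!(b"khmr"),
        Script::Latin => tag!(b"latn"),
        Script::Malayalam => tag!(b"mlym"),
        // Han ideographs.
        Script::Mandarin => tag!(b"hani"),
        Script::Myanmar => tag!(b"mymr"),
        Script::Oriya => tag!(b"orya"),
        Script::Sinhala => tag!(b"sinh"),
        Script::Tamil => tag!(b"taml"),
        Script::Telugu => tag!(b"telu"),
        Script::Thai => tag!(b"thai"),
    }
}

fn main() {
    assert_eq!(opentype_script_tag(Script::Latin), tag!(b"latn"));
    assert_eq!(
        opentype_script_tag(Script::Hiragana),
        opentype_script_tag(Script::Katakana)
    );
}
```

Because the `match` has no wildcard arm, the compiler will flag any script variant that is added later and not yet mapped.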

wezm (Contributor) commented Dec 17, 2020

Fixed by bf0f283

wezm closed this as completed Dec 17, 2020