Fixes to characters considered zero-width #34
This implements a specific standardized algorithm as documented in the readme. This rule around Default_Ignorable doesn't seem to be documented there. This is not a general purpose terminal width library.
This library already differs from UAX 11 in several important ways:
Hmm, yeah. I didn't originally write this but I would like for the code to follow the spec first and offer these things as settings
UAX 11 doesn't really give a full, exact algorithm for getting a "width value" for a string. For example, control codes aren't even mentioned, nor are line breaks etc. So I think referring to other parts of the Unicode standard as well makes perfect sense.
Hmm that's fair. Will review later. I would ideally like someone to take a holistic view of this crate, compare with the specs, and document/add options. Haven't had time to do this myself ever since I inherited it.
`Default_Ignorable_Code_Point`s as zero-width
`Default_Ignorable_Code_Point`s as zero-width, as well as vowel and trailing Jamo
I've added some comments throughout the code, but here is a summary of the current rules (with this PR's changes included):
What's still not handled, or could be handled differently:
The "Measurement" section of https://www.unicode.org/L2/L2023/23107-terminal-suppt.pdf highlights more problem cases.
See also https://www.unicode.org/versions/Unicode15.1.0/ch05.pdf#G40095, "Characters Ignored for Display".
…rols as non-zero width
Unicode §5.21 - "Characters Ignored for Display" - "Default Ignorable Code Point" says:
Software that interprets the interlinear annotation characters should probably do that processing before passing to
These characters are supposed to be completely invisible and ignored by rendering unless specially supported: https://www.unicode.org/faq/unsup_char.html#3. Characters affected
Edit: Now also fixes #26
Edit 2: I've marked `Prepended_Concatenation_Mark`s as not zero-width. This matches the behavior of glibc.
Edit 3: I've given U+115F HANGUL CHOSEONG FILLER back its width 2, because it's expected to be combined with other jamo to form a width-2 syllable block.
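To make the rules above concrete, here is a rough sketch of how they combine for a handful of code points. This is not the crate's actual implementation or API: `sketch_width` and `sketch_str_width` are hypothetical names, the code-point ranges are a small hand-picked subset of each property, and every character not listed is simply assumed to have width 1.

```rust
/// Hypothetical helper (not part of the crate): width of a single char
/// under a small, hand-picked subset of the rules described in this PR.
fn sketch_width(c: char) -> usize {
    match c {
        // Leading jamo are East Asian Wide. This range includes
        // U+115F HANGUL CHOSEONG FILLER, which keeps width 2 (Edit 3)
        // even though it is a Default_Ignorable_Code_Point.
        '\u{1100}'..='\u{115F}' => 2,
        // Vowel and trailing jamo in the main Hangul Jamo block:
        // they combine with a preceding leading jamo, so they add no width.
        '\u{1160}'..='\u{11FF}' => 0,
        // A few Default_Ignorable_Code_Points: ZWSP, ZWNJ, ZWJ,
        // WORD JOINER, and U+FEFF.
        '\u{200B}'..='\u{200D}' | '\u{2060}' | '\u{FEFF}' => 0,
        // Some Prepended_Concatenation_Marks (Edit 2): not zero-width.
        '\u{0600}'..='\u{0605}' | '\u{06DD}' | '\u{070F}' => 1,
        // Assumption for this sketch only: everything else has width 1.
        _ => 1,
    }
}

/// Hypothetical helper: width of a string as the sum of its char widths.
fn sketch_str_width(s: &str) -> usize {
    s.chars().map(sketch_width).sum()
}
```

With these assumptions, a syllable block spelled out as conjoining jamo, such as `"\u{1100}\u{1161}\u{11A8}"`, measures as 2 columns, and `"a\u{200B}b"` measures the same as `"ab"`.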