Emoji width #4
Comments
The Unicode standard defines which characters should be considered wide and which should not. To my knowledge, emoji are not considered wide characters by the standard. Note also that width refers to the number of columns when displayed in a monospaced font; any character can appear wider when displayed in a proportional font. (Anecdotally, the heart symbol above occupies one column in my Unicode-aware terminal.) |
From reading this, I believe that as of Unicode 9, emoji are now wide characters. It also seems that as of unicode-width 0.1.4, emoji are considered to be wide characters, so this can be closed. PS Thanks for writing this library! |
unicode-width 0.1.4 returns |
Ah, it looks like unicode-width 0.1.4 reports that a |
The change in 0828133 means that the width of emoji is used. I think the issue unicode-rs/unicode-width#4 means that the wrong width is being calculated for emoji, and there happens to be one in the example.
I thought I was experiencing this, but it turns out that my terminal was just getting the widths wrong and I was seeing it the wrong way! |
Apart from this, there is a problem with compound emojis. The current implementation just splits things up into characters and adds all the widths. That may not be correct in the presence of compound emojis like 👩🔬 = 👩 + ZWJ + 🔬 , as all the individual emojis have width 2. |
I don't think handling that is what this crate is about -- this crate implements a spec, a spec which doesn't attempt to deal with emoji. |
The docs say "we provide the width in columns". For characters in X, Y, Z categories, we do A, B, C. AIUI Emoji don't really fall into those categories, so I'd naively expect the result to be whatever makes the most sense (if there is one such result). Depending on the user's system -- whether the compound emoji can be rendered properly or not (in which case, it shows up as two separate emoji) -- the computed width will be different. The crate picks the width you'd get when it shows as split up, which is a reasonable choice. However, since there are two reasonable answers here, I think if the precise scope and limitations of the crate were made clearer, then the behavior for compound emoji wouldn't be an issue. I'm happy to open a PR to add this clarification if you agree. |
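To make the two "reasonable answers" concrete, here is a minimal sketch that computes both for a ZWJ sequence. Everything in it is hypothetical: `toy_char_width` is a drastically simplified stand-in, not the table used by unicode-width, and `width_split` / `width_joined` are names invented for this illustration. It only assumes that the emoji involved are 2 columns wide and that ZWJ itself is 0 columns wide.

```rust
/// Hypothetical, heavily simplified per-character width table.
/// This is NOT the unicode-width crate's data; it covers only this example.
fn toy_char_width(c: char) -> usize {
    match c as u32 {
        0x200D => 0,            // ZERO WIDTH JOINER
        0x1F300..=0x1FAFF => 2, // rough emoji range, treated as wide
        _ => 1,
    }
}

/// Width if the sequence is rendered as separate glyphs:
/// every character contributes its own width.
fn width_split(s: &str) -> usize {
    s.chars().map(toy_char_width).sum()
}

/// Width if a ZWJ sequence is rendered as a single glyph:
/// characters joined by ZWJ share the width of the widest member.
fn width_joined(s: &str) -> usize {
    let mut total = 0; // width of all finished clusters
    let mut cluster = 0; // width of the cluster currently being built
    let mut after_zwj = false;
    for c in s.chars() {
        if c == '\u{200D}' {
            after_zwj = true;
            continue;
        }
        let w = toy_char_width(c);
        if after_zwj {
            // Still inside the same joined cluster.
            cluster = cluster.max(w);
        } else {
            // A new cluster starts; commit the previous one.
            total += cluster;
            cluster = w;
        }
        after_zwj = false;
    }
    total + cluster
}

fn main() {
    // woman + ZWJ + microscope
    let scientist = "\u{1F469}\u{200D}\u{1F52C}";
    // prints "split: 4, joined: 2"
    println!("split: {}, joined: {}", width_split(scientist), width_joined(scientist));
}
```

Which of the two numbers matches what is on screen depends on whether the terminal and font render the sequence as one glyph, which is why neither can be called correct in general.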
There kinda isn't: the concept of "width" you're asking for is a matter of font as well as context (many terminals will not use emoji presentation, which means those will display as two). The crate already mentions that it follows the UAX #11 rules. Feel free to add to the readme that this may not match the actual rendered column width. |
I'd been using this crate on the assumption that Is there a non-trivial subset of strings for which the displayed column width is exactly specified and we can rely on it being accurate for any standards-compliant terminal? If so, can we add another method to |
In regards to UAX #11, the recommendations state
and as best as I can tell from this definition, In other words, it seems like the most "correct" behavior for a character with a text presentation by default, like U+2764, would be

assert_eq!(1, UnicodeWidthStr::width("\u{2764}"));
assert_eq!(1, UnicodeWidthStr::width("\u{2764}\u{fe0e}"));
assert_eq!(2, UnicodeWidthStr::width("\u{2764}\u{fe0f}"));

And for a character with an emoji presentation by default:

assert_eq!(2, UnicodeWidthStr::width("\u{26a1}"));
assert_eq!(1, UnicodeWidthStr::width("\u{26a1}\u{fe0e}"));
assert_eq!(2, UnicodeWidthStr::width("\u{26a1}\u{fe0f}"));

Of course, the rendering of this also seems to vary by OS and browser: |
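For illustration, the behavior proposed in the assertions above could be sketched with the standard library alone. This is an assumption-laden toy, not a description of what unicode-width does: `toy_default_width` is a hypothetical table covering only the two characters discussed, `width_with_selectors` is an invented name, and the sketch lets a variation selector override any preceding character, whereas a real implementation would only honor it after characters that actually have both presentations.

```rust
/// Hypothetical default widths for the characters discussed above.
/// A real implementation would consult the Unicode emoji data instead.
fn toy_default_width(c: char) -> usize {
    match c {
        '\u{2764}' => 1,              // HEAVY BLACK HEART: text presentation by default
        '\u{26A1}' => 2,              // HIGH VOLTAGE SIGN: emoji presentation by default
        '\u{FE0E}' | '\u{FE0F}' => 0, // a stray variation selector takes no space
        _ => 1,
    }
}

/// Sums widths, letting a variation selector override the width of
/// the character immediately before it.
fn width_with_selectors(s: &str) -> usize {
    let mut total = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        let width = match chars.peek().copied() {
            Some('\u{FE0E}') => {
                chars.next(); // consume VS15: force text presentation
                1
            }
            Some('\u{FE0F}') => {
                chars.next(); // consume VS16: force emoji presentation
                2
            }
            _ => toy_default_width(c),
        };
        total += width;
    }
    total
}

fn main() {
    // Heart: text by default, narrow unless VS16 follows.
    assert_eq!(1, width_with_selectors("\u{2764}"));
    assert_eq!(2, width_with_selectors("\u{2764}\u{fe0f}"));
    // High voltage: emoji by default, wide unless VS15 follows.
    assert_eq!(2, width_with_selectors("\u{26a1}"));
    assert_eq!(1, width_with_selectors("\u{26a1}\u{fe0e}"));
}
```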
I don't really know much about this space, but here's my attempt at dealing with this in a terminal emulator.

use unicode_width::UnicodeWidthStr;

/// Returns the number of cells visually occupied by a sequence
/// of graphemes
pub fn unicode_column_width(s: &str) -> usize {
    use unicode_segmentation::UnicodeSegmentation;
    s.graphemes(true).map(grapheme_column_width).sum()
}

/// Returns the number of cells visually occupied by a grapheme.
/// The input string must be a single grapheme.
pub fn grapheme_column_width(s: &str) -> usize {
    // Due to this issue:
    // https://github.com/unicode-rs/unicode-width/issues/4
    // we cannot simply use the unicode-width crate to compute
    // the desired value.
    // Let's check for emoji-ness for ourselves first
    use xi_unicode::EmojiExt;
    for c in s.chars() {
        if c.is_emoji_modifier_base() || c.is_emoji_modifier() {
            // treat modifier sequences as double wide
            return 2;
        }
    }
    UnicodeWidthStr::width(s)
}
|
I noticed while scrolling `emoji-test.txt` that some of the combined emoji sequences rendered very poorly. This was due to the unicode width being reported as up to 4 in some cases. Digging into it, I discovered that the unicode width crate uses a standard calculation that doesn't take emoji combination sequences into account (see unicode-rs/unicode-width#4). This commit takes a dep on the xi-unicode crate as a lightweight way to gain access to emoji tables and test whether a given grapheme is part of a combining sequence of emoji.
I'm not sure, but I suppose the example from this article is related to this issue:
returns 5, but the article's author thinks that it should be 2 |
Right, this crate is dealing with a different notion of width. |
@keidax is actually right. I came here not as a Rust dev, but more as a VTE dev, because I had forgotten where in the huge mass of Unicode (emoji) specs I had read that emoji presentation is always considered East Asian Wide (2 columns in monospaced fonts) -- so thanks for also providing the links, @keidax. Sadly, many VTEs and even client apps still get this wrong, but things seem to be shifting slightly (Kitty, for example, gets a lot of it right). |
#41 added support for U+FE0F. (Emoji ZWJ sequences and skintone modifiers remain unsupported, however.) |
I am not sure but the displayed width of emoji seems to be at least 2: