Emoji width #4

gwenn · 2016-03-06T16:57:04Z

I am not sure but the displayed width of emoji seems to be at least 2:

"❤️"
"12"

let w = unicode_width::UnicodeWidthStr::width("\u{2764}\u{fe0f}");
assert_eq!(2, w); // (left: `2`, right: `1`)

kwantam · 2016-03-07T00:01:51Z

The Unicode standard defines which characters should be considered wide and which should not. To my knowledge, emoji are not considered wide characters by the standard. Note also that width refers to the number of columns when displayed in a monospaced font; any character can appear wider when displayed in a proportional font.

(Anecdotally, the heart symbol above occupies one column in my Unicode-aware terminal.)

casey · 2017-04-03T05:22:43Z

From reading this, I believe that as of Unicode 9, emoji are now wide characters.

It also seems that as of unicode-width 0.1.4 emoji are considered to be wide characters, so this can be closed.

PS Thanks for writing this library!

gwenn · 2017-05-01T12:01:58Z

unicode-width 0.1.4 returns 1...

casey · 2017-05-02T02:23:08Z

Ah, it looks like unicode-width 0.1.4 reports that a ❤️ is one column wide, and a 😗 is two columns wide. I didn't specifically test the heart character, just emoji.

The change in 0828133 means that the widths of emoji are used. I think the issue unicode-rs/unicode-width#4 means that the wrong width is being calculated for emoji, and there happens to be one in the example.

ogham · 2017-05-17T22:37:22Z

I thought I was experiencing this, but it turns out that my terminal was just getting the widths wrong and I was seeing it the wrong way!

typesanitizer · 2018-10-11T21:23:39Z

Apart from this, there is a problem with compound emojis. The current implementation just splits things up into characters and adds all the widths. That may not be correct in the presence of compound emojis like 👩‍🔬 = 👩 + ZWJ + 🔬 , as all the individual emojis have width 2.
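To make the per-character summing concrete, here is a minimal sketch. `toy_char_width` is a hypothetical stand-in that covers only the characters discussed here; it is not the table unicode-width actually uses, and the emoji range in it is a rough assumption.

```rust
/// Toy per-code-point width table covering only the characters used in
/// this thread. Illustrative stand-in, NOT the unicode-width table.
fn toy_char_width(c: char) -> usize {
    match c {
        '\u{200d}' => 0,                // ZERO WIDTH JOINER
        '\u{fe0e}' | '\u{fe0f}' => 0,   // variation selectors
        '\u{1f300}'..='\u{1faff}' => 2, // rough emoji range (assumption)
        _ => 1,
    }
}

/// Sum of per-code-point widths: the "split into characters and add the
/// widths" strategy described above.
fn naive_width(s: &str) -> usize {
    s.chars().map(toy_char_width).sum()
}
```

Under this toy table, `naive_width("\u{1f469}\u{200d}\u{1f52c}")` (👩 + ZWJ + 🔬) is 2 + 0 + 2 = 4, whereas a terminal that renders the sequence as a single glyph would typically use 2 columns.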

Manishearth · 2018-10-11T21:27:19Z

I don't think handling that is what this crate is about -- this crate implements a spec, a spec which doesn't attempt to deal with emoji.

typesanitizer · 2018-10-11T21:39:53Z

The docs say "we provide the width in columns". For characters in X, Y, Z categories, we do A, B, C. AIUI Emoji don't really fall into those categories, so I'd naively expect the result to be whatever makes the most sense (if there is one such result). Depending on the user's system -- whether the compound emoji can be rendered properly or not (in which case, it shows up as two separate emoji) -- the computed width will be different. The crate picks the width you'd get when it shows as split up, which is a reasonable choice.

However, since there are two reasonable answers here, I think if the precise scope and limitations of the crate were made clearer, then the behavior for compound emoji wouldn't be an issue. I'm happy to open a PR to add this clarification if you agree.

Manishearth · 2018-10-11T21:43:30Z

"if there is one such result"

There kinda isn't. The concept of "width" you're asking for is a matter of font as well as context (many terminals will not use emoji presentation, which means those will display as two separate emoji).

The crate does already mention that it follows the UAX #11 rules. Feel free to add to the readme that this may not match actual rendered column width.

canndrew · 2019-03-29T03:51:21Z

I'd been using this crate on the assumption that UnicodeWidthStr::width would give the actual displayed width in columns. It's a shame that that assumption doesn't hold :/

Is there a non-trivial subset of strings for which the displayed column width is exactly specified and we can rely on it being accurate for any standards-compliant terminal? If so, can we add another method to UnicodeWidthStr which returns an Option<usize>? That way my terminal GUI library can know when it might have lost track of the cursor position.
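One way to picture the `Option<usize>` idea is the sketch below. `checked_width` is a hypothetical name, not part of unicode-width, and the choice of printable ASCII as the "safe" subset is an assumption made only for illustration; a real answer would need a larger, carefully specified set.

```rust
/// Hypothetical helper (not part of unicode-width): returns `Some(columns)`
/// only when every character is printable ASCII (U+0020..=U+007E), where
/// this sketch assumes one character occupies exactly one column.
/// Anything else yields `None`, meaning "width not reliably known".
fn checked_width(s: &str) -> Option<usize> {
    if s.chars().all(|c| (' '..='~').contains(&c)) {
        // For ASCII, the byte length equals the character count.
        Some(s.len())
    } else {
        None
    }
}
```

A caller could treat `None` as a signal that the cursor position may no longer be trustworthy, for example by querying the terminal or redrawing the line.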

keidax · 2019-04-03T03:49:18Z

With regard to UAX #11, the recommendations state:

"UTS51 emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value."

and as best as I can tell from this definition, "\u{2764}\u{fe0f}" would be a valid emoji presentation sequence.

In other words, it seems like the most "correct" behavior for a character with a text presentation by default, like U+2764, would be

assert_eq!(1, UnicodeWidthStr::width("\u{2764}"));
assert_eq!(1, UnicodeWidthStr::width("\u{2764}\u{fe0e}"));
assert_eq!(2, UnicodeWidthStr::width("\u{2764}\u{fe0f}"));

And for a character with an emoji presentation by default:

assert_eq!(2, UnicodeWidthStr::width("\u{26a1}"));
assert_eq!(1, UnicodeWidthStr::width("\u{26a1}\u{fe0e}"));
assert_eq!(2, UnicodeWidthStr::width("\u{26a1}\u{fe0f}"));

Of course, the rendering of this also seems to vary by OS and browser:
❤
❤︎
❤️
⚡
⚡︎
⚡️
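The six assertions above can be read as one small rule. The sketch below encodes that proposal only; `presentation_width` is a hypothetical helper, the base character's default width is passed in rather than looked up, and the sketch assumes without checking that the base plus selector forms a valid UTS51 presentation sequence.

```rust
/// Hypothetical helper encoding the proposal above: the width of one base
/// character optionally followed by a variation selector.
/// `default_width` is the width the base has on its own; it is an input
/// because this sketch does not include a width table.
fn presentation_width(default_width: usize, selector: Option<char>) -> usize {
    match selector {
        Some('\u{fe0f}') => 2, // emoji presentation: treated as East Asian Wide
        Some('\u{fe0e}') => 1, // text presentation: treated as narrow
        _ => default_width,    // no selector: keep the default
    }
}
```

With U+2764 taken as default width 1 and U+26A1 as default width 2, this reproduces the expected values 1, 1, 2 and 2, 1, 2 from the two groups of assertions.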

wez · 2019-11-05T16:44:33Z

I don't really know much about this space, but here's my attempt at dealing with this in a terminal emulator.

/// Returns the number of cells visually occupied by a sequence
/// of graphemes
pub fn unicode_column_width(s: &str) -> usize {
    use unicode_segmentation::UnicodeSegmentation;
    s.graphemes(true).map(grapheme_column_width).sum()
}

/// Returns the number of cells visually occupied by a grapheme.
/// The input string must be a single grapheme.
pub fn grapheme_column_width(s: &str) -> usize {
    // Due to this issue:
    // https://github.com/unicode-rs/unicode-width/issues/4
    // we cannot simply use the unicode-width crate to compute
    // the desired value.
    // Let's check for emoji-ness for ourselves first
    use unicode_width::UnicodeWidthStr;
    use xi_unicode::EmojiExt;
    for c in s.chars() {
        if c.is_emoji_modifier_base() || c.is_emoji_modifier() {
            // treat modifier sequences as double wide
            return 2;
        }
    }
    UnicodeWidthStr::width(s)
}

I noticed while scrolling `emoji-test.txt` that some of the combined emoji sequences rendered very poorly. This was due to the unicode width being reported as up to 4 in some cases. Digging into it, I discovered that the unicode width crate uses a standard calculation that doesn't take emoji combination sequences into account (see unicode-rs/unicode-width#4). This commit takes a dep on the xi-unicode crate as a lightweight way to gain access to emoji tables and test whether a given grapheme is part of a combining sequence of emoji.

worldmind · 2019-12-27T08:02:30Z

Not sure, but I suppose the example from this article is related to this issue:

use unicode_width::UnicodeWidthStr;

fn main() {
    println!("{}", "🤦🏼‍♂️".width());
}

It prints 5, but the article's author thinks it should be 2.
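To see where a value like 5 can come from, it helps to write the string out code point by code point. This is a standard-library-only sketch; `FACEPALM` and `code_points` are names introduced here for illustration.

```rust
/// The string from the example above, spelled out as escapes:
/// face palm, medium-light skin tone, ZWJ, male sign, variation selector-16.
const FACEPALM: &str = "\u{1f926}\u{1f3fc}\u{200d}\u{2642}\u{fe0f}";

/// Returns each code point of `s` formatted as `U+XXXX`.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}
```

`code_points(FACEPALM)` lists five code points. If they are given per-code-point widths of 2, 2, 0, 1 and 0 (an assumed breakdown, consistent with the reported result), the sum is 5, while a renderer that draws the whole sequence as one emoji would typically use 2 columns.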

Manishearth · 2019-12-27T08:05:40Z

Right, this crate is dealing with a different notion of width.

christianparpart · 2020-06-11T22:59:12Z

@keidax is actually right. I came here not as a Rust dev, but more as a VTE dev, because I actually forgot where in the huge mass of Unicode (emoji) specs I was reading that emoji presentation is always considered to be East Asian Wide (2 columns in monospaced fonts). -- so thanks for also having provided the links @keidax.

Sadly, many VTEs and even client apps are still getting this wrong, but this seems to be slowly changing (Kitty, for example, gets a lot of it right).

Jules-Bertholet · 2024-04-23T22:20:05Z

#41 added support for U+FE0F. (Emoji ZWJ sequences and skintone modifiers remain unsupported, however.)

typesanitizer mentioned this issue Oct 11, 2018

Add a possible issue to the README. #7

Merged

chrisduerr mentioned this issue Jul 22, 2019

Superhero emoji width #11

Closed

Manishearth closed this as completed Dec 27, 2019

bbqsrc mentioned this issue Oct 1, 2020

Emoji in default output causes width calculation to be incorrect cucumber-rs/cucumber#71

Closed

stephen-huan mentioned this issue Jun 18, 2022

Certain double-width unicode emoji characters are treated as single-width alacritty/alacritty#6144

Closed

kirawi mentioned this issue Jan 12, 2023

Rendering issue/glitches with files containing emojis on Windows helix-editor/helix#4932

Closed

joshka mentioned this issue Aug 29, 2023

Buffer: unicode-width and emojis ratatui/ratatui#75

Closed

BenceSzalai mentioned this issue Sep 5, 2023

Inconsistent spacing around emojis when displayed in terminal commitizen/cz-cli#815

Open

Yomguithereal mentioned this issue Sep 29, 2023

Padding and truncating issues wrt emojis in terminal medialab/xan#59

Closed
