magic width modification considered harmful #4

mintty · 2024-03-05T10:28:16Z

Several sections assume that width of a grapheme cluster can be modified in magic ways:

6.1: variation selector 16 (VS16) that may have caused the width of the grapheme cluster to change to wide (2 grid cells)

6.2: Emoji symbols are always rendered in ... 2 grid cells.

6.3: VS16 ... will force the grapheme cluster’s width to be 2

This is a serious problem in terminals. The traditional way for applications to know the width of their own output is based on the locale mechanism, accessed via the functions wcwidth/wcswidth. While this method has limitations, especially in a remote login scenario, no other reliable method has been established.
So I suggest the reference for actual width assumption should be the width given by these functions, even taking some unpleasant visual consequences into account. How else could an application know how wide its output actually appears? Look at https://unicode.org/reports/tr51/#Display :
The width of the pirate flag 🏴‍☠️ shown there depends on the actual availability of its emoji in the current environment (font or platform).
This information is inaccessible to a terminal application.
(The Unicode specification does not really address terminal display.)
So applications like editors would be confused and frequently display garbage.
The mintty terminal goes another way: it treats the locale width as authoritative and adjusts emoji display to the width thus determined.
This can mean that single-width emojis appear squeezed into a single cell and that emoji sequences can take up to 8 character cells. To compensate somewhat for style preferences, mintty offers placement options to align, center, or expand emojis within the space they consume.
While, as said above, this may not be pleasant, it is the only way I see to achieve consistent screen handling and avoid garbage caused by unforeseeable positioning. The only alternative would be for an application to request a cursor position report after every questionable grapheme output, which would slow down screen output considerably.
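
For illustration, the cursor-position-report alternative mentioned above could look roughly like the following sketch. It only builds the standard DSR 6 request and parses the `CSI row ; col R` reply; the helper names `parse_cpr` and `measured_width` are hypothetical, and actually reading the reply from the terminal (raw mode, timeouts) is left out.

```python
import re
from typing import Tuple

# DSR 6 ("report cursor position"); the terminal answers with CSI row ; col R.
CPR_REQUEST = "\x1b[6n"
_CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: str) -> Tuple[int, int]:
    """Return the 1-based (row, column) from a cursor position report."""
    match = _CPR_REPLY.fullmatch(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def measured_width(col_before: int, col_after: int) -> int:
    """Columns consumed by output written between two reports on the same row."""
    return col_after - col_before
```

Each measurement costs a full round trip to the terminal, which is the slowdown referred to above.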

Also in particular

6.2: ... cursor will always move by 2 grid cells.

contradicts traditional cursor behaviour and would result in unreliable position assumptions. The cursor position inside a multi-column glyph needs to be maintained somehow. Outputting text there would then break the previous glyph (this also applies to non-emoji glyphs such as East Asian characters).

j4james · 2024-03-05T12:00:13Z

This can mean that single-width emojis appear squeezed into a single cell and that emoji sequences can take up to 8 character cells.

This was one of the main reasons for creating this protocol in the first place: terminals want a way to display things like the pirate flag nicely without breaking backwards compatibility, and that's why there is a mode for it. Apps that expect widths to be calculated with the old wcwidth/wcswidth algorithm would simply not set mode ?2027, and everything should work exactly as it's always done.

But if a terminal claims to be able to support mode ?2027, and an application requests that support, then they should be able to rely on the terminal honoring that request in a predictable manner. If a particular font doesn't support the pirate flag emoji then worst case you just display a question mark or whatever fallback you'd use for any other missing glyph. The key thing is that you use the correct width calculation so that the layout of the page doesn't break.

If you don't think this is a good idea, then just don't support this mode, and carry on doing your own thing.
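
As a rough sketch of the opt-in described above, an application could build the DEC private mode sequences for ?2027 and interpret a DECRPM reply as follows. The escape sequence formats are the standard DECSET/DECRST/DECRQM ones; the function names are hypothetical, and terminal I/O is omitted.

```python
import re

MODE = 2027  # the mode number discussed in this thread


def decset(mode: int) -> str:
    """Sequence that enables a DEC private mode."""
    return f"\x1b[?{mode}h"


def decrst(mode: int) -> str:
    """Sequence that disables a DEC private mode."""
    return f"\x1b[?{mode}l"


def decrqm(mode: int) -> str:
    """Sequence that asks the terminal for the state of a DEC private mode."""
    return f"\x1b[?{mode}$p"


_DECRPM = re.compile(r"\x1b\[\?(\d+);(\d)\$y")


def mode_is_available(reply: str, mode: int) -> bool:
    """True if a DECRPM reply reports the mode as set, reset, or permanently set.

    Status 0 means "not recognised" and 4 means "permanently reset";
    in both cases the application should fall back to traditional widths.
    """
    match = _DECRPM.fullmatch(reply)
    if match is None or int(match.group(1)) != mode:
        return False
    return match.group(2) in ("1", "2", "3")
```

An application that never sends `decset(MODE)` keeps the traditional behaviour.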

mintty · 2024-03-05T15:32:02Z

OK, 2 points:

The description should make a clear statement about the intention and that it describes a mode that breaks compatibility with traditional terminal width handling.
A complete and consistent definition of string widths should be the goal.
Example:
U+1F468 (man) has width 2
U+1F680 (rocket) has width 2
Their cluster U+1F468 U+200D U+1F680 (astronaut) shall have width 2 in your model - understandable.
But what about, let's say, U+1F468 U+200D U+1F468 (two men)? There's no emoji specified for it right now, so if I understand correctly your model would leave it as two separate emojis, width 4.
Now what if the next Unicode version introduces that missing emoji, so the width changes from 4 to 2?
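
To make the width arithmetic in this example concrete, here is a minimal sketch of a traditional per-code-point width sum. It is only an approximation of wcwidth/wcswidth built on Python's `unicodedata` tables (an assumption; real results depend on the libc and locale in use), and `codepoint_width` and `string_width` are hypothetical names.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Rough stand-in for wcwidth(): 0 for combining and format characters,
    2 for East Asian Wide/Fullwidth, otherwise 1. Control characters are not handled."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(text: str) -> int:
    """Rough stand-in for wcswidth(): the sum of the per-code-point widths."""
    return sum(codepoint_width(ch) for ch in text)


MAN = "\U0001F468"
ROCKET = "\U0001F680"
ZWJ = "\u200D"
```

With these tables, `string_width(MAN + ZWJ + ROCKET)` sums to 4 because the joiner contributes nothing and each emoji counts separately, whereas the cluster-based model assigns the sequence a width of 2.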

j4james · 2024-03-07T14:32:06Z

I didn't write the spec, and I'm not overly familiar with all the details, so I'm not the best person to answer these questions. However, regarding your second point, I would have thought a combination like U+1F468 U+200D U+1F468 should be interpreted as a single emoji, even if it isn't yet defined as such, or your font doesn't have an appropriate glyph to render it.

If an application has requested mode ?2027, and then deliberately used a zero-width joiner between two characters, I think that's a pretty good indication that they're expecting those characters to be joined together, and would expect them to occupy the same space as a single emoji. And that presumably solves your problem of what happens when Unicode introduces a new combination emoji.

But again I'm not an expert on this subject - hopefully somebody else will chime in here with a more authoritative answer.

mintty · 2024-05-07T09:44:42Z

Having checked all emoji sequences, it seems that the issue could be resolved by two simple rules:
Text output while "Unicode width mode" is enabled will have the following width-rendering modifications:

Appending U+FE0F as a combining character changes any character to double-width.
Zero-width joiner U+200D forces the subsequent character to also be treated as a combining character (thus not add any width).

Both rules would need to be applied to any character sequence, regardless of whether it has an emoji definition, and regardless of whether it has a glyph in the current font. So the following resulting examples should be checked:

"a U+FE0F" would render as an expanded double-width a.
something like "a U+200D b" which now renders as ab would render single-width with undefined appearance.
The "enclosing keycap" sequences would only render double-width if also combined with U+FE0F (both variations are listed as emojis by Unicode).
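
The two rules and the examples above can be sketched as follows. This is one possible reading of the rules, not mintty's implementation: base widths are approximated from Python's `unicodedata` tables, a U+FE0F is assumed to widen the most recent base character (or the cluster it started), and all names are hypothetical.

```python
import unicodedata

VS16 = "\uFE0F"
ZWJ = "\u200D"


def base_width(ch: str) -> int:
    """Approximate traditional width: 0 for combining/format characters,
    2 for East Asian Wide/Fullwidth, otherwise 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def emoji_mode_width(text: str) -> int:
    """Width of text under the two proposed rules."""
    total = 0
    last = None        # width contributed by the most recent base character
    join_next = False  # True after a ZWJ until the joined character is seen
    for ch in text:
        if ch == ZWJ:
            join_next = True   # rule 2: the next character adds no width
            continue
        if ch == VS16:
            if last is not None and last < 2:
                total += 2 - last  # rule 1: widen the preceding character to 2
                last = 2
            continue
        if join_next:
            join_next = False  # joined character is treated as combining
            continue
        width = base_width(ch)
        if width:
            total += width
            last = width
    return total
```

Under this reading, `"a\uFE0F"` measures 2, `"a\u200Db"` measures 1, and a keycap sequence such as `"1\u20E3"` stays at 1 unless U+FE0F is also present.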

It's unclear whether additional variation selectors might affect width as recently discussed in https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/9.
Note that to get wider attention for this specification proposal, I'd suggest raising it as an issue with the Terminal Working Group Specifications project.

mintty · 2024-07-23T11:30:51Z

I've created an issue at https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/36.

mintty · 2024-07-23T11:48:49Z

I've implemented this experimentally for mintty. Switching is done with DECSET 7769 for now. 2027 should not be used as it was in use by some terminals for something different.

christianparpart · 2024-07-23T15:24:10Z

I've implemented this experimentally for mintty. Switching is done with DECSET 7769 for now. 2027 should not be used as it was in use by some terminals for something different.

I think this is pretty bad. Why do you want to deviate from 2027? To me it seems to be quite well accepted by those who were involved already. Why change it then? This only creates friction. :(

mintty · 2024-07-23T20:23:54Z

I think this is pretty bad. Why do you want to deviate from 2027?

It seems this is actually your own fault. See mintty/mintty#1255 (comment),
2027 was in use for something else by Contour, and thus also by mintty. However, I had already deprecated it for mintty and have now dropped it, so there is an opportunity to switch to it. Anyway, overlaying controls is always a bad idea, so I rather thought this one was now burnt.

To me it seems to be quite well accepted by those who were involved already.

Please list terminals that use 2027 for emoji width mode already.

j4james · 2024-07-23T22:54:34Z

Please list terminals that use 2027 for emoji width mode already.

These are some that I'm aware of:

Also looks like it's used in the following applications/libraries:

mintty · 2024-07-24T05:19:19Z

Thank you. So I'll switch for the release.
I'd also appreciate feedback on the way I implemented it, particularly rule 0 in https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/36.

mintty · 2024-07-24T05:26:15Z

By the way, do you also have an overview of which terminals implement auto-rewrap and which use 2028 for it?

christianparpart · 2024-07-24T08:29:35Z

I think this is pretty bad. Why do you want to deviate from 2027?
It seems this is actually your own fault. See mintty/mintty#1255 (comment)

But that was fixed in January already, and it is now July. It's still quite an early stage (given that the ecosystem is moving slowly).

EDIT: thanks for the summary, @j4james :)
I'd also rather update the spec I was writing than diverge from it. I have yet to look at what you wrote up there. But let's find common ground. I intentionally made it its own repo for better and more neutral collaboration.

mintty · 2024-07-24T10:00:16Z

I'll support both mode settings for now as it's not yet clear whether implementations will be compatible.
See https://github.com/mintty/mintty/blob/master/wiki/CtrlSeqs.md#emoji-width-mode for the mintty feature.
