Interpreting grapheme clusters when calculating width breaks on most terminal emulators #826
Comments
See #184
Thanks, I should have searched for “width” instead of “grapheme”.
I tried again with iTerm2, making sure to update to the latest version (3.5.9). The behavior is actually a bit different from other terminals. In my original post, I had only checked that my iTerm2 did not interpret ZWJs when pasting in Zsh/Python3. But the ZWJs are actually taken into account when running in Rustyline! At least for the version without any skin color variations: The first emoji seems to come from Apple giving up on ZWJ and using a generic image. However, the version with skin color variation is still rendered as individual emojis. Could you check how it looks for you with, e.g. 👩🏼👨🏼👦🏼👦🏼? In any case, interpreting that first one as having a width of 8 would definitely break things there. One option would be to check for
Maybe we can check if the current terminal supports emoji with this:
There seem to be three modes:
https://github.com/jtdaugherty/vty?tab=readme-ov-file#multi-column-character-support
tl;dr: calculate_position should not use the lengths of graphemes as provided by unicode-width, but instead use the sum of the widths of the codepoints.
At least on Unix, when calculating the width of displayed characters, rustyline uses grapheme segmentation.
However, using the minimal example and pasting 👨👩👧👦 (\u{1f468}\u{200d}\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}), and then typing A, results in the following output:
This is because my terminal does not interpret the ZERO WIDTH JOINER (U+200D). In fact, I was able to reproduce this behavior in the following terminal emulators:
Edit: Regarding the sentence below, UAX #11 actually says nothing about graphemes. It mostly talks about CJK characters and half-width variants, which do not require grapheme handling either. In fact, UTS #51 says that the handling of the ZERO WIDTH JOINER can vary by platform. So what we are seeing is a choice made by unicode-width. rustyline might not want to follow it, and could use the sum of the widths of the individual code points instead.
Unicode does say that the full grapheme should be considered, and unicode-width implements it so:
outputs:
The first line looks correct in a graphical browser, but this is what I actually see:
Also note that this is not about legacy vs extended graphemes. ZERO WIDTH JOINER is considered in both.
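To make the difference between the two strategies concrete, here is a minimal sketch in plain Rust. It does not use unicode-width or unicode-segmentation: toy_char_width is a hypothetical, heavily simplified width table (ZWJ is 0 columns, a rough emoji range is 2, everything else is 1), and width_with_zwj_joining is only a crude stand-in for what a grapheme-aware width does with ZWJ sequences. Neither function is rustyline's actual code.

```rust
/// Toy per-code-point width table. This is an assumption for illustration
/// only; real code would rely on a crate such as `unicode-width`.
fn toy_char_width(c: char) -> usize {
    match c as u32 {
        0x200D => 0,            // ZERO WIDTH JOINER takes no columns
        0x1F300..=0x1FAFF => 2, // rough emoji range, treated as wide
        _ => 1,                 // everything else: one column
    }
}

/// Width as seen by a terminal that ignores ZWJ:
/// every code point is counted on its own.
fn width_by_code_points(s: &str) -> usize {
    s.chars().map(toy_char_width).sum()
}

/// Width as seen by a terminal that merges ZWJ sequences:
/// a code point directly following a ZWJ adds no columns.
fn width_with_zwj_joining(s: &str) -> usize {
    let mut total = 0;
    let mut after_zwj = false;
    for c in s.chars() {
        if c == '\u{200D}' {
            after_zwj = true;
            continue;
        }
        if !after_zwj {
            total += toy_char_width(c);
        }
        after_zwj = false;
    }
    total
}
```

Under these toy assumptions, the family sequence \u{1f468}\u{200d}\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466} measures 8 columns with width_by_code_points and 2 with width_with_zwj_joining. If the line editor assumes one of these while the terminal draws the other, the cursor ends up in the wrong column.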
If I remove the .graphemes(true) part from calculate_position (and adapt the code to use codepoints instead of grapheme clusters), I achieve the expected behavior:
Are there cases where we do need to use grapheme clusters when calculating widths? That is, either: