EastAsianWidth.txt Format Change after Unicode 15.1.0 #585

Merged (3 commits) on Aug 23, 2023

Contributor

@elfham elfham commented Aug 22, 2023

The format of EastAsianWidth.txt appears to change as of Unicode 15.1.0 (draft): the semicolon separating the two fields is now surrounded by spaces.

15.0.0: https://www.unicode.org/Public/15.0.0/ucd/EastAsianWidth.txt

# For legacy reasons, there are no spaces before or after the semicolon
# which separates the two fields. The comments following the number sign
# "#" list the General_Category property value or the L& alias of the
# derived value LC, the Unicode character name or names, and, in lines
# with ranges of code points, the code point count in square brackets.
#
# For more information, see UAX #11: East Asian Width,
# at https://www.unicode.org/reports/tr11/
#
# @missing: 0000..10FFFF; N
0000..001F;N     # Cc    [32] <control-0000>..<control-001F>
0020;Na          # Zs         SPACE
0021..0023;Na    # Po     [3] EXCLAMATION MARK..NUMBER SIGN
0024;Na          # Sc         DOLLAR SIGN

15.1.0 draft: https://www.unicode.org/Public/draft/UCD/ucd/EastAsianWidth.txt

# The comments following the number sign "#" list the General_Category
# property value or the L& alias of the derived value LC, the Unicode
# character name or names, and, in lines with ranges of code points,
# the code point count in square brackets.
#
# For more information, see UAX #11: East Asian Width,
# at https://www.unicode.org/reports/tr11/
#
# @missing: 0000..10FFFF; N
0000..001F     ; N  # Cc    [32] <control-0000>..<control-001F>
0020           ; Na # Zs         SPACE
0021..0023     ; Na # Po     [3] EXCLAMATION MARK..NUMBER SIGN
0024           ; Na # Sc         DOLLAR SIGN

As a result, bin/generate_east_asian_width fails on the 15.1.0 draft file.

% ruby bin/generate_east_asian_width EastAsianWidth-15.1.0-draft.txt
bin/generate_east_asian_width:15:in `block (2 levels) in <main>': undefined method `to_sym' for nil:NilClass (NoMethodError)

    type = type.to_sym
               ^^^^^^^
        from bin/generate_east_asian_width:10:in `each_line'
        from bin/generate_east_asian_width:10:in `block in <main>'
        from bin/generate_east_asian_width:8:in `open'
        from bin/generate_east_asian_width:8:in `<main>'
%

This PR fixes the script so that it accepts both formats.
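
One way to make a parser tolerate both layouts is to allow optional whitespace around the semicolon. The following is a minimal sketch under that assumption; `EAW_LINE` and `parse_eaw_line` are hypothetical names chosen for illustration, and this is not the code merged in this PR.

```ruby
# Hypothetical helper, not the code merged in this PR.
# Accepts both "0020;Na" (15.0.0 and earlier) and "0020           ; Na" (15.1.0 draft).
EAW_LINE = /\A(\h+)(?:\.\.(\h+))?\s*;\s*(\w+)/

def parse_eaw_line(line)
  m = EAW_LINE.match(line)
  return nil unless m # comment lines and blank lines do not match

  first = m[1].to_i(16)
  last = m[2] ? m[2].to_i(16) : first # a single code point is a one-element range
  [first..last, m[3].to_sym]
end

p parse_eaw_line("0000..001F;N     # Cc    [32] <control-0000>..<control-001F>")
p parse_eaw_line("0024           ; Na # Sc         DOLLAR SIGN")
p parse_eaw_line("# @missing: 0000..10FFFF; N")
```

Returning `nil` for non-data lines lets the caller skip them explicitly instead of calling `to_sym` on a missing capture, which is the kind of failure shown in the trace above.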

Test

Convert the current version 15.0.0.

% ruby bin/generate_east_asian_width EastAsianWidth-15.0.0.txt > eaw-15.0.0.rb
% diff lib/reline/unicode/east_asian_width.rb eaw-15.0.0.rb
3c3
<   # EastAsianWidth.txt
---
>   # EastAsianWidth-15.0.0.txt
%

The output matches the current file, apart from the file name in the header comment.

Convert the draft version 15.1.0.

% ruby bin/generate_east_asian_width EastAsianWidth-15.1.0-draft.txt > eaw-15.1.0.rb
% diff eaw-15.0.0.rb eaw-15.1.0.rb
3c3
<   # EastAsianWidth-15.0.0.txt
---
>   # EastAsianWidth-15.1.0-draft.txt
63c63
<     \u{2FF0}-\u{2FFB}
---
>     \u{2FF0}-\u{2FFF}
70c70
<     \u{31F0}-\u{321E}
---
>     \u{31EF}-\u{321E}
%

Apart from the header comment, only two ranges differ, and both are widened, so the result is probably OK.

Note that this EastAsianWidth.txt is still a draft, so it does not yet seem to include the new CJK ideographs, etc.

@tompng
Member

tompng commented Aug 23, 2023

Thank you. The implementation looks good 👍
One thing: could it use m = a.match(b) and m[c] instead of Regexp.last_match(c)?

Regexp.last_match is not used anywhere else in this repository.
Some code in ruby/irb does use Regexp.last_match, but only in the following form, where Regexp#match cannot be used:

case string
when /regexp1/
  puts :a, Regexp.last_match(1)
when /regexp2/
  puts :b, Regexp.last_match(2)
end
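
To make the two forms concrete, here is a small side-by-side sketch. The sample line and the pattern `/;\s*(\w+)/` are assumptions for illustration only and are not taken from the script in this PR.

```ruby
line = "0021..0023     ; Na # Po     [3] EXCLAMATION MARK..NUMBER SIGN"

# Form suggested in the review: keep the MatchData in a local variable.
m = line.match(/;\s*(\w+)/)
type_from_match = m[1] if m

# case/when with regexp literals gives no MatchData to bind, so
# Regexp.last_match is the way to reach the captures there.
type_from_case =
  case line
  when /;\s*(\w+)/
    Regexp.last_match(1)
  end

puts type_from_match
puts type_from_case
```

Both variables end up holding the same capture; the local-variable form simply avoids relying on the implicit last-match state.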

@elfham
Contributor Author

elfham commented Aug 23, 2023

Thank you for the suggestion.
I have corrected the code.

Member

@tompng tompng left a comment


👍

@tompng tompng merged commit 7d2084b into ruby:master Aug 23, 2023
30 checks passed
@elfham elfham deleted the fix-generate_east_asian_width-15_1_0 branch August 23, 2023 16:20
@st0012 st0012 added the bug Something isn't working label Aug 23, 2023

3 participants