IsKatakana / IsHiragana being too permissive? #35

Open
richardgarnier opened this issue Apr 13, 2020 · 0 comments

richardgarnier commented Apr 13, 2020

While working on half-width support, I noticed that IsKatakana (as well as IsHiragana) is based on Go's Unicode range tables, and the range being used is probably too wide for what a Japanese speaker would consider katakana.

For example, this was slightly unexpected:

  • IsKatakana("ㇰ") = true // note that ㇰ != ク and ㇰ != ク
  • IsKatakana("ウカ") = true

This even more:

  • IsKatakana("㋾") = true

And I would say this is wrong:

  • IsKatakana("㍓") = true
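The results above can be reproduced outside the library with a minimal sketch, assuming the check is equivalent to testing each rune against `unicode.Katakana` from Go's standard library. The helper names `isScriptKatakana` and `isStrictKatakana` are hypothetical and not part of this project; the strict variant accepts only the basic letters U+30A1 to U+30FA plus the prolonged sound mark U+30FC, which is just one possible definition of "what a Japanese speaker would call katakana".

```go
package main

import (
	"fmt"
	"unicode"
)

// isScriptKatakana reports whether every rune of s belongs to Go's
// unicode.Katakana script table. Assumption: this mirrors the permissive
// behavior described in the issue; it is not the library's actual code.
func isScriptKatakana(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !unicode.Is(unicode.Katakana, r) {
			return false
		}
	}
	return true
}

// isStrictKatakana is a hypothetical narrower check: basic katakana letters
// (U+30A1..U+30FA) and the prolonged sound mark (U+30FC) only.
func isStrictKatakana(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		isLetter := r >= 0x30A1 && r <= 0x30FA
		if !isLetter && r != 0x30FC {
			return false
		}
	}
	return true
}

func main() {
	// An ordinary word, then the small, circled, and squared forms from the issue.
	for _, s := range []string{"カタカナ", "ㇰ", "㋾", "㍓"} {
		fmt.Printf("%q script=%t strict=%t\n", s, isScriptKatakana(s), isStrictKatakana(s))
	}
}
```

Whether small phonetic extensions such as ㇰ or half-width forms should pass a strict check is a design decision this sketch does not settle; it rejects them only to keep the example short.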

IsHiragana has fewer quirks, but still shows some funny, unexpected behavior. For example:

  • IsHiragana("🈀") = true
  • IsHiragana("𛁟") = true // U+1B05F
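One way to narrow the hiragana check is a custom `unicode.RangeTable` that covers only the basic letters. The following is a sketch under assumptions: `coreHiragana`, `isCoreHiragana`, and `inScriptHiragana` are hypothetical names, and the range U+3041 to U+3096 is one plausible boundary rather than an agreed definition.

```go
package main

import (
	"fmt"
	"unicode"
)

// coreHiragana is a hypothetical table limited to the basic hiragana
// letters U+3041..U+3096. Stride must be 1 for a contiguous range.
var coreHiragana = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x3041, Hi: 0x3096, Stride: 1},
	},
}

// allIn reports whether s is non-empty and every rune is in table t.
func allIn(t *unicode.RangeTable, s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !unicode.Is(t, r) {
			return false
		}
	}
	return true
}

// inScriptHiragana uses Go's full Hiragana script table (the permissive case).
func inScriptHiragana(s string) bool { return allIn(unicode.Hiragana, s) }

// isCoreHiragana uses the narrower hypothetical table defined above.
func isCoreHiragana(s string) bool { return allIn(coreHiragana, s) }

func main() {
	// An ordinary word, then the two surprising inputs from the issue.
	for _, s := range []string{"ひらがな", "🈀", "\U0001B05F"} {
		fmt.Printf("%q (%U) script=%t core=%t\n",
			s, []rune(s), inScriptHiragana(s), isCoreHiragana(s))
	}
}
```

A table-based check keeps the boundary in one place, so widening it later (for example, to include the iteration marks) would only mean adding a range.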

For IsKanji, I also suspect the range is wider than what would make sense to a human reader, but given how difficult it is to draw any kind of boundary there, I have not checked it.
