Unicode 15.1 linebreaking #48
This is an experimental implementation of the line breaking rules proposed in the Unicode document L2/22-080R. It is not suitable for merging into ICU main.
Limitations:
- ICU4C only.
- Root locale only (not implemented for the various LB tailorings).
- New Line Break properties implemented with hard-coded UnicodeSets (unmaintainable).
- RBBIMonkeyTest not updated. (There are two ICU monkey tests; the other is updated.)
- thanks!!
- looks plausible to me
- I don't pretend to have reviewed the rules or logic.
- some code style comments
UnicodeString(rules, -1, US_INV), 0, status);
UnicodeString CMx {uR"([[\p{Line_Break=CM}]\u200d])"};
UnicodeString rules;
rules = rules + u"((\\p{Line_Break=PR}|\\p{Line_Break=PO})(" + CMx + u")*)?"
optional / simpler:
UnicodeString rules =
u"..."
u"..."
u"...";
This relies on the compiler concatenating adjacent C++ string literals.
That does not work: CMx is not a literal (it would work if we made it a macro; do we want that?).
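To make the trade-off in this exchange concrete, here is a minimal sketch of the two options. It is not code from the PR: it uses `std::u16string` as a stand-in for ICU's `UnicodeString` so that it compiles without ICU, and the macro name `CMX_LITERAL` and both function names are hypothetical. Adjacent-literal concatenation happens at compile time and only applies to string-literal tokens, so a macro that expands to a literal can take part in it, while a variable has to be joined with `operator+` at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical macro form of CMx. It expands to a string literal, so the
// compiler can merge it with the literals written next to it.
#define CMX_LITERAL uR"([[\p{Line_Break=CM}]\u200d])"

// Option 1: adjacent literals (including the macro) merged at compile time.
std::u16string rulesFromLiterals() {
    return u"((\\p{Line_Break=PR}|\\p{Line_Break=PO})("
           CMX_LITERAL
           u")*)?";
}

// Option 2: CMx is a variable, so the pieces are joined at run time.
// std::u16string stands in for icu::UnicodeString here (an assumption).
std::u16string rulesFromVariable() {
    const std::u16string CMx {uR"([[\p{Line_Break=CM}]\u200d])"};
    std::u16string rules;
    rules = rules + u"((\\p{Line_Break=PR}|\\p{Line_Break=PO})(" + CMx + u")*)?";
    return rules;
}

// Both options are expected to build the same rule text.
inline void checkRulesMatch() {
    assert(rulesFromLiterals() == rulesFromVariable());
}
```

Calling `checkRulesMatch()` confirms that both variants yield the same string. The macro version avoids run-time concatenation but gives up the scoping and type checking that a named variable provides, which is the question raised in the reply above.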
[175-C23] Consensus: Replace rule LB 15 by LB 15a and LB 15b in UAX #14, as described in L2/23-063 Line breaking around quotation marks, changing the references to the sets [:Pi:] and [:Pf:] to [[:Pi:]&QU] and [[:Pf:]&QU], respectively, for Unicode Version 15.1.
[175-C27] Consensus: Add line breaking classes AF, AK, AP, AS, VI, and VF, as well as a new line breaking rule LB 28b, and change Line_Break property values, as described in L2/23-072.