-
-
Notifications
You must be signed in to change notification settings - Fork 767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-22564 Java API for the host env to register a low level break eng… #2697
base: main
Are you sure you want to change the base?
Conversation
…ine to break CJ + Southeastern Asia script
@richgillam Somehow my FakeYueBreakEngine break the PersonNameConsistencyTest on yue.txt and yue_Hans.txt . I am suprise that changes to BreakIterator will impact the outcome of PersonName. Could you explain how does PersonName depends on BreakIterator? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PersonName is using BreakIterator for a few things, as I remember it:
The first two don't seem relevant to Yue, but the third one might be (if it's real-- I don't remember for sure now). Some of the foreign-name support might be triggering the capitalization/initial logic even though it's not strictly relevant (e.g., formatting Yue with an English person name formatter). If you can tell me more about which tests are failing, what they expect, and what you're getting instead, I can help you track the problem down (if you need help, of course). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that what we agreed to last week? I thought I came away from that discussion thinking we were adding a third method to say whether the break engine supported a locale, not turning the isFor()
method into that method by getting rid of the c
parameter. Do we no longer need an isFor()
function that tests a character to see if it can begin a run of characters to be broken?
* Register a new external break engine. The external break engine will be adopted. | ||
* Because ICU may choose to cache break engine internally, this must | ||
* be called at application startup, prior to any calls to | ||
* object methods of RuleBasedBreakIterator to avoid undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Register a new external break engine. The external break engine will be adopted. | |
* Because ICU may choose to cache break engine internally, this must | |
* be called at application startup, prior to any calls to | |
* object methods of RuleBasedBreakIterator to avoid undefined behavior. | |
* Register a new external break engine. The external break engine will be adopted. | |
* Because ICU may choose to cache break engines internally, | |
* to avoid undefined behavior this must | |
* be called at application startup, prior to any calls to | |
* object methods of RuleBasedBreakIterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but someone more familiar with the break iterator api should review.
* @param text A CharacterIterator representing the text | ||
* @param rangeStart The start of the range of known characters | ||
* @param rangeEnd The end of the range of known characters | ||
* @param foundBreaks Output of a list of Integer to denote break positions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should specify whether the contents of the List must be cleared before calling, or whether the fillBreaks is required to clear internally.
AFAIK @FrankYFTang has not yet addressed the feedback on his proposal from last year, and hasn't yet sent an updated proposal. |
…ine to break CJ + Southeastern Asia script
Checklist