ICU-23004: experiment with UTF-8/16 C++ iterators #3096
base: main
Force-pushed from 2ce8c27 to 1bcd5ee
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
Hi @eggrobin, I think this is worth taking another look. I rebased on recent main, made the changes from our discussions, and I think this now looks roughly like a reasonable validating, forward-only (so far) code point iterator over Unicode 16-bit strings. It no longer tries to be clever: it no longer reads and validates the code point while iterating, and it no longer stores the result in the iterator. There are plenty of TODOs and questions left, but I would appreciate feedback on the overall shape of what I have so far.
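A minimal sketch of that shape, for illustration only: it assumes a pointer-pair representation and U+FFFD substitution for unpaired surrogates, and every name in it (`DecodeResult`, `decodeForward`, `U16CodePointIterator`, `U16CodePoints`) is hypothetical, not the PR's or ICU's API. The iterator holds only positions; `operator*()` and `operator++()` each call one shared decoding helper instead of caching the code point.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper result: one decoded code point plus its length in code units.
struct DecodeResult {
    char32_t cp;      // decoded code point, or U+FFFD for an unpaired surrogate
    std::size_t len;  // code units consumed: 1 or 2
};

// Shared implementation function: decode the code point that starts at p (requires p < limit).
inline DecodeResult decodeForward(const char16_t* p, const char16_t* limit) {
    char16_t c = *p;
    if (c < 0xD800 || c > 0xDFFF) return {c, 1};  // not a surrogate
    if (c <= 0xDBFF && limit - p >= 2) {          // lead surrogate with a following unit
        char16_t c2 = p[1];
        if (c2 >= 0xDC00 && c2 <= 0xDFFF) {        // well-formed surrogate pair
            return {char32_t(0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00)), 2};
        }
    }
    return {0xFFFD, 1};  // unpaired surrogate: consume one unit, yield U+FFFD
}

// Hypothetical forward iterator: stores positions only, never a decoded value.
class U16CodePointIterator {
public:
    using iterator_category = std::input_iterator_tag;   // operator* returns a prvalue
    using iterator_concept = std::forward_iterator_tag;  // but traversal is multi-pass
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    U16CodePointIterator() = default;
    U16CodePointIterator(const char16_t* p, const char16_t* limit) : p_(p), limit_(limit) {}

    // Decodes at the current position on every call; nothing is cached.
    char32_t operator*() const { return decodeForward(p_, limit_).cp; }

    // Calls the same helper, using only the length it reports.
    U16CodePointIterator& operator++() {
        assert(p_ != limit_);  // precondition: not at end()
        p_ += decodeForward(p_, limit_).len;
        return *this;
    }
    U16CodePointIterator operator++(int) { auto tmp = *this; ++*this; return tmp; }

    bool operator==(const U16CodePointIterator& o) const { return p_ == o.p_; }
    bool operator!=(const U16CodePointIterator& o) const { return p_ != o.p_; }

private:
    const char16_t* p_ = nullptr;
    const char16_t* limit_ = nullptr;
};

// Hypothetical range adapter so that range-based for loops work.
class U16CodePoints {
public:
    explicit U16CodePoints(std::u16string_view s) : s_(s) {}
    U16CodePointIterator begin() const { return {s_.data(), s_.data() + s_.size()}; }
    U16CodePointIterator end() const {
        const char16_t* e = s_.data() + s_.size();
        return {e, e};
    }

private:
    std::u16string_view s_;
};
```

With this shape, `for (char32_t c : U16CodePoints(u"a\U0001F600b"))` visits U+0061, U+1F600 and U+0062. The helper runs once in `operator*()` and again in `operator++()`; when both calls are inlined into one loop body, that duplicated work is the kind an optimizer may be able to merge.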
I experimented with godbolt and found that the compiler does its best job of fusing operator*() and operator++() when both call the same implementation function. This makes operator++() look horribly inefficient in the source, since it decodes the whole code point just to learn its length, but the machine code that optimizing clang 19 generates for a regular range-based for loop is very concise. I then also made the iterator bidirectional and added a special version for efficient rbegin() and rend(), using the same principles. The bidirectional iterator also exposes explicit but non-colloquial functions.
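One plausible reading of a special version for `rbegin()` and `rend()` is an iterator that decodes backward directly at its own position. The alternative is wrapping the bidirectional iterator in `std::reverse_iterator`, whose `operator*()` copies the base iterator and decrements it on every dereference. Below is a minimal sketch under stated assumptions: U+FFFD for unpaired surrogates, and hypothetical names (`DecodeResult`, `decodeBackward`, `U16ReverseCodePointIterator`, `U16CodePoints`). As in the forward direction, `operator*()` and `operator++()` share one helper.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper result: one decoded code point plus its length in code units.
struct DecodeResult {
    char32_t cp;      // decoded code point, or U+FFFD for an unpaired surrogate
    std::size_t len;  // code units consumed: 1 or 2
};

// Shared implementation function: decode the code point that ends just before p (requires start < p).
inline DecodeResult decodeBackward(const char16_t* start, const char16_t* p) {
    char16_t c = p[-1];
    if (c < 0xD800 || c > 0xDFFF) return {c, 1};  // not a surrogate
    if (c >= 0xDC00 && p - start >= 2) {          // trail surrogate with a preceding unit
        char16_t c1 = p[-2];
        if (c1 >= 0xD800 && c1 <= 0xDBFF) {        // well-formed surrogate pair
            return {char32_t(0x10000 + ((c1 - 0xD800) << 10) + (c - 0xDC00)), 2};
        }
    }
    return {0xFFFD, 1};  // unpaired surrogate: consume one unit, yield U+FFFD
}

// Hypothetical dedicated reverse iterator: p_ is one past the next code point to visit.
class U16ReverseCodePointIterator {
public:
    using iterator_category = std::input_iterator_tag;   // operator* returns a prvalue
    using iterator_concept = std::forward_iterator_tag;  // but traversal is multi-pass
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    U16ReverseCodePointIterator() = default;
    U16ReverseCodePointIterator(const char16_t* start, const char16_t* p) : start_(start), p_(p) {}

    // Decodes backward from the current position; no copy-and-decrement as in std::reverse_iterator.
    char32_t operator*() const { return decodeBackward(start_, p_).cp; }

    U16ReverseCodePointIterator& operator++() {
        assert(p_ != start_);  // precondition: not at rend()
        p_ -= decodeBackward(start_, p_).len;
        return *this;
    }
    U16ReverseCodePointIterator operator++(int) { auto tmp = *this; ++*this; return tmp; }

    bool operator==(const U16ReverseCodePointIterator& o) const { return p_ == o.p_; }
    bool operator!=(const U16ReverseCodePointIterator& o) const { return p_ != o.p_; }

private:
    const char16_t* start_ = nullptr;
    const char16_t* p_ = nullptr;
};

// Hypothetical range adapter exposing rbegin()/rend().
class U16CodePoints {
public:
    explicit U16CodePoints(std::u16string_view s) : s_(s) {}
    U16ReverseCodePointIterator rbegin() const { return {s_.data(), s_.data() + s_.size()}; }
    U16ReverseCodePointIterator rend() const { return {s_.data(), s_.data()}; }

private:
    std::u16string_view s_;
};
```

A loop such as `for (auto it = cps.rbegin(); it != cps.rend(); ++it)` then visits code points from last to first. Each step inspects at most two code units before the current position.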
(As noted over email, I’ll take a look on Monday when I’m back from the holidays.)