String performance #5

Open
walterdejong opened this issue Oct 19, 2016 · 1 comment

Comments

@walterdejong
Owner

The performance of the String class is rather poor because nearly every method calls utf8_decode(). This follows from the design decision to store a String internally as UTF-8 while presenting it as a string of characters rather than bytes. Since UTF-8 is a variable-width encoding, finding the n-th character generally means decoding everything before it.
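To illustrate where the cost comes from, here is a minimal Python sketch of a string that stores UTF-8 bytes but indexes by character. It is not the project's code; the class `Utf8String` and its helpers are hypothetical. Because the sequence length is known only from each lead byte, both `len()` and `char_at()` have to walk the bytes from the start.

```python
class Utf8String:
    """Hypothetical string class: UTF-8 bytes inside, characters outside."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    @staticmethod
    def _seq_len(lead: int) -> int:
        # Byte length of a UTF-8 sequence, read from its lead byte
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        if lead >> 3 == 0b11110:
            return 4
        raise ValueError("invalid UTF-8 lead byte")

    def _offset_of(self, index: int) -> int:
        # Decode sequentially from the start: O(n) work for every lookup
        pos = 0
        for _ in range(index):
            if pos >= len(self._data):
                raise IndexError("character index out of range")
            pos += self._seq_len(self._data[pos])
        return pos

    def __len__(self) -> int:
        # Counting characters also requires a full scan of the bytes
        count, pos = 0, 0
        while pos < len(self._data):
            pos += self._seq_len(self._data[pos])
            count += 1
        return count

    def char_at(self, index: int) -> str:
        if index < 0:
            raise IndexError("negative index not supported in this sketch")
        start = self._offset_of(index)
        if start >= len(self._data):
            raise IndexError("character index out of range")
        end = start + self._seq_len(self._data[start])
        return self._data[start:end].decode("utf-8")


# Looping over every character index costs O(n^2) in total
s = Utf8String("普通话/普通話")
chars = [s.char_at(i) for i in range(len(s))]
```

Each call is correct on its own; the problem is that a simple loop over the string repeats the same decoding work again and again.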

It's probably better to have both a UTF-8 byte-string String class and a UTF-32 String32 (or uString) class, and let the programmer decide which one to use.
This is how Python 2 does it:

>>> s = '普通话/普通話'
>>> s
'\xe6\x99\xae\xe9\x80\x9a\xe8\xaf\x9d/\xe6\x99\xae\xe9\x80\x9a\xe8\xa9\xb1'
>>> len(s)
19
>>> s[0]
'\xe6'
>>> s[1]
'\x99'
>>> s[2]
'\xae'

>>> us = u'普通话/普通話'
>>> len(us)
7
>>> us[0]
u'\u666e'

(This example shows the behavior of len() and operator[].)
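The session above is Python 2. Python 3 keeps the same split under different names: `bytes` is the byte string and `str` is the string of code points. A short sketch of the same len/indexing behavior in Python 3:

```python
# Python 3: bytes is the byte string, str is the string of code points
text = "普通话/普通話"
raw = text.encode("utf-8")

print(len(raw))             # 19 bytes: six 3-byte CJK characters plus '/'
print(raw[0])               # 230 (0xe6): indexing bytes yields an int
print(raw[:3])              # b'\xe6\x99\xae'
print(len(text))            # 7 characters
print(hex(ord(text[0])))    # 0x666e, i.e. U+666E
```

One difference from Python 2: indexing a `bytes` object returns an integer, while slicing it returns `bytes`.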

Note that changing the design of String is a major change that would break backwards compatibility.

walterdejong added a commit that referenced this issue Oct 22, 2016
This fixes performance issue mentioned in github issue #5
@walterdejong
Owner Author

Commit 05b020b presumably fixes this. It changes the String class as described.
There is no String32 class yet.
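For comparison, here is a minimal Python sketch of what the proposed UTF-32 class could look like, assuming it stores every code point in exactly 4 bytes. The names `String32` and `to_utf8` are hypothetical and not part of the project. Fixed-width storage makes length and indexing O(1), at the cost of up to four times the memory of UTF-8 for ASCII-heavy text.

```python
class String32:
    """Hypothetical UTF-32 string: every code point takes exactly 4 bytes."""

    def __init__(self, text: str) -> None:
        # utf-32-le writes no BOM, so the data is exactly 4 bytes per code point
        self._data = text.encode("utf-32-le")

    def __len__(self) -> int:
        # O(1): the byte count divided by 4
        return len(self._data) // 4

    def __getitem__(self, index: int) -> str:
        # O(1): character i starts at byte offset 4 * i
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("String32 index out of range")
        start = 4 * index
        return self._data[start:start + 4].decode("utf-32-le")

    def to_utf8(self) -> bytes:
        # Convert back to the compact UTF-8 byte form
        return self._data.decode("utf-32-le").encode("utf-8")


us = String32("普通话/普通話")
```

With both classes available, code that mostly passes text around can keep the compact UTF-8 form and convert only where it needs random access by character.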
