The performance of the `String` class is rather poor, because its methods call `utf8_decode()` over and over. This is a consequence of the design decision to store a `String` as UTF-8 internally while presenting it as a string of characters rather than bytes: character-based operations such as `len` or `operator[]` have to decode the bytes to locate character boundaries.
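To illustrate why character-based access on a UTF-8 buffer is expensive, here is a minimal C++ sketch. It assumes the string stores raw UTF-8 bytes in something like a `std::string`; `utf8_seq_len`, `decode_at`, and `char_at` are hypothetical helpers, not the project's actual `utf8_decode()`. Because a UTF-8 character occupies 1 to 4 bytes, the byte offset of character `index` can only be found by walking from the start of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: length in bytes of the UTF-8 sequence whose
// lead byte is `b`. Assumes well-formed UTF-8; continuation bytes are not validated.
std::size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;          // 0xxxxxxx (ASCII)
    if ((b >> 5) == 0x06) return 2;  // 110xxxxx
    if ((b >> 4) == 0x0E) return 3;  // 1110xxxx
    if ((b >> 3) == 0x1E) return 4;  // 11110xxx
    throw std::invalid_argument("invalid UTF-8 lead byte");
}

// Hypothetical helper: decode the code point that starts at byte offset `pos`.
char32_t decode_at(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    const unsigned char b = static_cast<unsigned char>(s[pos]);
    const std::size_t n = utf8_seq_len(b);
    if (pos + n > s.size()) throw std::out_of_range("truncated UTF-8 sequence");
    static const unsigned char lead_mask[] = {0x00, 0x7F, 0x1F, 0x0F, 0x07};
    char32_t cp = b & lead_mask[n];  // payload bits of the lead byte
    for (std::size_t i = 1; i < n; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
    return cp;
}

// Character indexing on a UTF-8 buffer: every preceding character must be
// skipped to find the right byte offset, so each call is O(index), not O(1).
char32_t char_at(const std::string& s, std::size_t index) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < index; ++i) {
        if (pos >= s.size()) throw std::out_of_range("character index out of range");
        pos += utf8_seq_len(static_cast<unsigned char>(s[pos]));
    }
    if (pos >= s.size()) throw std::out_of_range("character index out of range");
    return decode_at(s, pos);
}
```

A loop that calls `char_at(s, i)` for every `i` therefore costs O(n²) in total, whereas a buffer of fixed-width UTF-32 code points can be indexed in O(1).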
It would probably be better to have both a UTF-8 byte-string `String` class and a UTF-32 `String32` (or `uString`) class, and let the programmer decide which one she wants to use.
For example, as in Python 2, where `str` is a byte string and `unicode` is a string of code points:
```python
>>> s = '普通话/普通話'
>>> s
'\xe6\x99\xae\xe9\x80\x9a\xe8\xaf\x9d/\xe6\x99\xae\xe9\x80\x9a\xe8\xa9\xb1'
>>> len(s)
19
>>> s[0]
'\xe6'
>>> s[1]
'\x99'
>>> s[2]
'\xae'
>>> us = u'普通话/普通話'
>>> len(us)
7
>>> us[0]
u'\u666e'
```
(This example demonstrates the behavior of `len` and `operator[]`.)
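The same split can be sketched in C++ with standard types: `std::string` plays the role of the proposed UTF-8 byte-string `String`, and `std::u32string` the role of a UTF-32 `String32`/`uString`. The `utf8_to_utf32` converter below is a hypothetical helper written for illustration, not part of the project, and it assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// Assumes well-formed input: continuation bytes are not validated, and
// overlong encodings or surrogate code points are not rejected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    out.reserve(in.size());  // upper bound: at most one code point per byte
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t n = 0;   // sequence length in bytes
        char32_t cp = 0;     // payload bits collected so far
        if (b < 0x80)              { n = 1; cp = b; }         // 0xxxxxxx
        else if ((b >> 5) == 0x06) { n = 2; cp = b & 0x1F; }  // 110xxxxx
        else if ((b >> 4) == 0x0E) { n = 3; cp = b & 0x0F; }  // 1110xxxx
        else if ((b >> 3) == 0x1E) { n = 4; cp = b & 0x07; }  // 11110xxx
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + n > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < n; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += n;
    }
    return out;
}
```

Used on the bytes of `普通话/普通話`, the `std::string` reports a size of 19 and `s[0]` is the single byte `0xE6`, while the converted `std::u32string` has size 7 and its first element is `U'\u666E'`, mirroring `len` and indexing in the Python session. The trade-off of this design is one O(n) decode up front and 4 bytes per code point, in exchange for O(1) character indexing afterwards.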
Note that changing the design of `String` is a major change that would break backwards compatibility.