Crystal currently relies on iconv or GNU libiconv for conversions between text encodings. This has a few problems:
iconv does not guarantee support for any particular encoding, and it provides no standard way to query or enumerate the encodings it does support. (The nonstandard iconvlist and libiconvlist are present in BSD libc and GNU libiconv respectively.) For all we know, an iconv implementation that supports neither UTF-8 nor UTF-16 is still POSIX-compliant. The same lack of guarantee applies to the invalid: :skip option.
The standard library already has separate APIs to deal with UTF-16, and technically UTF-32 too if we consider Char to be equivalent to Int32, yet they are not integrated into the usual transcoding APIs like String#encode and IO#set_encoding. In particular, it makes sense that these encodings should remain supported in those places, even when -Dwithout_iconv is defined.
Some system iconv implementations are known to be buggy, such as the macOS one and the Android one (Bionic libc, API level 28+).
GNU libiconv being licensed under LGPLv2.1 complicates certain deployment scenarios.
The essence of, for example, UTF-16 to UTF-8 conversion can be implemented on top of iconv's function signature as:
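A minimal sketch of such a transcoder in C, mirroring the shape of iconv's signature (pointer-to-pointer buffers plus remaining-byte counters). The function name `utf16le_to_utf8` and the `TC_*` return codes are assumptions made for illustration; they are not part of iconv or of Crystal's standard library, and little-endian input is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch. iconv itself reports
   failures through (size_t)-1 and errno instead. */
enum { TC_OK = 0, TC_ILLEGAL = -1, TC_INCOMPLETE = -2, TC_NO_SPACE = -3 };

/* Converts UTF-16LE bytes to UTF-8. Like iconv, it advances *inbuf and
   *outbuf and decrements the byte counters only after each complete
   character, so a caller can refill or drain a buffer and call again. */
int utf16le_to_utf8(const unsigned char **inbuf, size_t *inbytesleft,
                    unsigned char **outbuf, size_t *outbytesleft)
{
    while (*inbytesleft > 0) {
        const unsigned char *in = *inbuf;
        size_t consumed = 2;

        if (*inbytesleft < 2) return TC_INCOMPLETE;
        uint32_t cp = (uint32_t)in[0] | ((uint32_t)in[1] << 8);

        /* A low surrogate may only appear after a high surrogate. */
        if (cp >= 0xDC00 && cp <= 0xDFFF) return TC_ILLEGAL;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (*inbytesleft < 4) return TC_INCOMPLETE;
            uint32_t lo = (uint32_t)in[2] | ((uint32_t)in[3] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF) return TC_ILLEGAL;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            consumed = 4;
        }

        /* Number of UTF-8 bytes required for this code point. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (*outbytesleft < need) return TC_NO_SPACE;

        unsigned char *out = *outbuf;
        if (need == 1) {
            out[0] = (unsigned char)cp;
        } else if (need == 2) {
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[0] = (unsigned char)(0xF0 | (cp >> 18));
            out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        }

        *inbuf += consumed;
        *inbytesleft -= consumed;
        *outbuf += need;
        *outbytesleft -= need;
    }
    return TC_OK;
}
```

The three failure codes loosely correspond to the EILSEQ, EINVAL, and E2BIG conditions that iconv reports through errno. A policy such as skipping invalid input would sit in the caller, which can advance past the offending bytes and call the function again.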
Going in the opposite direction would need something like #13639 to be equally concise, but the point is that we could indeed achieve this without using iconv at all. If both the source and destination encodings are among UTF-8, UTF-16, UTF-32, and maybe ASCII, then we could use our own native transcoders instead of iconv. If we are ambitious enough, we could even port the entire set of ICU character set mapping tables in an automated manner and remove our dependency on iconv altogether.
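The selection rule described above could be sketched as follows, in C. The function names and the exact list of encoding names (including the LE/BE variants and the US-ASCII alias) are assumptions for illustration only; a real implementation would decide which spellings to accept.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII case-insensitive comparison of two encoding names. */
static bool same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b; /* true only if both strings ended together */
}

/* Returns true if `name` is one of the encodings that a built-in
   transcoder is assumed to handle in this sketch. */
bool is_native_encoding(const char *name)
{
    static const char *const native[] = {
        "UTF-8", "UTF-16", "UTF-16LE", "UTF-16BE",
        "UTF-32", "UTF-32LE", "UTF-32BE", "ASCII", "US-ASCII",
    };
    for (size_t i = 0; i < sizeof native / sizeof native[0]; i++) {
        if (same_name(name, native[i]))
            return true;
    }
    return false;
}

/* Native transcoding applies only when both ends are native;
   every other pair would still be handed to iconv. */
bool can_transcode_natively(const char *from, const char *to)
{
    return is_native_encoding(from) && is_native_encoding(to);
}
```

With this rule, a UTF-16LE to UTF-8 conversion never touches iconv, while a conversion involving, say, ISO-8859-1 still depends on iconv being available.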
ICU4C's ucnv_* API; the main problem is that either the source or the destination has to be UTF-16
encoding_c, bindings for the encoding_rs Rust library implementing the W3C Encoding Standard (by the way this is a good baseline of what a standard library should probably provide if we end up not having the same coverage as GNU libiconv)
The W3C Encoding Standard already sets the bar quite high, but seems to support a good list of general encodings 👍
There's a part 2 to the comparison article that focuses on C and presents ztd.cuneicode. I'm not saying we should use it, but it sounds like a solid reference, and both articles are a treasure trove of information.