Pannous edited this page Dec 10, 2022 · 3 revisions

char

In Angle, a char is a Unicode character (code point), represented as char32_t ≈ uint32. This differs from C, where char is a single byte that historically held an ASCII character.

See string for character sequences.
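As a rough illustration in Python (not Angle syntax), the difference between a code point and a byte shows up with `β`: it is one character with one code point value, yet it occupies two bytes in UTF-8. The variable names are only for this sketch.

```python
# Illustration in Python, not Angle syntax.
ch = "\u03b2"  # Greek small letter beta, β

code_point = ord(ch)             # one Unicode code point, fits in a uint32
utf8_bytes = ch.encode("utf-8")  # the same character as UTF-8 bytes

print(hex(code_point))   # 0x3b2
print(len(utf8_bytes))   # 2
print(utf8_bytes.hex())  # ceb2
```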

counting characters

Because the size of a character sequence can be measured in several ways, Angle offers different ways of counting and iterating:

count bytes of "abc" == 3
count chars of "aβc" == 3
count graphemes of "⚠️βć" == 3

todo: count footprint of "aβc" == 31 # memory footprint of the object, including meta fields
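The three counts can be approximated in Python as a sketch. The function names are hypothetical, and the grapheme count is a deliberate simplification (it only skips combining marks and variation selectors), not the full Unicode text segmentation rules and not Angle's implementation.

```python
import unicodedata

def count_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

def count_chars(s: str) -> int:
    """Number of Unicode code points in s."""
    return len(s)

def count_graphemes_approx(s: str) -> int:
    """Rough grapheme count: code points that are not marks.

    Variation selectors such as U+FE0F have category Mn, so they are skipped too.
    """
    marks = {"Mn", "Me", "Mc"}
    return sum(1 for c in s if unicodedata.category(c) not in marks)

print(count_bytes("abc"))                                  # 3
print(count_chars("a\u03b2c"))                             # 3
print(count_graphemes_approx("\u26a0\ufe0f\u03b2\u0107"))  # 3
```

Note that the last string has four code points but three graphemes, which is why the two counts are kept apart.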

graphemes

We intentionally boycott control codes that influence the color/appearance of characters, such as:

🫲 ≈ 🫲🏻
⚠️ ≈ ⚠ + ef b8 8f (the UTF-8 bytes of variation selector U+FE0F)

Of course these can appear in Angle strings; just don't use them where string indexing or manipulation is required. If you use them and rely on safe string manipulation, please use a third-party library or wait for the grapheme iterator to be implemented.
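To see why such sequences are fragile under indexing, the following Python sketch (not Angle code) takes the two examples apart. Each visible symbol consists of two code points, so indexing by code point splits it.

```python
warning_emoji = "\u26a0\ufe0f"       # ⚠ followed by variation selector-16
hand_emoji = "\U0001faf2\U0001f3fb"  # 🫲 followed by a light skin tone modifier

print([hex(ord(c)) for c in warning_emoji])  # ['0x26a0', '0xfe0f']

# UTF-8 bytes of the variation selector alone
selector_bytes = warning_emoji[1].encode("utf-8")
print(" ".join(f"{b:02x}" for b in selector_bytes))  # ef b8 8f

print(len(hand_emoji))  # 2
```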

direct access

Internally byte chars are known and used for optimization:

Bracket indexing is, as a general rule, close to the metal:

"abc"[1]='B' manipulates the byte sequence blindly, whereas "abc"#2='β' sets the character in a Unicode-safe way.

"abc"[1]='β' should give a strong compiler warning! ⚠️

todo: inject the codepoint, or internally switch from utf-8 to codepoint representation

Likewise, "aβc"#2 == 'β' is safe, but "aβc"[3] can yield unexpected results, because β occupies two bytes in UTF-8 and byte offsets no longer match character positions.
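The contrast between blind byte access and safe character access can be sketched in Python. The helpers `set_byte` and `set_char` are hypothetical, and the 0-based byte index for brackets and 1-based position for `#` are assumptions inferred from the examples above.

```python
def set_byte(s: str, index: int, value: int) -> bytes:
    """Overwrite one byte of the UTF-8 encoding blindly (0-based index, assumed)."""
    raw = bytearray(s.encode("utf-8"))
    raw[index] = value
    return bytes(raw)

def set_char(s: str, position: int, ch: str) -> str:
    """Replace one code point safely (1-based position, assumed from "abc"#2)."""
    chars = list(s)
    chars[position - 1] = ch
    return "".join(chars)

print(set_byte("abc", 1, ord("B")))  # b'aBc'
print(set_char("abc", 2, "\u03b2"))  # aβc

raw = "a\u03b2c".encode("utf-8")     # bytes: 61 ce b2 63
print(hex(raw[1]))                   # 0xce, only the first half of β
```

In this sketch, passing `ord("β")` to `set_byte` raises a `ValueError`, since the code point does not fit into a single byte. That is the situation for which the text asks for a compiler warning.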

internals

The internal representation of strings as a UTF-8 sequence or a char32_t sequence should be completely opaque to users/developers, except for bare-metal indexing.

Remember:

In Angle, char is shorthand for a Unicode character (code point), unlike an 8-bit unsigned int == byte (historically an ASCII char with an ill-defined 0x80-0xFF latin ... range).
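As a rough Python illustration of the two candidate representations (not Angle's actual internals), the same text can be stored as variable-width UTF-8 or as fixed-width 32-bit code points. Only the byte layout differs; the characters are identical.

```python
text = "a\u03b2c"  # aβc

utf8 = text.encode("utf-8")       # variable width: 1 to 4 bytes per code point
utf32 = text.encode("utf-32-le")  # fixed width: 4 bytes per code point, no BOM

print(len(utf8))   # 4
print(len(utf32))  # 12

# Both byte layouts decode to the same sequence of characters.
same_text = utf8.decode("utf-8") == utf32.decode("utf-32-le")
print(same_text)   # True
```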
