In the GLSL 4.6 specification, add the following paragraph to section 3.1:
The given files for compilation must be in the form of a well-formed UTF-8 code unit sequence. These files are decoded to produce their corresponding sequence of Unicode scalar values. A sequence of character set tokens is then formed by mapping each Unicode scalar value to the corresponding character set token. In the resulting sequence, each pair of characters in the input sequence consisting of U+000D CARRIAGE RETURN followed by U+000A LINE FEED, as well as each U+000D CARRIAGE RETURN not immediately followed by a U+000A LINE FEED, is replaced by a single new-line character.
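To make the proposed steps concrete, here is a minimal Python sketch of the decode-then-normalize sequence described above. The function name `decode_glsl_source` is hypothetical, and the sketch assumes that U+000A LINE FEED already maps to the new-line character and that each remaining scalar value maps to itself; neither is stated in the proposed text, and this is not how any particular compiler implements it.

```python
def decode_glsl_source(raw: bytes) -> str:
    """Decode source bytes and normalize line endings (illustrative only)."""
    # Strict decoding raises UnicodeDecodeError on ill-formed UTF-8,
    # mirroring the "well-formed UTF-8 code unit sequence" requirement.
    text = raw.decode("utf-8", errors="strict")
    # Replace CR LF pairs first, then any CR not followed by LF.
    # Assumption: a bare LF is already the new-line character.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Under these assumptions, "\r", "\n", and "\r\n" each end up as a single new-line, and input such as `b"\xff"` is rejected at the decoding step.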
I'm not sure what ambiguity you're aiming to clear up here, perhaps because I'm not sufficiently knowledgeable about UTF-8. Is there an alternative way of interpreting a UTF-8 sequence other than what you describe? I'm fine with spelling things out clearly, but this seems to be straying into territory that should be covered by the UTF-8 spec, rather than GLSL.
One specific concern that I have, for example, is that the proposed text talks about mapping the UTF-8 characters into the character set but doesn't say what the mapping is. I think that the UTF-8 codepoints actually already represent the characters, so don't need mapping, which is why the correct mapping is obvious, but if they're different enough to require mapping then we should say what the mapping is.
I'm not convinced that the handling of new lines in the proposed text is correct according to the current spec. GLSL currently says that any of "\r", "\n" or "\r\n" are a valid line break, which isn't the same as in your comment. I'm not sure what glslang implements for this.
It looks like glslang currently treats "\n" or "\r\n" as line terminators. The situation with a bare "\r" is more complicated: I think it will not produce syntax errors, but it will also not give the right line numbers. Note that the spec actually limits the valid characters in GLSL tokens to (a subset of) ASCII, and the core language does not have strings. The GLSL_EXT_debug_printf extension does add string literals, but the extension spec language still does not allow the use of codepoints above 126 in tokens, so the only place where non-ASCII characters can occur is in comments, where the current spec allows any byte values and doesn't require well-formed UTF-8. In practice, glslang doesn't enforce this and just accepts any sequence of bytes in a string literal (or in a header name in a #include, another place where arbitrary strings are allowed).
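As a toy illustration of why a bare "\r" can throw off line numbers, the Python sketch below compares two counting rules. Both function names are hypothetical and neither is taken from glslang; the second rule is only an assumed model of a scanner that advances its line counter on "\n" alone.

```python
import re

def count_breaks_all_terminators(src: str) -> int:
    # Counts "\r\n" once, and a lone "\r" or "\n" once each.
    # "\r\n" is listed first so the pair is never counted twice.
    return len(re.findall(r"\r\n|\r|\n", src))

def count_breaks_lf_only(src: str) -> int:
    # Assumed model: only "\n" advances the line counter, so "\r\n"
    # counts once and a bare "\r" is not counted at all.
    return src.count("\n")

sample = "a\rb\r\nc\nd"
all_terminators = count_breaks_all_terminators(sample)
lf_only = count_breaks_lf_only(sample)
```

For `sample`, the first rule sees three line breaks and the second sees two, so any diagnostic reported after the bare "\r" would be off by one line under the second rule.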