-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix RT_USE_LOOKUP_TABLE when reading unicode chars #66
Fix RT_USE_LOOKUP_TABLE when reading unicode chars #66
Conversation
sh-runtime.c
Outdated
@@ -371,7 +371,7 @@ DEFINE_RUNTIME_FUN(char_to_int) | |||
putstr(" '}') __c=125 ;;\n"); | |||
putstr(" '~') __c=126 ;;\n"); | |||
putstr(" *)\n"); | |||
putstr(" __c=$(LC_CTYPE=C printf \"%d\" \"'$1\")\n"); | |||
putstr(" __c=$(printf \"%d\" \"'$1\")\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting LC_ALL
sets LC_CTYPE
so we can remove this.
Do we know if |
It's part of the prologue so it's one of the first thing that's executed when the script starts. I don't think any of the commands we use overwrite it, so setting it once should be fine (as the passing CI tests demonstrate). |
> Do we know if LC_ALL=... is guaranteed to immediately affect the locale
> behavior of the current shell process?
It's part of the prologue so it's one of the first thing that's executed
when the script starts. I don't think any of the commands we use overwrite
it, so setting it once should be fine (as the passing CI tests demonstrate).
The problem I point at is that LC_ALL may be read only at startup,
i.e. before executing even the first line of the script.
The var setting will naturally affect all processes launched afterwards,
but in order to affect the current process, the AFAIK shell has to take
extra steps to "watch" this special variable and re-set the libc locale
whenever the var is modified.
Hence my question whether the standard requires this kind of monitoring.
|
This change also has the nice side effect of speeding up ksh by 12%. Other shells show no significant improvement when bootstrapping and in the compile-times benchmarks.
30b6c62
to
5fab88c
Compare
5fab88c
to
eefc97b
Compare
Ah I see what you mean. Setting the variable seems to affect the locale of the current shell for all shells except ➜ cat test-locale.sh
str="é"
case "$str" in
[[:alnum:]]) echo "[:alnum:]";;
[a-zA-Z0-9]) echo "[a-zA-Z0-9]";;
*) echo "other";;
esac
LC_ALL=C
case "$str" in
[[:alnum:]]) echo "[:alnum:]";;
[a-zA-Z0-9]) echo "[a-zA-Z0-9]";;
*) echo "other";;
esac
( # Open subshell
case "$str" in
[[:alnum:]]) echo "[:alnum:]";;
[a-zA-Z0-9]) echo "[a-zA-Z0-9]";;
*) echo "other";;
esac
) # Close subshell
➜ locale
LANG="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_CTYPE="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_ALL=
➜ ksh test-locale.sh
[:alnum:]
other
other
➜ dash test-locale.sh
other
other
other
➜ bash test-locale.sh
[:alnum:]
other
other
➜ zsh test-locale.sh
[:alnum:]
other
other
➜ mksh test-locale.sh
other
other
other
➜ yash test-locale.sh
[:alnum:]
[:alnum:] # Wrong, yash does not update the locale when `LC_` variables are set
[:alnum:] # Even more wrong!? I would have expected the locale to be updated in the subshell As the test shows, the implementation of locale varies between shells, with dash and mksh not respecting the Anyway, the PR fixes the bug on all shells except |
Context
I tried to compile examples/repl.c with the
bootstrap-results/pnut.sh
and it failed withline 6774: __c2i_é: parameter not set
. This is because the[a-zA-Z0-9]
pattern can include non-ASCII characters depending on the locale.This change also has the nice side effect of speeding up ksh by 12% taking 28s instead of 32s on my machine. Other shells show no significant improvement when bootstrapping and in the compile-times benchmarks.