-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve printf and simplify runtime #72
Conversation
putstr(" 126) __char=\"~\" ;;\n"); | ||
putstr(" 10) __char=\"\\n\" ;;\n"); | ||
putstr(" *)\n"); | ||
putstr(" echo \"Invalid character code: $1\" ; exit 1\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a proof that this function wasn't very useful (nor well tested), it doesn't even work for all characters!
@@ -829,7 +682,7 @@ DEPENDS_ON(pack_string) | |||
putstr(" : $((__cursor_fd$__fd = 0)) # Make buffer empty\n"); | |||
putstr(" : $((__buflen_fd$__fd = 1000)) # Init buffer length\n"); | |||
putstr(" : $((__state_fd$__fd = $3)) # Mark the fd as opened\n"); | |||
putstr(" pack_string $2\n"); | |||
putstr(" __res=$(_put_pstr __ $2)\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be slower on certain shells, but opening a file is a relatively rare operation so it feels like a good compromise since it allows us to get rid of pack_string
.
putstr(" esac\n"); | ||
DEFINE_RUNTIME_FUN(printf) | ||
DEPENDS_ON(put_pstr) | ||
putstr("read_int() {\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pack_string
was called in a few places to do the equivalent of reading a number. This function is much simpler and easier to understand.
putstr(" done\n"); | ||
putstr("}\n"); | ||
END_RUNTIME_FUN(pack_string) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💣
sh-runtime.c
Outdated
putstr(" __flags=\n"); | ||
putstr(" __width=\n"); | ||
putstr(" __width=\n"); | ||
putstr(" __precision=\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of trying to parse everything at once, this new printf
implementation processes each part of the format specifier in their own iteration. Each iteration advances the state machine encoded by those variables until it hits a terminal case ('d'|'i'|'o'|'u'|'x'|'X'
) in which case the shell's printf
function is called and the machine resets.
bd57796
to
28a3bcd
Compare
28a3bcd
to
63432fc
Compare
int->char conversion are not very common and can often be avoided. In the few cases where they are not (when opening a file and when writing bindings for utilities such as in posix-utils.sh), it is because the C string needs to be passed to a shell function or utility, and the strings in those cases are usually small. It may also be faster to use $(put_pstr $str_ptr) to convert the string since that skips the lookup process all together (by using the `printf \\000` trick) and only uses 1 subshell for the whole string. For the bootstrap process, int->char conversions are not needed at all (except when opening a file), and so removing the lookup table pays for itself in terms of time saved by not having to compile it. On bootstrap-pnut.sh, it saves 0.3s for ksh, 1.6s for dash and 1.5s for bash. In terms of number of lines, this commit removes 106 lines (1046 => 940) from sh-runtime.c and 202 lines from pnut.sh (7265 => 7063).
This function was only used in `printf` and can easily be replaced with calls to `put_pstr` calls. In terms of number of lines, this commit removes 22 lines (940 => 918) from sh-runtime.c and 29 lines from pnut.sh (7063 => 7034).
The previous implementation was badly written and only supported very specific formats. This new implementation relies on the shell's `printf` utility which supports most of the C printf formats making the implementation much simpler and more complete.
63432fc
to
8721a90
Compare
Unlike the string unpacking operation, packing is only required when a C string must be passed a shell command. While bootstrapping pnut, this only happens when opening a file, which happens only a few times and is not performance critical. It is much simpler to reuse the `put_pstr` function, which is already used to output strings, and capture its output. This may be faster on certain shells as it only creates a single subshell for the whole string instead of 1 per character. One downside is that `pack_string` was used in `posix-utils.sh` which may suffer from a slight performance hit. However, a faster `pack_string` could be implemented directly in `posix-utils.sh` if needed. For now, `posix-utils.sh` includes a `pack_string` function that uses `put_pstr`. In terms of number of lines, this commit removes 20 lines (918 => 898) from sh-runtime.c and 52 lines from pnut.sh (7034 => 6982).
The printf unrolling optimization used to only support very specific format specifiers. This commit rewrites it to be much more general, parsing the format string and generating the appropriate code for each specifier while reusing the shell's printf as much as possible. The flags, width and precision are now recognized and handled correctly. The format specifiers %d, %i, %o, %u, %x, %X, %c and %s are compiled into the appropriate shell code.
This makes it possible to recompile pnut with test specific options.
d1888dd
to
786c07e
Compare
Certain versions of ksh do not support the %0s format specifier: # printf "%0s" abc printf: %0s: invalid conversion specification and so padding when using a width parameter with %c must not result in %0s being called.
The cast was wrong and the pointer arithmetic was needlessly complicated. `wrap_str_const` now takes the end of the string as an argument instead of the length of the string which makes everything simpler and no longer rely on undefined behavior.
@@ -1,12 +1,15 @@ | |||
// Yash does not support * for precision like in printf "%.*s" 5 "abc" | |||
// printf_runtime works on yash because * are replaced with values before calling printf | |||
// expect_failure_for: yash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yash's read
and printf
utilities are not POSIX compliant and so this test doesn't pass.
Context
The
printf
function from the runtime library was an incomplete and unmaintainable mess and pulled a lot of code with it. It also took an important part ofsh-runtime.c
which inflated the bootstrap times even whenprintf
was not used as the runtime library function generating it needed to be compiled.This PR rewrites the
printf
function so it supports most of the C format specifiers (including flags, width and precision options) and reuse existing functions when possible. It does so both in the runtime (in_printf
) and in theprintf
unrolling optimization that is done onprintf
calls with literal format string.The end result is:
int_to_char
,print_string
andpack_string
sh-runtime.c
shrinks from 1046 to 918 lines (-128) whileprintf
is much more usefulpnut.sh
shrinks from 7360 to 7191 lines (-169)printf
calls with literal format string are compiled directly to the shell'sprintf
in all cases except when%s
and%c
are used.