Simpler and less buggy preprocessor, support #include <> and more! #64

Merged
laurenthuberdeau merged 25 commits into main from laurent/changes-for-TCC
Sep 2, 2024

Conversation

laurenthuberdeau
Collaborator

@laurenthuberdeau laurenthuberdeau commented Aug 15, 2024

Context

I recently merged #8 and noted that I would come back to improving the tokenizer and preprocessor. This is that PR. It includes:

  • A simpler tokenizer that works better with macros, including the ability to produce '\n' tokens to detect end of lines: 46bbebc
  • Macro calls: empty arguments are no longer ignored. Example: M(1,,2,3) now has 4 arguments instead of being parsed as M(1,2,3): 9af3839
  • More flexible token pasting: empty arguments and C keywords can be pasted. Integers can be pasted to the left of identifiers, on the assumption that a later token pasting will produce a valid identifier: 9af3839 and 4ee81e9
  • Allow C keywords to be defined by the preprocessor. This allows us to replace unsupported types with supported types such as #define float int: 801be22
  • Support #include <> directives and the -I option which, when used together, make it easy to use the portable libc: 64642cc
  • Reuse the C parser for #if expressions, reducing code duplication and giving us the ability to evaluate constant expressions (which will be used to support array lengths that are constant expressions rather than integer literals): 1d9dd56
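
To make the macro-related items above concrete, here is a small sketch in standard C (C99 or later). It is not taken from pnut's test suite: the macro names PICK3, PASTE and PASTE3 and the function names are made up for illustration, and the snippet only shows the behaviour a conforming preprocessor is expected to have for empty arguments, token pasting and keyword redefinition.

```c
/* Hypothetical macro: expands to its third argument. */
#define PICK3(a, b, c, d) c

/* Hypothetical token-pasting helpers. */
#define PASTE(a, b) a##b
#define PASTE3(a, b, c) a##b##c

/* An M(1,,2,3)-style call: the empty second argument still counts, so the
   third argument is 2. If empty arguments were dropped it would be 3. */
int third_of_four(void) {
  return PICK3(1, , 2, 3);
}

int paste_demo(void) {
  int val1x = 10;
  int foo = 5;
  int PASTE(int, _val) = 1; /* a keyword pasted into an identifier: int_val */
  /* val ## 1 ## x: depending on the pasting order, the intermediate token
     may be 1x (not an identifier), but the final token val1x is valid.
     PASTE(foo, ) pastes an empty argument and yields foo. */
  return PASTE3(val, 1, x) + PASTE(foo, ) + int_val;
}

/* Keyword redefinition in the style of "#define float int". */
#define float int
int halve_seven(void) {
  float q = 7;  /* an int after preprocessing */
  q = q / 2;    /* integer division gives 3 */
  return q * 2; /* 6; with a real float this would be 7 */
}
#undef float
```

Calling third_of_four(), paste_demo() and halve_seven() from a test program is enough to check that a preprocessor handles all three cases the same way.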

A bunch of bug fixes and other quality of life improvements:

  • Macro lines ending in EOF instead of \n no longer crash: b496f4b
  • Using a macro and immediately redefining it no longer causes the new macro to be used in the expansion before the redefinition: 7ec88a8
  • The sh backend compiles a = b = 0 expressions to : $((a = b = 0)) instead of a=$((b = 0)).
  • Produce an error when parse_definition fails to parse instead of causing the code generator to crash: 8374691
  • Include file location when crashing with fatal_error: bae4cbe

The end result is that we can tokenize TCC-0.9.27 🎉

Because whitespace is significant to the preprocessor, it used to
parse macros by scanning for specific characters. That approach broke
down when a macro contained whitespace or comments, which prevented
the preprocessor from recognizing the end of the macro.

This commit fixes that issue by adding a flag (skip_newlines) that tells
the tokenizer whether it should skip '\n'. When processing
preprocessor directives, this flag is set to false so that the macro
parser can stop at the end of the line.
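
The idea behind the flag can be sketched as follows. This is a simplified model, not pnut's tokenizer: the struct lexer type, the TOK_* constants and both function names are assumptions made for this example, and only whitespace, /* */ comments and "words" are recognized.

```c
#include <stddef.h>

enum { TOK_EOF, TOK_NEWLINE, TOK_WORD }; /* hypothetical token kinds */

struct lexer {
  const char *src;
  size_t pos;
  int skip_newlines; /* when 0, '\n' is reported as TOK_NEWLINE */
};

static int at_comment(const struct lexer *lx) {
  return lx->src[lx->pos] == '/' && lx->src[lx->pos + 1] == '*';
}

int next_tok(struct lexer *lx) {
  for (;;) {
    char c = lx->src[lx->pos];
    if (c == '\0') return TOK_EOF;
    if (c == '\n') {
      lx->pos++;
      if (!lx->skip_newlines) return TOK_NEWLINE;
    } else if (c == ' ' || c == '\t') {
      lx->pos++;
    } else if (at_comment(lx)) {
      lx->pos += 2; /* skip the opening delimiter */
      while (lx->src[lx->pos] != '\0' &&
             !(lx->src[lx->pos] == '*' && lx->src[lx->pos + 1] == '/'))
        lx->pos++;
      if (lx->src[lx->pos] != '\0') lx->pos += 2; /* skip the closing one */
    } else {
      /* A word runs until whitespace, a comment, or the end of input. */
      while ((c = lx->src[lx->pos]) != '\0' && c != ' ' && c != '\t' &&
             c != '\n' && !at_comment(lx))
        lx->pos++;
      return TOK_WORD;
    }
  }
}

/* Counts the words read before the first NEWLINE or EOF token. */
int count_words(const char *src, int skip_newlines) {
  struct lexer lx = {src, 0, skip_newlines};
  int n = 0;
  while (next_tok(&lx) == TOK_WORD) n++;
  return n;
}
```

With the input "#define FOO 1 /* one */\nint x;", count_words returns 3 when skip_newlines is 0 (the directive ends at the newline despite the trailing comment) and 5 when it is 1 (the newline is skipped and reading continues into the next line). Because the loop also stops on TOK_EOF, a directive on the last line of a file without a trailing '\n' terminates cleanly in this model.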

Pasting an integer to the left of an identifier creates an invalid
identifier, but the result may be pasted with another identifier
(to the left), resulting in a valid identifier.

The definition in effect when a macro is used must be the one that is
expanded. Without this fix, the macro may be redefined while its
arguments are being read, and the expansion would use the new definition
(only valid after the #define) instead of the previous one.

An example showing the bug:

  #define FOO 1
  int foo_val = FOO
  #define FOO 3 // Overwrites FOO
  ;

Before, foo_val was assigned the value 3; now it is assigned 1, as expected.

Allowing C keywords to be defined by the preprocessor is useful because
unused types can be redefined to something supported by pnut.
@laurenthuberdeau laurenthuberdeau changed the title Changes for TCC More preprocessor changes Aug 18, 2024
Now that the tokenizer can produce NEWLINE tokens, we can reuse the C
parser to parse #if expressions. Without newlines, the C parser would
keep reading until the end of the expression, skipping over the newlines.
Now, if it encounters a newline, a newline token is produced and the
C parser fails to parse the expression.

This replaces the code that implemented the shunting yard algorithm
with a function that can evaluate constant expressions. This function
evaluates AST nodes that represent constant expressions, and will be
used to support array lengths that are constant expressions rather
than integer literals.
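
A constant-expression evaluator over AST nodes can be sketched as below. The node layout, the NODE_* kinds and the function names are assumptions made for this example; pnut's actual AST representation is not shown in this PR.

```c
#include <stddef.h>

/* Hypothetical node kinds for a tiny expression AST. */
enum node_kind { NODE_INT, NODE_ADD, NODE_SUB, NODE_MUL, NODE_LT, NODE_NOT };

struct node {
  enum node_kind kind;
  int value;              /* used by NODE_INT only */
  const struct node *lhs; /* operand, or left operand */
  const struct node *rhs; /* right operand, NULL for unary nodes */
};

/* Recursively folds an AST that contains only constants. */
int eval_constant(const struct node *n) {
  switch (n->kind) {
    case NODE_INT: return n->value;
    case NODE_ADD: return eval_constant(n->lhs) + eval_constant(n->rhs);
    case NODE_SUB: return eval_constant(n->lhs) - eval_constant(n->rhs);
    case NODE_MUL: return eval_constant(n->lhs) * eval_constant(n->rhs);
    case NODE_LT:  return eval_constant(n->lhs) < eval_constant(n->rhs);
    case NODE_NOT: return !eval_constant(n->lhs);
  }
  return 0; /* unreachable for the kinds above */
}

/* Builds the AST for (2 + 3) * 4, e.g. an array length, and folds it. */
int example_array_length(void) {
  struct node two = {NODE_INT, 2, NULL, NULL};
  struct node three = {NODE_INT, 3, NULL, NULL};
  struct node four = {NODE_INT, 4, NULL, NULL};
  struct node sum = {NODE_ADD, 0, &two, &three};
  struct node prod = {NODE_MUL, 0, &sum, &four};
  return eval_constant(&prod);
}

/* Builds the AST for !(4 < 3), e.g. an #if condition, and folds it. */
int example_if_condition(void) {
  struct node four = {NODE_INT, 4, NULL, NULL};
  struct node three = {NODE_INT, 3, NULL, NULL};
  struct node lt = {NODE_LT, 0, &four, &three};
  struct node not_lt = {NODE_NOT, 0, &lt, NULL};
  return eval_constant(&not_lt);
}
```

The same evaluator serves both uses mentioned above: an #if condition is true when the result is non-zero, and an array length is simply the folded value.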
@laurenthuberdeau
Collaborator Author

laurenthuberdeau commented Sep 2, 2024

TCC can now be preprocessed by pnut! 🎉 The only change required is making the string pool and heap (controlled by STRING_POOL_SIZE and HEAP_SIZE) 10x larger to not run out of memory.

When attempting to expand a macro, the list of arguments is parsed by
get_macro_args_toks, which produced the next token after ')'. This token
was then pushed on the token stack so it would be processed after the
expanded macro's tokens.

When multiple macros were expanded sequentially, this caused the last
stack entry to never be empty, which broke the stack reuse mechanism
(similar to TCO).

This bug was not visible when bootstrapping pnut because not enough
macros were expanded in a row to trigger the issue. This is, however, a
common pattern in TCC.
@laurenthuberdeau
Collaborator Author

There is no meaningful difference in the bootstrap times:

========== Branch 'main' ==========

PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: ksh
PNUT_SH_OPTIONS_EXTRA: 
0.110s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.187s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
31.903s for: ksh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
11.539s for: ksh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.024s for: ksh pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: dash
PNUT_SH_OPTIONS_EXTRA: 
0.101s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.189s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
45.131s for: dash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
12.368s for: dash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.014s for: dash pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: bash
PNUT_SH_OPTIONS_EXTRA: 
0.101s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.192s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
84.046s for: bash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
29.969s for: bash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.038s for: bash pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: yash
PNUT_SH_OPTIONS_EXTRA: 
0.102s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.207s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
150.327s for: yash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
51.881s for: yash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.045s for: yash pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: zsh
PNUT_SH_OPTIONS_EXTRA: 
0.109s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.192s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
1753.253s for: zsh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
245.234s for: zsh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.038s for: zsh pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
M	benchmark-bootstrap-with-options.sh

========== Branch 'laurent/changes-for-TCC' ==========

PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: ksh
PNUT_SH_OPTIONS_EXTRA: 
0.201s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.164s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
32.880s for: ksh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
11.980s for: ksh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.022s for: ksh pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: dash
PNUT_SH_OPTIONS_EXTRA: 
0.101s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.185s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
46.880s for: dash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
12.703s for: dash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.014s for: dash pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: bash
PNUT_SH_OPTIONS_EXTRA: 
0.101s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.190s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
85.792s for: bash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
30.380s for: bash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.039s for: bash pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: yash
PNUT_SH_OPTIONS_EXTRA: 
0.105s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.186s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
152.946s for: yash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
52.065s for: yash pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.045s for: yash pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe
PLATFORM: Darwin laurent-mbp 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
SHELL: zsh
PNUT_SH_OPTIONS_EXTRA: 
0.103s for: gcc -DRT_NO_INIT_GLOBALS -Dsh  pnut.c -o pnut-sh-compiled-by-gcc.exe
0.191s for: pnut-sh-compiled-by-gcc.exe -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh.sh
1765.779s for: zsh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Dsh  pnut.c > pnut-sh-compiled-by-pnut-sh-sh.sh
257.923s for: zsh pnut-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-sh-sh.sh
0.041s for: zsh pnut-i386-compiled-by-pnut-sh-sh.sh -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-by-pnut-i386-sh.exe
0.001s for: pnut-i386-compiled-by-pnut-i386-sh.exe -DRT_NO_INIT_GLOBALS -Di386 pnut.c > pnut-i386-compiled-pnut-i386-exe.exe

@laurenthuberdeau laurenthuberdeau changed the title More preprocessor changes Simpler and less buggy preprocessor, support #include <> and more! Sep 2, 2024
@laurenthuberdeau laurenthuberdeau merged commit f9b88d7 into main Sep 2, 2024
26 checks passed
@laurenthuberdeau laurenthuberdeau deleted the laurent/changes-for-TCC branch September 2, 2024 18:24