Simpler and less buggy preprocessor, support #include <> and more! #64
Merged
Conversation
laurenthuberdeau force-pushed the laurent/changes-for-TCC branch from 9bdbff8 to 105e163 on August 18, 2024 at 20:27
Because the preprocessor treats whitespace as significant, it used to parse macros by looking for certain characters. That broke when the preprocessor encountered whitespace or comments, preventing it from recognizing the end of a macro. This commit fixes the issue by adding a flag (skip_newlines) that tells the tokenizer whether to skip '\n'. While processing preprocessor directives, the flag is set to false so that the macro parser can stop at the end of the line.
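A minimal sketch of the idea in this commit. Only `skip_newlines` comes from the commit message; the tokenizer function and its behavior here are hypothetical simplifications, not pnut's actual code:

```c
#include <assert.h>

// Sketch: the tokenizer normally treats '\n' as ordinary whitespace, but
// while a preprocessor directive is being parsed, skip_newlines is cleared
// so the newline that terminates the directive comes back as a real token.
static int skip_newlines = 1; // default: newlines are plain whitespace

// Hypothetical helper: return a pointer to the start of the next token,
// or NULL at end of input.
static const char *next_token(const char *p) {
    while (*p == ' ' || *p == '\t' || (*p == '\n' && skip_newlines)) p++;
    return *p ? p : 0;
}
```

With the flag set, a directive parser can stop at the first `'\n'` it sees instead of scanning for sentinel characters.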
laurenthuberdeau force-pushed the laurent/changes-for-TCC branch from 105e163 to 3e4ec9d on August 18, 2024 at 21:26
This creates an invalid identifier, but the result may be pasted with another identifier (on the left), producing a valid identifier.
Otherwise, while reading the arguments, the macro may be redefined, and the expansion would use the new definition (only valid after the #define) instead of the previous one. An example showing the bug:

```c
#define FOO 1
int foo_val = FOO
#define FOO 3 // Overwrites FOO
;
```

Before, foo_val was assigned the value 3; it is now 1, as expected.
This is useful to allow unused types to be redefined to something supported by pnut.
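An illustrative example of why this matters (not pnut's code): a program that declares but never meaningfully uses an unsupported type can remap it and still compile.

```c
#include <assert.h>

// Redefine a type pnut does not support to one it does. This is legal C as
// long as no standard header relying on the name is included afterwards.
#define float int   // map the unsupported type to int

float scale = 3;    // after preprocessing: int scale = 3;

static int get_scale(void) { return scale; }
```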
laurenthuberdeau force-pushed the laurent/changes-for-TCC branch from 3e4ec9d to e2db147 on August 18, 2024 at 21:29
The -I option specifies the search path for files included with #include <...>.
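A hypothetical sketch of how such a search path can be resolved (function name and buffer handling are illustrative, not pnut's actual code): try each -I directory in order and keep the first candidate that opens.

```c
#include <stdio.h>
#include <string.h>

// Resolve a <...> include against a NULL-terminated list of -I directories.
// Writes the winning path into buf and returns it, or returns NULL if no
// candidate file could be opened.
static const char *resolve_include(const char **dirs, const char *name,
                                   char *buf, int buf_size) {
    for (int i = 0; dirs[i]; i++) {
        snprintf(buf, (size_t)buf_size, "%s/%s", dirs[i], name);
        FILE *f = fopen(buf, "r");
        if (f) { fclose(f); return buf; }
    }
    return 0;
}
```

Because directories are tried in order, earlier -I flags shadow later ones, which is what lets a portable libc's headers take precedence.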
laurenthuberdeau force-pushed the laurent/changes-for-TCC branch from d83feef to 64642cc on August 19, 2024 at 01:31
Now that the tokenizer can produce NEWLINE tokens, we can reuse the C parser to parse #if expressions. Without newline tokens, the C parser would keep reading past the end of the expression, skipping over the newlines. Now, when it encounters a newline, a NEWLINE token is produced and the C parser stops parsing the expression there. This replaces the shunting-yard implementation with a function that evaluates constant expressions. The function evaluates AST nodes representing constant expressions, and will also be used to support non-integer-literal expressions for array lengths.
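A minimal sketch of evaluating a constant expression from an AST rather than with a shunting-yard pass. The node layout is hypothetical, not pnut's actual AST:

```c
// A node is either a binary operator ('+', '-', '*', '/') with two
// children, or an integer literal (op == 0).
typedef struct Node {
    char op;
    int value;                // used when op == 0
    struct Node *left, *right;
} Node;

// Recursively fold the tree down to a single integer.
static int eval_const_expr(Node *n) {
    if (n->op == 0) return n->value;
    int l = eval_const_expr(n->left);
    int r = eval_const_expr(n->right);
    switch (n->op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r; // '/'
    }
}
```

Since the parser already builds these trees with the right precedence, the evaluator stays tiny and works anywhere a constant is needed, e.g. #if conditions or array lengths.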
TCC can now be preprocessed by pnut! 🎉 The only change required is making the string pool and heap (controlled by …
When attempting to expand a macro, the list of arguments is parsed by get_macro_args_toks, which produces the next token after ')'. That token was then pushed onto the token stack so it would be processed after the expanded macro's tokens. When multiple macros were expanded sequentially, this caused the last stack entry to never be empty, which broke the stack-reuse mechanism (similar to TCO). The bug was not visible when bootstrapping pnut because not enough macros were expanded in a row to trigger it; that is, however, a common pattern in TCC.
laurenthuberdeau force-pushed the laurent/changes-for-TCC branch from 568be2a to f845ac9 on September 2, 2024 at 18:07
There are no meaningful differences in the bootstrap times.
laurenthuberdeau changed the title from "More preprocessor changes" to "Simpler and less buggy preprocessor, support #include <> and more!" on Sep 2, 2024
Context
I recently merged #8 and noted that I would come back to improving the tokenizer and preprocessor. This is that PR. It includes:

- '\n' tokens to detect end of lines: 46bbebc
- M(1,,2,3) now has 4 arguments instead of being parsed as M(1,2,3): 9af3839
- #define float int: 801be22
- #include <> directives and the -I option which, when used together, make it easy to use the portable libc: 64642cc
- #if expressions, reducing code duplication and giving us the ability to evaluate constant expressions (will be used to support constant non-integer array lengths): 1d9dd56
- A bunch of bug fixes and other quality of life improvements:
  - \n no longer crash: b496f4b
  - a = b = 0 expressions compile to $((a = b = 0)) instead of a=$((b = 0))
  - parse_definition fails to parse instead of causing the code generator to crash: 8374691
  - fatal_error: bae4cbe

The end result is that we can tokenize TCC-0.9.27 🎉