-
Notifications
You must be signed in to change notification settings - Fork 37
Project: Improve efficiency of the generated C parsing code
Currently, the generated C++ code coming out of HILTI can be pretty verbose and will often include logic & state that remains unused, or unneeded, by the final parser. That leads to unnecessary overhead in execution time and memory usage, and often also in JIT compile time. We want to optimize that.
Generally, everything is in scope here that cuts down on the generated C++ code—the more the better. However, we need to keep in mind that there’s a trade-off between the effort required to improve our code generator vs. the gain we will see from the changes; even an obvious optimization won’t change much at all if the external C++ compiler will already applying a similar transformation anyways. As a rule of thumb, let’s not try to compete with standard C++-side optimizations; they are likely to be better than whatever we could come up with it. (Exception to the rule: low-effort ways to improve compilation speed and readability of our C++ code).
With that in mind, the most promising class of optimizations is anything that exploits global knowledge about what the final, fully linked HTO module is actually going to make use of. The following is a list of specific instances that seems worth looking into:
-
Skip any hooks that aren’t ever implemented.
-
Skip types that aren't needed
- Don't emit anything relating to a non-public type that's not used anywhere (units, enums, etc.). If that leaves a compilation unit, empty, skip compiling it all together.
-
Skip unit state that’s never used:
- Sink state (including for reassembly)
- Filter state (and aim to infer
%filter
automatically) - Random access state (and aim to infer
%random-access
automatically) - Fields that are never read anywhere (treat them like anonymous fields instead)
- Type information that's never accessed
-
Skip logic that’s not needed:
- Random access: most hooks will never update the input position
- Skip
__self
locals and parameters where not needed - If the value of an anonymous field is never used, skip anything that’s not directly needed for just skipping over it.
- Suspending: if we know that we won’t need to suspend a parser, remove everything related to that, including the top-level fiber wrapping (and/or: provide a separate entry point for the host application that skips the wrapping)
- Skip top-level
parseX
methods that a host application doesn’t need.
In some cases, the compiler won’t be able to know if a host applications needs certain functionality. For those cases, we can provide compiler options to enable/disable certain features (e.g., the ability for a parser to suspend; availability of full type information; requirement to always provide all fields for an application like spicy-dump
).
Beyond these “global view” optimizations, the following list collects some further code optimizations that may be worth considering:
-
Move per-type overloads into type information
__str__
operator<<
- Do we still need
__visit
then?
-
Move inline static struct members out of line
__parser
-
Move
ValueReference
instance to pure stack values where possible. -
C++ readibility:
-
(*__self).<method>(...);
-><method>(...)
-
(*x).foo
->x->foo
-
-
Perform a unity build for all the C++ code going into a single HLTO?
Making a unit %random-access
currently adds support code embedded in the
control flow. The unit then provides the member functions set_input
, input
and position
for users to access and control input.
Since random-access support is embedded in the control flow, it is hard to remove if unused without performing control flow analysis. Instead we can only emit the support code if needed.
Currently input is fully parsed and resolved; during HILTI codegen we then emit
the needed members and their uses. We can instead remove the support for users
to trigger %random-access
, and inject it automatically when encountering uses
of functions depending on it.
- remove validators for
unit::Position
,unit::Input
andunit::SetInput
- add visitors for these methods which inject mark the unit
%random-access
. This could happen e.g., duringresolve_operators
.
Separating HILTI codegen from translation to C++ allows us to insert transformations on a global view of all code before emitting C++ which we cannot analyze anymore.
- move uses of
Unit::codegen
andUnit::linkerMetaData
inDriver::_compileUnit
into new functionDriver::codegenUnits
- invoke
Driver::codegenUnits
fromDriver::compile
after call toDriver::compileUnits
After generating HILTI ASTs for all inputs we can perform another perform global AST transformations.
These transformations can only be performed if all we know all users of generated code, so we e.g., need to skip this pass if any input file cannot be represented as a HILTI AST (e.g., any C++ input file). We need to skip all steps below for such cases.
class GlobalOptimizer {
GlobalOptimizer(vector<Unit&> units);
void run() const {
// Create visitors.
while (changed) {
for (auto& unit: _units) {
auto& module = unit.module();
// Invoke each visitor on `module`.
}
}
vector<Unit&> _units;
// Members for shared/persistent visitor state.
};
This pass can be injecting after compilation to HILTI is done, but before C++ codegen.
We inject calls to hooks during HILTI codegen, but if no custom hook implementations where given only inject default implementations at C++ link time. If we know that we see all code we can still detect unimplemented and unused hooks and remove them.
For the implementation we would first detect all hooks which are candidates for removal, and then remove their declaration and uses. This needs to be done in a loop until no changes are performed anymore.
- select all AST nodes
struct_::Field
which have aFunction
child withhook
flavor- record their field
ID
, theID
of the parentType
(getType
parent and select itsID
child), and similarly the enclosingModule
ID
. We also store theposition_t
of theField
.
- record their field
- select all
Function
declarations where theirFunction
child hashook
flavor- check if their
Function
ID
matches any of the previously seen candidates in the formMODULE::TYPE::FIELD
and remove any found candidate from the candidate set.
- check if their
After running this twice the resulting set contains the unimplemented, but present hooks.
While some hooks can be declared without ever being used, for others we inject
calls to their default implementation. We can identify such calls by selecting
struct_::MemberCall
nodes with a Member
child with an ID
matching the
Field
ID.
If no uses are found we can directly remove the fields via a visitor selecting
Field
and checking whether it matches a removal candidate. If uses are found
we also need to remove them, up to the enclosing Expression
.
- add a
Visitor
visitingField
andFunction
andMemberCall
- on construction the
Visitor
receives a pointer to storage which is owned by theGlobalOptimizer
pass, e.g., aunordered_multimap<tuple<ID, ID, ID>, position_t> _hooks
- add
Visitor
methodvoid removeField(position_t)
which removes the field via the parentNode
-
Visitor
should provide two methods,void collect()
which invokes the visitors andvoid prune()
which triggers removal based on the stored information.
Unused functions can use the infrastructure described above. To catch both
their declaration and the implementation we here need to select both on
Function
declarations just under the global Module
and also in the body. We
can use ResolvedID
s to distinguish nodes.
We could store this in a unordered_multimap<ResolvedID, position_t> _rids
in
GlobalOptimizer
passed to the Visitor
.
Callers could be identified by filtering during collect
for ResolvedID
nodes
corresponding to a known ResolvedID
. If encountered not in a Function
declaration all rids with a matching ResolvedID
should be removed.
remove
would need to be changed to also remove unused functions (all
functions present in _rids
after collecting).
This can use almost the same the infrastructure already implemented for
removing unused functions. We would select for Type
with linkage=private
and add them to the know rids.
Removal should already be supported.