Skip to content

Project: Improve efficiency of the generated C parsing code

Benjamin Bannier edited this page Apr 26, 2021 · 7 revisions

Goals

Currently, the generated C++ code coming out of HILTI can be pretty verbose and will often include logic & state that remains unused, or unneeded, by the final parser. That leads to unnecessary overhead in execution time and memory usage, and often also in JIT compile time. We want to optimize that.

Generally, everything is in scope here that cuts down on the generated C++ code—the more the better. However, we need to keep in mind that there’s a trade-off between the effort required to improve our code generator vs. the gain we will see from the changes; even an obvious optimization won’t change much at all if the external C++ compiler will already applying a similar transformation anyways. As a rule of thumb, let’s not try to compete with standard C++-side optimizations; they are likely to be better than whatever we could come up with it. (Exception to the rule: low-effort ways to improve compilation speed and readability of our C++ code).

With that in mind, the most promising class of optimizations is anything that exploits global knowledge about what the final, fully linked HTO module is actually going to make use of. The following is a list of specific instances that seems worth looking into:

  • Skip any hooks that aren’t ever implemented.

  • Skip types that aren't needed

    • Don't emit anything relating to a non-public type that's not used anywhere (units, enums, etc.). If that leaves a compilation unit, empty, skip compiling it all together.
  • Skip unit state that’s never used:

    • Sink state (including for reassembly)
    • Filter state (and aim to infer %filter automatically)
    • Random access state (and aim to infer %random-access automatically)
    • Fields that are never read anywhere (treat them like anonymous fields instead)
    • Type information that's never accessed
  • Skip logic that’s not needed:

    • Random access: most hooks will never update the input position
    • Skip __self locals and parameters where not needed
    • If the value of an anonymous field is never used, skip anything that’s not directly needed for just skipping over it.
    • Suspending: if we know that we won’t need to suspend a parser, remove everything related to that, including the top-level fiber wrapping (and/or: provide a separate entry point for the host application that skips the wrapping)
    • Skip top-level parseX methods that a host application doesn’t need.

In some cases, the compiler won’t be able to know if a host applications needs certain functionality. For those cases, we can provide compiler options to enable/disable certain features (e.g., the ability for a parser to suspend; availability of full type information; requirement to always provide all fields for an application like spicy-dump).

Beyond these “global view” optimizations, the following list collects some further code optimizations that may be worth considering:

  • Move per-type overloads into type information

    • __str__
    • operator<<
    • Do we still need __visit then?
  • Move inline static struct members out of line

    • __parser
  • Move ValueReference instance to pure stack values where possible.

  • C++ readibility:

    • (*__self).<method>(...); -> <method>(...)
    • (*x).foo-> x->foo
  • Perform a unity build for all the C++ code going into a single HLTO?


Only emit random access functionality if used

Making a unit %random-access currently adds support code embedded in the control flow. The unit then provides the member functions set_input, input and position for users to access and control input.

Since random-access support is embedded in the control flow, it is hard to remove if unused without performing control flow analysis. Instead we can only emit the support code if needed.

Currently input is fully parsed and resolved; during HILTI codegen we then emit the needed members and their uses. We can instead remove the support for users to trigger %random-access, and inject it automatically when encountering uses of functions depending on it.

  • remove validators for unit::Position, unit::Input and unit::SetInput
  • add visitors for these methods which inject mark the unit %random-access. This could happen e.g., during resolve_operators.

Remove unused global information with new optimizer pass

Introduce Driver::codegenUnits to generate C++ code for units

Separating HILTI codegen from translation to C++ allows us to insert transformations on a global view of all code before emitting C++ which we cannot analyze anymore.

  • move uses of Unit::codegen and Unit::linkerMetaData in Driver::_compileUnit into new function Driver::codegenUnits
  • invoke Driver::codegenUnits from Driver::compile after call to Driver::compileUnits

Introduce a new optimizer pass between Driver::codgenUnits and Driver::compileUnits

After generating HILTI ASTs for all inputs we can perform another perform global AST transformations.

These transformations can only be performed if all we know all users of generated code, so we e.g., need to skip this pass if any input file cannot be represented as a HILTI AST (e.g., any C++ input file). We need to skip all steps below for such cases.

class GlobalOptimizer {
    GlobalOptimizer(vector<Unit&> units);

    void run() const {
        // Create visitors.

        while (changed) {
            for (auto& unit: _units) {
                auto& module = unit.module();
                // Invoke each visitor on `module`.
            }
    }

    vector<Unit&> _units;

    // Members for shared/persistent visitor state.
};

This pass can be injecting after compilation to HILTI is done, but before C++ codegen.

Pass to remove unimplemented hooks

We inject calls to hooks during HILTI codegen, but if no custom hook implementations where given only inject default implementations at C++ link time. If we know that we see all code we can still detect unimplemented and unused hooks and remove them.

For the implementation we would first detect all hooks which are candidates for removal, and then remove their declaration and uses. This needs to be done in a loop until no changes are performed anymore.

Identifying candidates

  • select all AST nodes struct_::Field which have a Function child with hook flavor
    • record their field ID, the ID of the parent Type (get Type parent and select its ID child), and similarly the enclosing Module ID. We also store the position_t of the Field.
  • select all Function declarations where their Function child has hook flavor
    • check if their Function ID matches any of the previously seen candidates in the form MODULE::TYPE::FIELD and remove any found candidate from the candidate set.

After running this twice the resulting set contains the unimplemented, but present hooks.

Removal

While some hooks can be declared without ever being used, for others we inject calls to their default implementation. We can identify such calls by selecting struct_::MemberCall nodes with a Member child with an ID matching the Field ID.

If no uses are found we can directly remove the fields via a visitor selecting Field and checking whether it matches a removal candidate. If uses are found we also need to remove them, up to the enclosing Expression.

Implementation notes

  • add a Visitor visiting Field and Function and MemberCall
  • on construction the Visitor receives a pointer to storage which is owned by the GlobalOptimizer pass, e.g., a unordered_multimap<tuple<ID, ID, ID>, position_t> _hooks
  • add Visitor method void removeField(position_t) which removes the field via the parent Node
  • Visitor should provide two methods, void collect() which invokes the visitors and void prune() which triggers removal based on the stored information.

Pass to remove unused functions

Unused functions can use the infrastructure described above. To catch both their declaration and the implementation we here need to select both on Function declarations just under the global Module and also in the body. We can use ResolvedIDs to distinguish nodes.

We could store this in a unordered_multimap<ResolvedID, position_t> _rids in GlobalOptimizer passed to the Visitor.

Callers could be identified by filtering during collect for ResolvedID nodes corresponding to a known ResolvedID. If encountered not in a Function declaration all rids with a matching ResolvedID should be removed.

remove would need to be changed to also remove unused functions (all functions present in _rids after collecting).

Pass to remove unused types

This can use almost the same the infrastructure already implemented for removing unused functions. We would select for Type with linkage=private and add them to the know rids.

Removal should already be supported.