Project: Improve efficiency of the generated C++ parsing code

Benjamin Bannier edited this page Sep 6, 2021 · 7 revisions

Goals

Currently, the generated C++ code coming out of HILTI can be pretty verbose and will often include logic and state that remain unused, or unneeded, by the final parser. That leads to unnecessary overhead in execution time and memory usage, and often also in JIT compile time. We want to optimize that.

Generally, everything that cuts down on the generated C++ code is in scope here; the more the better. However, we need to keep in mind that there’s a trade-off between the effort required to improve our code generator and the gain we will see from the changes: even an obvious optimization won’t change much if the external C++ compiler already applies a similar transformation anyway. As a rule of thumb, let’s not try to compete with standard C++-side optimizations; they are likely to be better than whatever we could come up with. (Exception to the rule: low-effort ways to improve compilation speed and readability of our C++ code.)

With that in mind, the most promising class of optimizations is anything that exploits global knowledge about what the final, fully linked HLTO module is actually going to make use of. The following is a list of specific instances that seem worth looking into:

  • Skip any hooks that aren’t ever implemented.

  • Skip types that aren't needed

    • Don't emit anything relating to a non-public type that's not used anywhere (units, enums, etc.). If that leaves a compilation unit empty, skip compiling it altogether.
  • Skip unit state that’s never used:

    • Sink state (including for reassembly)
    • Filter state (and aim to infer %filter automatically)
    • Random access state (and aim to infer %random-access automatically)
    • Fields that are never read anywhere (treat them like anonymous fields instead)
    • Type information that's never accessed
  • Skip logic that’s not needed:

    • Random access: most hooks will never update the input position
    • Skip __self locals and parameters where not needed
    • If the value of an anonymous field is never used, skip anything that’s not directly needed for just skipping over it.
    • Suspending: if we know that we won’t need to suspend a parser, remove everything related to that, including the top-level fiber wrapping (and/or: provide a separate entry point for the host application that skips the wrapping)
    • Skip top-level parseX methods that a host application doesn’t need.
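
Most of the items above reduce to the same question: which declarations are reachable from something the final module actually exposes? A minimal sketch of such a reachability pass follows. The `Decl` struct and the `reachable` function are hypothetical stand-ins for illustration and do not correspond to HILTI's actual AST or API; the sketch also assumes that public declarations are the only roots.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a declaration (type, hook, field, ...)
// in the fully linked module. Not HILTI's real AST node.
struct Decl {
    bool is_public;                // public declarations act as roots
    std::vector<std::string> uses; // IDs of declarations this one refers to
};

// Returns the IDs of all declarations reachable from a public one.
// Anything not in the result is a candidate for being skipped entirely.
std::set<std::string> reachable(const std::map<std::string, Decl>& decls) {
    std::set<std::string> seen;
    std::vector<std::string> todo;

    // Seed the worklist with all public declarations.
    for ( const auto& entry : decls ) {
        if ( entry.second.is_public )
            todo.push_back(entry.first);
    }

    while ( ! todo.empty() ) {
        std::string id = todo.back();
        todo.pop_back();

        auto it = decls.find(id);
        if ( it == decls.end() )
            continue; // reference to an unknown ID; ignored in this sketch

        if ( ! seen.insert(id).second )
            continue; // already visited

        for ( const auto& used : it->second.uses )
            todo.push_back(used);
    }

    return seen;
}
```

With three declarations where a public `Foo` uses `Bar`, and a non-public `Unused` also uses `Bar`, the result contains `Foo` and `Bar` but not `Unused`, so nothing would need to be emitted for `Unused`.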

In some cases, the compiler won’t be able to know if a host application needs certain functionality. For those cases, we can provide compiler options to enable/disable certain features (e.g., the ability for a parser to suspend; availability of full type information; requirement to always provide all fields for an application like spicy-dump).
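
One possible shape for such options is a tri-state per feature: let the compiler infer usage by default, but allow the user to force a feature on or off. The sketch below illustrates that idea only; `Feature`, `CodegenOptions`, and `emitFeature` are hypothetical names and not existing compiler options.

```cpp
// Hypothetical tri-state setting for an optional piece of generated code.
enum class Feature {
    Auto,     // emit only if the compiler finds a use in the linked module
    Enabled,  // always emit, e.g. because the host application needs it
    Disabled  // never emit, even if the compiler cannot rule out a use
};

// Hypothetical option set mirroring the examples from the text.
struct CodegenOptions {
    Feature suspension; // ability for a parser to suspend
    Feature type_info;  // availability of full type information
    Feature all_fields; // always provide all fields (e.g. for spicy-dump)
};

// Decides whether to emit code for a feature, given the user's setting and
// what the compiler's own analysis found.
inline bool emitFeature(Feature setting, bool used_in_module) {
    switch ( setting ) {
        case Feature::Enabled: return true;
        case Feature::Disabled: return false;
        case Feature::Auto: return used_in_module;
    }

    return used_in_module; // not reached for valid enum values
}
```

A host application like spicy-dump would then set `all_fields` to `Feature::Enabled`, while a performance-sensitive host that never suspends could set `suspension` to `Feature::Disabled`.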

Beyond these “global view” optimizations, the following list collects some further code optimizations that may be worth considering:

  • Move per-type overloads into type information

    • __str__
    • operator<<
    • Do we still need __visit then?
  • Move inline static struct members out of line

    • __parser
  • Move ValueReference instances to pure stack values where possible.

  • C++ readability:

    • (*__self).<method>(...); -> <method>(...)
    • (*x).foo -> x->foo
  • Perform a unity build for all the C++ code going into a single HLTO?
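
To illustrate the first item in this list (moving per-type overloads into type information), here is a rough sketch of how a single, shared `operator<<` could dispatch through a per-type descriptor instead of the code generator emitting one overload per struct. The `TypeInfo` and `Value` structs below are simplified assumptions made up for this example; they are not the runtime library's actual type-information API.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical runtime type descriptor holding the per-type rendering
// function that would otherwise be a generated overload.
struct TypeInfo {
    const char* name;
    std::string (*to_string)(const void* self);
};

// Hypothetical pairing of a value with its descriptor.
struct Value {
    const void* data;
    const TypeInfo* type;
};

// One operator<< serves every type that comes with a descriptor.
inline std::ostream& operator<<(std::ostream& out, const Value& v) {
    return out << v.type->to_string(v.data);
}

// Example of what would be generated per type: the struct itself, one
// rendering function, and one descriptor instance.
struct Header {
    int version;
};

static std::string header_to_string(const void* self) {
    const Header* h = static_cast<const Header*>(self);
    return "[$version=" + std::to_string(h->version) + "]";
}

static const TypeInfo header_type_info = {"Header", &header_to_string};
```

Streaming `Value{&h, &header_type_info}` for a `Header h{2}` then goes through the descriptor's function pointer. The trade-off is an indirect call at runtime in exchange for fewer generated overloads for the C++ compiler to process.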


Technical design