Skip to content

Notes on Runtime Library

Robin Sommer edited this page Apr 29, 2020 · 11 revisions

Some pieces of the HILTI/Spicy runtime libraries come with some specific constraints and/or design goals. We collect some notes on that here.

Types

Containers (bytes/list/map/set/stream/vector)

  • Our iterator interfaces aren't standard compliant at the moment, we should make them so.

  • Different from iterators in the C++ standard library, we need iterators to be safe against undefined behavior. We want the following semantics:

    • An existing iterator must generally remain valid as long as the underlying container sticks around.

    • An existing iterator becomes invalid in two cases:

      1. The underlying container gets destroyed.
      2. Due to container changes, the iterator's current position semantically does not "make sense" anymore. For example, a vector iterator pointing to an index i will be invalid once the vector shrinks to less than i+1 elements. (Note the "semantically" here: For example, resizing of a vector's internal array should not invalidate any iterators.)
    • Once invalid, any attempt to deference an iterator must throw an exception.

    • Even when invalid, iterator operations other than dereferencing should remain supported. For example, if one keep incrementing a vector iterator beyond the vector's size, it may eventually become valid again if the vector grows sufficiently. (This property is particularly important for streams; we'll discuss that further down below.)

    • As the previous bullet suggests, it is possible for an invalid iterator to become valid once more if the container changes accordingly.

    • Iterators must not prevent their underlying containers from being destroyed (e.g., they can't store a strong reference to the container, as that would force it to stay around until the iterator dies.)

  • These safe properties came with some overhead.

    • Generally we accept that overhead.

    • Internally, however, we should avoid the overhead where we safely can. For example, if another part of libhilti-rt executes an iteration over a bytes instance where iterators can't become invalid while the operation is in progress, we should just skip the safety checks (this is the reason for the current "Safe" vs "Non-Safe" iterators; but not sure that's the best way to approach it).

    • Longer-term, the code generator could make use of such "unsafe" paths as well in cases where it can guarantee that the safety properties are still maintained by the particular code being generated.

Integers

TODO

String & Bytes

TODO

Stream

TODO

Global State & Thread Safety

  • Generally, the runtime library should avoid global state wherever possible to remain thread-safe. While we currently are not spawning multiple threads, we will do so eventually.

  • To that end, most global state that the library needs is encapsulated in the hilti::rt::Context struct. Each thread gets its own context instance, and receives access to it through hilti::rt::context::get() (which internally uses a TLS variable). That means that all accesses to anything in Context are automatically thread-safe.

  • There's a small set of truely global state in {hilti/spicy}/rt/global-state.h.

    • Some of this is read-only configuration information, which is thread-safe.
    • Some is state that needs to be managed acrosss threads; we need to review this, as currently some of that is not used in a thread-safe manner.
    • Generally, we should avoid adding any truely global state wherever we can, unless there's a strong reason for doing so.

Fibers

HILTI uses co-routines to suspend execution at arbitrary times and return control back to host applications. That's how Spicy implements incremental parsing: whenever it runs out of input, it returns back to the host application (e.g., Zeek) until more becomes availablee. It the resumes just where it left off.

This is implemented through a Fiber abstraction that internally uses the libtask co-routine implementation, plus recycling of old fibers through a cache so that we don't need to reallocated stacks constantly.

This code is unfortunately quite hard to follow because the control flow just jumps around so awkwardly. It generally seems to be working, but would benefit from some review:

  1. Are there other co-routine implementation out there are either either (1) more efficient, or (2) more portable (but not less efficient)? (Just waiting for C++20 co-routines is an option, too).

  2. Fibers are on Spicy's hot path, so performance here is critical: Is there anything we can speed up here? For example, we are currently wrapping callbacks into std::function, which I understand comes with some overhead.

  3. We have some unit tests, but should extend. A little benchmark tool to measure context switch overhead would be valuable, too.

Use of tinyformat (fmt.h)

I'm thinking we should switch to {fmt} as the more standard implementation. However, there's a catch: I believe we will eventually need our own implementation of format string rendering because otherwise we're getting semantic mismatches between HILTI's types and how the C++ side happens to interpret them. (This has already come up in a couple tickets, and I imagine we'll keep running into that.)

Result

We should generally use Result and ``result::Error` more consistently across the API, instead of where currently use booleans or null values.

Utilities (util.h)

There's probably some cleanup & optimization we can apply to a number of these utility functions, potentially also split them out into some subgroups.