-
Notifications
You must be signed in to change notification settings - Fork 38
Performance profiling of Spicy parsers
This captures some notes for gathering performance statistics for Spicy parsers. The following assumes you are using Spicy and plugin bundled with Zeek.
The two main challenges when creating flame graphs is (1) compiling everything with frame pointer to get consistent stack traces, and (2) cleaning up the collected performance samples to account for Spicy's stack switching.
-
Build Zeek (and hence Spicy's runtime library) with frame pointers:
# CFLAGS=-fno-omit-frame-pointer CXXFLAGS=-fno-omit-frame-pointer ./configure --generator=Ninja --ccache --disable-zeekctl --disable-python
If you also want to use a non-standard compiler (say, clang), add
CC=/path/to/clang CXX=/path/to/clang++
. -
We also need
malloc
and friends with frame pointers. One way to do that is using a custom jemalloc. It's easy to build with frame pointers:# CFLAGS=-fno-omit-frame-pointer ./configure && make && make install
Then inject it into processes through
LD_PRELOAD
, like globally like this:# export LD_PRELOAD=/usr/local/lib/libjemalloc.so.2
With frame pointers in place, we can now collect perf
data for Zeek:
# perf record -F 99 --call-graph fp -- zeek -Cr trace.pcap my-analyzer.hlto
To create a flame graph out of that, we need to clean up the stack traces first. Get the flame graph tools and run:
# perf script | awk '/^$/ { skip=0; } !skip { print; } /hilti::rt::detail::Callback::Callback/ { skip=1; }' | stackcollapse-perf.pl /dev/stdin >perf.folded
# flamegraph.pl perf.folded >~/tmp/perf.svg
You should now see the Spicy parsers as their stacks in the graphs, placed right on top of the main Zeek process.
Normally, when using a profiler/debugger with *.hlto
files, they won't have access to the generated C++ code, meaning they can't show source code information for a lot of what's going on. To retain the source code when building the analyzer, use spicyz
with -T
:
# TMPDIR=/dir/to/place/code spicyz -T -o my-analyzer.hlto my-analyzer.spicy my-analyzer.evt
With a recent Spicy plugin from git, you can also set that flag when building your analyzer through CMake:
# `cmake -DSPICYZ_FLAGS=-T [...]`
Since version 1.8 (currently only in git main
), Spicy also provides a built-in profiling mode (-Z
) that adds measurement instrumentation to your analyzer:
# spicyz -Z -o my-analyzer.hlto my-analyzer.spicy my-analyzer.evt
Then run Zeek with Spicy::enable_profiling
to get a timing dump at the end:
[TODO: Update example output, it looks a bit different now.]
# zeek -r -Cr trace.pcap my-analyzer.hlto Spicy::enable_profiling=T
#
# Profiling results
#
# name count time avg-% total-%
hlt/fiber/save-stack 2 8938 0.09 0.18
hlt/func/Foo::X::__on_x 1 4446076 87.58 87.58
hlt/func/Foo::X::__parse_Foo_X_stage2 1 4565265 89.93 89.93
hlt/func/Foo::X::__parse_stage1 1 4649733 91.59 91.59
hlt/func/Foo::X::parse3 1 4690787 92.40 92.40
hlt/func/Foo::__register_Foo_X 1 5705 0.11 0.11
hlt/func/Foo::fibo 1 4401818 86.71 86.71
hlt/func/Foo::y 88 141955 0.03 2.80
hlt/total 1 5076714 100.00 100.00
spicy/unit/Foo::X 1 4608137 90.77 90.77
spicy/unit/Foo::X::x 1 4506502 88.77 88.77
Usually the most interesting parts are the lines starting with spicy/unit
(showing execution times for parsing units and their fields), and spicy/event
(showing the time it takes to create (not process) a Zeek event). The latter isn't shown in the example.
The time values represent a high-precision internal clock with no direct meaning, so the percentages are most useful. The measure corresponding blocks of code executed. count
counts how often the blocks were executed. For recursive functions/units, only the top-level instance has their time measured (but all instances are counted).
Spicy internally uses fibers (co-routines) to switch quickly between different analyzers. To get some statistics on their usage, the Spicy plugin comes with a Zeek script:
# zeek -Cr trace.pcap my-analyzer.hlto Zeek/scripts/misc/resource-usage
[TODO: Insert example output]