support periodic generation of pprof heap profiles and other mem info #2072

zackattack01 · 2025-01-28T18:23:08Z

While recently debugging some memory issues and looking into the memprofile we generate for flares (see code here), I noticed a few things:

- there are several code paths that are busy specifically while generating a flare, and these tend to dominate the results in a way that isn't representative of what we'd normally expect launcher to be doing (e.g. reading/copying/writing files and zipping them for the flare)
- it is very difficult to find anything meaningful in these when they're taken only at a single point in time

We could likely put ourselves in a much better position to catch real memory issues by supporting a debug mode that generates a collection of these every so often (e.g. once an hour?). I'm not sure what the best way to gate this behavior would be. If we use the existing debug flag, we should probably add some sort of rotation/cap for the generated profiles. I'd think we could just pick some temp directory to store them. This could look something like this:

if (existing or new) debug flag is set:
- every hour or so we create some artifacts that include:
  - device's runtime when the report is generated
  - launcher's runtime when the report is generated
  - some basic memory stats (e.g. we collect process mem info here)
  - the heap profile noted above
  - anything else fun that people can think of

directionless · 2025-01-28T22:37:16Z

We sorta used to do this... https://github.com/kolide/launcher/blob/main/pkg/debug listens for a signal and enables an HTTP interface to net/http/pprof. Two problems:

- Local only: it was really designed for a developer working locally, not for debugging remote customers
- No Windows: because it's triggered off a signal, there's no support for it on Windows. I took a couple stabs at making a filewatch-style trigger, but never got it performant.

I think it's reasonable to explore creating these periodically. Some brainstorming:

- Needing to put a machine into debug mode sounds hard, and we'd potentially miss something. What's the impact of generating them every hour?
- How big would they be?
- Do we have the ability to detect anomalous ones? That can feed into retention, or even an automatic submission mode.
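A rough Go sketch touching the last two questions, with every name and threshold hypothetical: `heapProfileSize` measures how many bytes a heap profile would take on disk by writing it to a buffer, and `isAnomalous` flags a sample (e.g. `HeapInuse`) that exceeds some multiple of the median of earlier samples. The 2x factor, the three-sample minimum, and the example history are made-up illustrative values, not measurements from launcher. A median is used instead of a mean so that a single earlier spike doesn't raise the baseline.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sort"
)

// heapProfileSize reports how many bytes a heap profile would occupy on disk.
func heapProfileSize() (int, error) {
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// isAnomalous reports whether current exceeds factor times the median of
// history. It never flags anything until history has minSamples entries.
func isAnomalous(history []uint64, current uint64, factor float64, minSamples int) bool {
	if len(history) < minSamples || len(history) == 0 {
		return false
	}
	sorted := append([]uint64(nil), history...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	n := len(sorted)
	median := float64(sorted[n/2])
	if n%2 == 0 {
		median = (float64(sorted[n/2-1]) + float64(sorted[n/2])) / 2
	}
	return float64(current) > factor*median
}

func main() {
	size, err := heapProfileSize()
	if err != nil {
		panic(err)
	}
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("heap profile: %d bytes, HeapInuse: %d bytes\n", size, ms.HeapInuse)

	// Made-up hourly HeapInuse samples in MiB, purely illustrative.
	history := []uint64{40, 42, 41, 39, 43}
	fmt.Println(isAnomalous(history, 45, 2.0, 3)) // false (median 41, threshold 82)
	fmt.Println(isAnomalous(history, 95, 2.0, 3)) // true
}
```

On the cost question: as far as I understand, the Go runtime samples heap allocations continuously (`runtime.MemProfileRate` defaults to one sample per 512 KiB allocated), so writing a profile hourly mainly adds the cost of serializing it. Profile size tracks the number of distinct sampled allocation stacks more than total heap size, so running `heapProfileSize` inside a real launcher process would be the way to answer "how big".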
