
support periodic generation of pprof heap profiles and other mem info #2072

Open
zackattack01 opened this issue Jan 28, 2025 · 1 comment

Comments

@zackattack01
Contributor

While recently debugging some memory issues and looking into the memprofile we generate for flares (see code here) I noticed a few things:

  • there are several code paths that are specifically busy while generating a flare, and these tend to overwhelm the results in a way that isn't indicative of what we'd normally expect launcher to be doing (e.g. reading/copying/writing files and zipping for the flare, etc.)
  • it is very difficult to learn anything meaningful from a profile taken at only a single point in time

We could likely put ourselves in a much better position to catch real memory issues by supporting a debug mode that generates a collection of these every so often (e.g. once an hour?). I'm not sure what the best way to gate this behavior would be; if we use the existing debug flag, we should probably add some sort of rotation/cap for the generated profiles. I'd think we could just pick some temp directory to store them. This could look something like this:

  • if (existing or new) debug flag is set:
    • every hour or so we create some artifacts that include:
      • device's runtime when the report is generated
      • launcher's runtime when the report is generated
      • some basic memory stats (e.g. we collect process mem info here)
      • the heap profile noted above
      • anything else fun that people can think of
@directionless
Contributor

We sorta used to do this... https://github.com/kolide/launcher/blob/main/pkg/debug listens for a signal and enables an HTTP interface to net/http/pprof. Two problems...

  1. Local only -- it was really designed for the developer working locally, not for debugging remote customers
  2. No Windows -- because it's triggered off a signal, there's no support for it on Windows. I took a couple of stabs at making a file-watch-style trigger, but never got it performant.

I think it's reasonable to explore creating these periodically. Some brainstorming:

  • Needing to put a machine into debug mode sounds hard, and we'd potentially miss something. What's the impact of generating these every hour?
  • How big would they be?
  • Do we have the ability to detect anomalous ones? That can feed into retention, or even an automatic submission mode.
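For the sizing and anomaly questions, a rough starting point would be to render a heap profile into memory to see how many bytes it takes, and to compare each snapshot's `HeapInuse` against the median of earlier snapshots. The helpers and the 2x-median threshold below are hypothetical choices for illustration, not tuned values; a real retention or auto-submission policy would need data from actual devices.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sort"
)

// heapProfileSize writes a heap profile into memory and returns its size in
// bytes, giving a per-machine estimate of how large each artifact would be.
func heapProfileSize() (int, error) {
	var buf bytes.Buffer
	runtime.GC() // refresh the profile's "as of last GC" statistics
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// median returns the median of a non-empty slice without modifying it.
func median(xs []uint64) float64 {
	s := append([]uint64(nil), xs...)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	n := len(s)
	if n%2 == 1 {
		return float64(s[n/2])
	}
	return (float64(s[n/2-1]) + float64(s[n/2])) / 2
}

// isAnomalous flags current if it exceeds factor times the median of history.
// It declines to judge until at least minHistory samples exist.
func isAnomalous(history []uint64, current uint64, factor float64, minHistory int) bool {
	if len(history) == 0 || len(history) < minHistory {
		return false
	}
	return float64(current) > factor*median(history)
}

func main() {
	size, err := heapProfileSize()
	if err != nil {
		panic(err)
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("heap profile: %d bytes (HeapInuse=%d)\n", size, m.HeapInuse)

	// Hypothetical HeapInuse history from earlier hourly snapshots (MiB scale).
	history := []uint64{40 << 20, 42 << 20, 41 << 20, 43 << 20}
	fmt.Println(isAnomalous(history, 45<<20, 2.0, 3))  // false: below 2x median
	fmt.Println(isAnomalous(history, 120<<20, 2.0, 3)) // true: above 2x median
}
```

Using the median rather than the mean keeps a single earlier spike from raising the baseline, which matters if anomalous snapshots are the ones we want to retain or submit.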
