preview: Use dumb-init as PID 1 #2982
build_docs.pl runs a number of git commands during its normal course of operation. After a certain point (once git's gc.auto thresholds are exceeded), these git commands implicitly run git gc in the background. By design, the original git process forks off the gc child process and forgets about it. The child process gets reparented to PID 1, which is either build_docs.pl or a shell. Neither of those handles SIGCHLD for processes it isn't aware of (that is, neither reaps them), so the number of zombie processes grows indefinitely.

With a large number of zombie processes (80k observed in elastic-apps-web), metricbeat starts consuming all available CPU, severely degrading the performance of the cluster. As a side effect, network throughput drops to ~50 Mbps and I/O wait rises to ~30%.

dumb-init was picked because it's dumb and doesn't try to do anything fancy, so there should be nothing that interferes with build_docs.pl's operation. Signals are forwarded to dumb-init's child. dumb-init also detaches itself from the controlling TTY and attaches its child process to it instead.

Ticket: https://elasticco.atlassian.net/browse/SRVCS-1367

Signed-off-by: Andrew Gunnerson <[email protected]>
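The reaping problem can be illustrated with a minimal sketch, in Python, of the loop a dumb-init-style PID 1 runs. This is not dumb-init's code (dumb-init is a small C program); the run_as_init function and the FORWARDED_SIGNALS list are hypothetical names chosen for illustration, and a POSIX system is assumed. The key step is waiting on pid -1, which reaps any terminated child, not only the one the init process started itself.

```python
import os
import signal

# Hypothetical, illustrative sketch; not dumb-init's implementation.
# Hypothetical subset of signals to relay to the main child.
FORWARDED_SIGNALS = (
    signal.SIGTERM,
    signal.SIGINT,
    signal.SIGHUP,
    signal.SIGQUIT,
    signal.SIGUSR1,
    signal.SIGUSR2,
)


def run_as_init(argv):
    """Run argv as the main child and reap every child that terminates.

    Returns the main child's exit code, or 128 + signal number if it was
    killed by a signal (the usual shell convention).
    """
    child_pid = os.fork()
    if child_pid == 0:
        # Child: replace this process image with the real program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        # Relay the signal to the main child instead of acting on it ourselves.
        try:
            os.kill(child_pid, signum)
        except ProcessLookupError:
            pass  # the main child is already gone

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED_SIGNALS}
    try:
        while True:
            # waitpid(-1, 0) blocks until *any* child terminates and reaps it.
            # When this process is PID 1, that includes orphans reparented to
            # it (such as a detached git gc), so they never linger as zombies.
            pid, status = os.waitpid(-1, 0)
            if pid != child_pid:
                continue  # an orphan was reaped; keep waiting for the main child
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return 128 + os.WTERMSIG(status)
    finally:
        # Restore whatever handlers were installed before we started.
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

In a container, run_as_init would be handed the real command line (build_docs.pl and its arguments) while running as PID 1. Because os.waitpid(-1, 0) accepts any terminated child, a detached git gc that gets reparented to PID 1 is reaped as soon as it exits instead of accumulating as a zombie. A real init such as dumb-init also handles details this sketch omits, for example signalling the child's whole process group rather than just the child.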
Great investigation and thorough explanation, @agunnerson-elastic! I'm not familiar with dumb-init (I've often seen tini used for this purpose), but it seems to do exactly what we need.
cc: @nik9000 in case you know of anything related to docs-preview that might be relevant that I'm not aware of.
It also looks like my review isn't enough here, since I'm not an owner of this repo anymore 🙃
LGTM, thank you for this!!
Very neat. Thank you!
(2024-04-12 update: This has been successfully tested in the elastic-apps-web cluster.)