-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add --pid-file
option to write PID files
#25474
base: main-2.x
Are you sure you want to change the base?
Conversation
Add `--pid-file` option to write PID files on startup. The PID filename is specified by the argument after `--pid-file`. Example: `influxd --pid-file /var/lib/influxd/influxd.pid` PID files are automatically removed when the influxd process is shutdown. Closes: 25473
o.PIDFile = pidFilename | ||
}) | ||
defer func() { | ||
l.ShutdownOrFail(t, ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran in to a similar thing as this recently: #25398 (comment)
I think that this method call ShutdownOrFail
will get called immediately? Might be good to check but this may not be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's inside a lambda, so it will not be evaluated until the defer calls the lambda. Also note that if it got called immediately the require.FileExists()
on line 184 would fail because the PID file gets removed when l.ShutdownOrFail(t, ctx)
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devanbenz - the arguments to a deferred function are evaluated on the execution of the defer
line, and those evaluated arguments are stored for the eventual call, but the deferred call itself doesn't happen until after the end of the closing method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some error message changes, and a question about handling a (currently) undetected error.
|
||
// Create directory to PIDfile if needed. | ||
if err := os.MkdirAll(filepath.Dir(pidFile), 0777); err != nil { | ||
return fmt.Errorf("mkdir: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add the path in the fmt.Errorf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context of the PID filename is added by the caller:
influxdb/cmd/influxd/launcher/launcher.go
Lines 252 to 254 in a91ba1c
if err := m.writePIDFile(opts.PIDFile); err != nil { | |
return fmt.Errorf("error writing PIDFile %q: %w", opts.PIDFile, err) | |
} |
I figured this was acceptable elimination of code inside writePIDFile
since it is a private, single-use method. For a public method I definitely would have added the filename into each message.
cmd/influxd/launcher/launcher.go
Outdated
if writeErr := os.WriteFile(pidFile, []byte(pidStr), 0666); writeErr != nil { | ||
// Let's make sure we don't leave a PID file behind on error. | ||
removeErr := os.Remove(pidFile) | ||
return fmt.Errorf("write file: %w; remove file: %w", writeErr, removeErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add the path to the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we want to use something lower level than WriteFile
like OpenFile
so we can pass in O_EXCL
and detect existing PID files that weren't cleaned up (or multiple InfluxDB instances). I think that might be an error we want to detect, report, and possibly abort on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to find using the PID file as a lock file more annoying than useful (looking at you PostgreSQL!). It also generates tons of support questions.
The less annoying way of preventing multiple instances is to check for for already listening sockets. We kind of do that today by aborting if we can't bind to the bind addresses. We do the abort after we've opened the PID file, so we would blow away the PID file on shutdown in those situations even though the other (presumed) influxd instance is running.
If we want more robust handling of the PID file if you accidentally launch multiple influxd instances with the same configuration, my suggestion would be a feature request to try connecting to the bind addresses very early on to see if anything is listening on them, or use a Unix domain socket / named pipe for the check. This way when the application dies, there is nothing listening or responding on the socket. With a PID or lock file, the file can be stay around if the application is not shut down cleanly.
Or we could simply write the PID after we bind the sockets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I don't like the idea of dropping the PID file after we're listening for connections, because that could put a long delay between influxd
starts and when the PID file drops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion then, to avoid support questions, would be to check for existence, issue a warning if present, and then continue, rather than a fatal error. That will let the customer know that the WAL flush may not have happened on shutdown, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do a compromise between the PID approaches. If you specify --pid-file
, the default behavior is to abort with an error if the PID file already exists and the PID file is not touched. If you additionally specify a binary flag --overwrite-pid-file
, then a warning is issued if the PID file is already present, but it will be overwritten and startup will continue.
Adding an extra parameter does add complexity, but the impact of --overwrite-pid-file
would be extremely localized and simple to exhaustively test in the automated unit tests.
Thoughts?
cmd/influxd/launcher/launcher.go
Outdated
label: "pidfile", | ||
closer: func(context.Context) error { | ||
if err := os.Remove(pidFile); err != nil { | ||
return fmt.Errorf("removing PID file %q: %w", pidFile, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the caller to the cleanup function has no context, so the error must contain the filename.
Change default behavior to abort with an error if the PID file is requested but it already exists. Add `-ooverwrite-pid-file` flag to change behavior to attempt to overwrite PID file if the file exists instead of aborting.
Add
--pid-file
option to write PID files on startup. The PID filename is specified by the argument after--pid-file
.Example:
influxd --pid-file /var/lib/influxd/influxd.pid
PID files are automatically removed when the influxd process is shutdown.
Closes: #25473