Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add --pid-file option to write PID files #25474

Open
wants to merge 4 commits into
base: main-2.x
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions cmd/influxd/launcher/cmd.go
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,8 @@ type InfluxdOpts struct {
TracingType string
ReportingDisabled bool

PIDFile string

AssetsPath string
BoltPath string
SqLitePath string
Expand Down Expand Up @@ -213,6 +215,8 @@ func NewOpts(viper *viper.Viper) *InfluxdOpts {
FluxLogEnabled: false,
ReportingDisabled: false,

PIDFile: "",

BoltPath: filepath.Join(dir, bolt.DefaultFilename),
SqLitePath: filepath.Join(dir, sqlite.DefaultFilename),
EnginePath: filepath.Join(dir, "engine"),
Expand Down Expand Up @@ -325,6 +329,12 @@ func (o *InfluxdOpts) BindCliOpts() []cli.Opt {
Default: o.ReportingDisabled,
Desc: "disable sending telemetry data to https://telemetry.influxdata.com every 8 hours",
},
{
DestP: &o.PIDFile,
Flag: "pid-file",
Default: o.PIDFile,
Desc: "write process ID to a file",
},
{
DestP: &o.SessionLength,
Flag: "session-length",
Expand Down
39 changes: 39 additions & 0 deletions cmd/influxd/launcher/launcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import (
nethttp "net/http"
"os"
"path/filepath"
"strconv"
"strings"
"sync"
"time"
Expand Down Expand Up @@ -248,6 +249,10 @@ func (m *Launcher) run(ctx context.Context, opts *InfluxdOpts) (err error) {
}
}

if err := m.writePIDFile(opts.PIDFile); err != nil {
return fmt.Errorf("error writing PIDFile %q: %w", opts.PIDFile, err)
}

m.reg = prom.NewRegistry(m.log.With(zap.String("service", "prom_registry")))
m.reg.MustRegister(collectors.NewGoCollector())

Expand Down Expand Up @@ -970,6 +975,40 @@ func (m *Launcher) initTracing(opts *InfluxdOpts) {
}
}

// writePIDFile will write the process ID to pidFile and register a cleanup function to delete it during
// shutdown. If pidFile is empty, then no PID file is written and no cleanup function is registered.
func (m *Launcher) writePIDFile(pidFile string) error {
if pidFile == "" {
return nil
}

// Create directory to PIDfile if needed.
if err := os.MkdirAll(filepath.Dir(pidFile), 0777); err != nil {
return fmt.Errorf("mkdir: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add the path in the fmt.Errorf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context of the PID filename is added by the caller:

if err := m.writePIDFile(opts.PIDFile); err != nil {
return fmt.Errorf("error writing PIDFile %q: %w", opts.PIDFile, err)
}

I figured this was acceptable elimination of code inside writePIDFile since it is a private, single-use method. For a public method I definitely would have added the filename into each message.

}

// Write PID to file
pidStr := strconv.Itoa(os.Getpid())
if writeErr := os.WriteFile(pidFile, []byte(pidStr), 0666); writeErr != nil {
// Let's make sure we don't leave a PID file behind on error.
removeErr := os.Remove(pidFile)
return fmt.Errorf("write file: %w; remove file: %w", writeErr, removeErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add the path to the error message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we want to use something lower level than WriteFile like OpenFile so we can pass in O_EXCL and detect existing PID files that weren't cleaned up (or multiple InfluxDB instances). I think that might be an error we want to detect, report, and possibly abort on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to find using the PID file as a lock file more annoying than useful (looking at you PostgreSQL!). It also generates tons of support questions.

The less annoying way of preventing multiple instances is to check for for already listening sockets. We kind of do that today by aborting if we can't bind to the bind addresses. We do the abort after we've opened the PID file, so we would blow away the PID file on shutdown in those situations even though the other (presumed) influxd instance is running.

If we want more robust handling of the PID file if you accidentally launch multiple influxd instances with the same configuration, my suggestion would be a feature request to try connecting to the bind addresses very early on to see if anything is listening on them, or use a Unix domain socket / named pipe for the check. This way when the application dies, there is nothing listening or responding on the socket. With a PID or lock file, the file can be stay around if the application is not shut down cleanly.

Or we could simply write the PID after we bind the sockets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I don't like the idea of dropping the PID file after we're listening for connections, because that could put a long delay between influxd starts and when the PID file drops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion then, to avoid support questions, would be to check for existence, issue a warning if present, and then continue, rather than a fatal error. That will let the customer know that the WAL flush may not have happened on shutdown, for instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do a compromise between the PID approaches. If you specify --pid-file, the default behavior is to abort with an error if the PID file already exists and the PID file is not touched. If you additionally specify a binary flag --overwrite-pid-file, then a warning is issued if the PID file is already present, but it will be overwritten and startup will continue.

Adding an extra parameter does add complexity, but the impact of --overwrite-pid-file would be extremely localized and simple to exhaustively test in the automated unit tests.

Thoughts?

}

// Add a cleanup function.
m.closers = append(m.closers, labeledCloser{
label: "pidfile",
closer: func(context.Context) error {
if err := os.Remove(pidFile); err != nil {
return fmt.Errorf("removing PID file %q: %w", pidFile, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the caller to the cleanup function has no context, so the error must contain the filename.

}
return nil
},
})

return nil
}

// openMetaStores opens the embedded DBs used to store metadata about influxd resources, migrating them to
// the latest schema expected by the server.
// On success, a unique ID is returned to be used as an identifier for the influxd instance in telemetry.
Expand Down
22 changes: 22 additions & 0 deletions cmd/influxd/launcher/launcher_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@ import (
"encoding/json"
"io"
nethttp "net/http"
"os"
"path/filepath"
"strconv"
"testing"
"time"

Expand All @@ -14,6 +17,7 @@ import (
"github.com/influxdata/influxdb/v2/http"
"github.com/influxdata/influxdb/v2/tenant"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

// Default context.
Expand Down Expand Up @@ -164,3 +168,21 @@ func TestLauncher_PingHeaders(t *testing.T) {
assert.Equal(t, []string{"OSS"}, resp.Header.Values("X-Influxdb-Build"))
assert.Equal(t, []string{"dev"}, resp.Header.Values("X-Influxdb-Version"))
}

func TestLauncher_PIDFile(t *testing.T) {
pidDir := t.TempDir()
pidFilename := filepath.Join(pidDir, "influxd.pid")

l := launcher.RunAndSetupNewLauncherOrFail(ctx, t, func(o *launcher.InfluxdOpts) {
o.PIDFile = pidFilename
})
defer func() {
l.ShutdownOrFail(t, ctx)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran in to a similar thing as this recently: #25398 (comment)

I think that this method call ShutdownOrFail will get called immediately? Might be good to check but this may not be the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's inside a lambda, so it will not be evaluated until the defer calls the lambda. Also note that if it got called immediately the require.FileExists() on line 184 would fail because the PID file gets removed when l.ShutdownOrFail(t, ctx) is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devanbenz - the arguments to a deferred function are evaluated on the execution of the defer line, and those evaluated arguments are stored for the eventual call, but the deferred call itself doesn't happen until after the end of the closing method.

require.NoFileExists(t, pidFilename)
}()

require.FileExists(t, pidFilename)
pidBytes, err := os.ReadFile(pidFilename)
require.NoError(t, err)
require.Equal(t, strconv.Itoa(os.Getpid()), string(pidBytes))
}