Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix server.address field in HTTP logs #4142

Merged
merged 3 commits into from
Nov 22, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Kind can be one of:
# - breaking-change: a change to previously-documented behavior
# - deprecation: functionality that is being removed in a later release
# - bug-fix: fixes a problem in a previous version
# - enhancement: extends functionality but does not break or fix existing behavior
# - feature: new functionality
# - known-issue: problems that we are aware of in a given version
# - security: impacts on the security of a product or a user’s deployment.
# - upgrade: important information for someone upgrading from a prior version
# - other: does not fit into any of the other categories
kind: bug-fix

# Change summary; a 80ish characters long description of the change.
summary: Fix server.address field in HTTP logs

# Long description; in case the summary is not enough to describe the change
# this field accommodate a description without length limits.
# NOTE: This field will be rendered only for breaking-change and known-issue kinds at the moment.
#description:

# Affected component; usually one of "elastic-agent", "fleet-server", "filebeat", "metricbeat", "auditbeat", "all", etc.
component: fleet-server

# PR URL; optional; the PR number that added the changeset.
# If not present is automatically filled by the tooling finding the PR where this changelog fragment has been added.
# NOTE: the tooling supports backports, so it's able to fill the original PR number instead of the backport PR number.
# Please provide it if you are adding a fragment for a different PR.
#pr: https://github.com/owner/repo/1234

# Issue URL; optional; the GitHub issue related to this changeset (either closes or is part of).
# If not present is automatically filled by the tooling with the issue linked to the PR number.
#issue: https://github.com/owner/repo/1234
11 changes: 7 additions & 4 deletions internal/pkg/logger/http.go
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,11 @@ func Middleware(next http.Handler) http.Handler {
w.Header().Set(HeaderRequestID, reqID)

// get the server bound addr from the req ctx
addr, _ := r.Context().Value(http.LocalAddrContextKey).(string)
var addr string
netAddr, ok := r.Context().Value(http.LocalAddrContextKey).(net.Addr)
if ok {
addr = netAddr.String()
}

// Add trace correlation fields
ctx := r.Context()
Expand Down Expand Up @@ -250,13 +254,12 @@ func Middleware(next http.Handler) http.Handler {
next.ServeHTTP(wrCounter, r)
httpMeta(r, e)

if zlog.Debug().Enabled() || (wrCounter.statusCode < 200 && wrCounter.statusCode >= 300) {
if zlog.Info().Enabled() {
e.Uint64(ECSHTTPRequestBodyBytes, rdCounter.Count())
e.Uint64(ECSHTTPResponseBodyBytes, wrCounter.Count())
e.Int(ECSHTTPResponseCode, wrCounter.statusCode)
e.Int64(ECSEventDuration, time.Since(start).Nanoseconds())

e.Msgf("%d HTTP Request", wrCounter.statusCode)
e.Msg("HTTP Request")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the middleware layer to log an info level message for each request if info is enabled.

The previous behaviour was to log an (info level) message if Debug is enabled, or the server responded in a non 2xx status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this is a good idea? How frequent would these logs be in the median case?

Copy link
Contributor Author

@michel-laterman michel-laterman Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the number of connected agents.
IIRC agents will long poll 10m by default, so we would expect a log every 10m per agent for checkins + additional entries for each action's acks (when an action or policy change is sent)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like potentially quite a bit. Could we rate limit or aggregate these based on return code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I'm fine with the rest of this PR, it's just this change that seems potentially dangerous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the previous condition is broken; it would only output the (info level) message if debug is enabled.

Copy link
Contributor Author

@michel-laterman michel-laterman Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll revert & fix the condition; then make an issue for sampling logs

EDIT:
issue: #4145

}
}
return http.HandlerFunc(fn)
Expand Down
36 changes: 26 additions & 10 deletions internal/pkg/logger/http_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,19 +5,20 @@
package logger

import (
"bytes"
"context"
"encoding/json"
"net"
"net/http"
"net/http/httptest"
"testing"
"time"

"github.com/rs/zerolog"
"github.com/stretchr/testify/require"

testlog "github.com/elastic/fleet-server/v7/internal/pkg/testing/log"
)

func TestMiddleware(t *testing.T) {
ctx := testlog.SetLogger(t).WithContext(context.Background())
h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ts, ok := CtxStartTime(r.Context())
require.True(t, ok, "expected context to have start time")
Expand All @@ -27,15 +28,30 @@
_, _ = w.Write([]byte(`hello, world`))
})

w := httptest.NewRecorder()
req := httptest.NewRequest("GET", "/", nil).WithContext(ctx)

Middleware(h).ServeHTTP(w, req)
res := w.Result()
var b bytes.Buffer
logger := zerolog.New(&b).Level(zerolog.InfoLevel)
ctx := logger.WithContext(context.Background())

srv := httptest.NewUnstartedServer(Middleware(h))
srv.Config.BaseContext = func(_ net.Listener) context.Context {
return ctx
}
srv.Start()
defer srv.Close()
req, err := http.NewRequest("GET", srv.URL, nil)

Check failure on line 41 in internal/pkg/logger/http_test.go

View workflow job for this annotation

GitHub Actions / lint (linux)

should rewrite http.NewRequestWithContext or add (*Request).WithContext (noctx)
require.NoError(t, err)

res, err := srv.Client().Do(req)
require.NoError(t, err)
res.Body.Close()
require.Equal(t, http.StatusOK, res.StatusCode)
_, ok := res.Header[HeaderRequestID]
require.True(t, ok, "expected to have a request ID")
reqID := req.Header.Get(HeaderRequestID)
require.NotEmpty(t, reqID)

var obj map[string]any
err = json.Unmarshal(b.Bytes(), &obj)
require.NoError(t, err)
v, ok := obj[ECSServerAddress]
require.Truef(t, ok, "expected to find key: %s in %v", ECSServerAddress, obj)
require.NotEmpty(t, v)
}
Loading