Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Otel Memory Leak #6315

Open
tjsampson opened this issue Nov 8, 2024 · 0 comments
Open

Otel Memory Leak #6315

tjsampson opened this issue Nov 8, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@tjsampson
Copy link

tjsampson commented Nov 8, 2024

Description

Memory Leak in otel library code.

Environment

  • OS: Linux
  • Architecture: x86
  • Go Version: 1.23.2
    - go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.56.0
    • go.opentelemetry.io/otel v1.31.0
    • go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.31.0
    • go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.31.0
    • go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.31.0
    • go.opentelemetry.io/otel/sdk v1.31.0
    • go.opentelemetry.io/otel/trace v1.31.0

Steps To Reproduce

See Comment here: #5190 (comment)

I am pretty sure this is still an issue or something else in the golang otel ecosystem. I will get a pprof setup possibly tomorrow, but here's some anecdotal evidence I have:

Screenshot 2024-11-07 at 6 49 14 PM

Pretty easy to see when tracing was implemented from that graph. And yes. I have removed our tracing implementation and it's back to normal memory usage.

Here is a rough draft of our setup. Please let me know if I am doing anything egregiously dumb, but for the most part, it's all pretty standard stuff take from various docs:

go.mod

go 1.23.2

require (
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.56.0
	go.opentelemetry.io/otel v1.31.0
	go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.31.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.31.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.31.0
	go.opentelemetry.io/otel/sdk v1.31.0
	go.opentelemetry.io/otel/trace v1.31.0
)

We are wrapping the otelhttp.NewHandler around the toplevel muxer, so everything is traced. Yes, I know this is expensive, but it shouldn't leak memory. Eventually we will change this to include/exclude/drop stuff, just so we aren't taking in so much volume (ping routes, health checks, etc.) and do more aggressive down sampling.

func NewApi(c config.Config) *Api {
	return &Api{
		c:           &c,
		controllers: []Controller{newControllers(c)},
		server: &http.Server{
			ReadTimeout:  c.ReadTimeout,
			WriteTimeout: c.WriteTimeout,
			IdleTimeout:  c.IdleTimeout,
			Addr:         fmt.Sprintf(":%d", c.Port),
			Handler:      otelhttp.NewHandler(chi.NewMux(), "INGRESS", otelhttp.WithFilter(traceFilter)),
		},
		done:       make(chan bool),
		sigChannel: make(chan os.Signal, 1024),
	}
}

Here is how we are initializing our trace and metrics providers once on boot:

// TracerProvider an OTLP exporter, and configures the corresponding trace provider.
func TracerProvider(ctx context.Context, res *resource.Resource) (func(context.Context) error, error) {
	// Set up a trace exporter
	traceExporter, err := otlptrace.New(ctx, otlptracegrpc.NewClient())
	if err != nil {
		return nil, errors.Wrap(err, "failed to create trace exporter")
	}

	// Register the trace exporter with a TracerProvider, using a batch
	// span processor to aggregate spans before export.
	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
		sdktrace.WithResource(res),
		sdktrace.WithBatcher(traceExporter),
	)
	otel.SetTracerProvider(tracerProvider)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		))

	// Shutdown will flush any remaining spans and shut down the exporter.
	return tracerProvider.Shutdown, nil
}

// MeterProvider an OTLP exporter, and configures the corresponding meter provider.
func MeterProvider(ctx context.Context, res *resource.Resource) (func(context.Context) error, error) {
	metricExporter, err := otlpmetricgrpc.New(ctx)
	if err != nil {
		return nil, errors.Wrap(err, "failed to create metric exporter")
	}

	meterProvider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(metricExporter)),
		sdkmetric.WithResource(res),
	)
	otel.SetMeterProvider(meterProvider)

	return meterProvider.Shutdown, nil
}

Then called and shutdown on main:

	shutDownTracer, err := traceinstrument.TracerProvider(ctx, traceRes)
	if err != nil {
		log.Logger.Fatal("failed to create trace provider", zap.Error(err))
	}
	
	defer func(onShutdown func(ctx context.Context) error) {
		if errr := onShutdown(ctx); errr != nil {
			log.Logger.Error("error shutting down trace provider", zap.Error(errr))
		}
	}(shutDownTracer)

	shutdownTraceMetrics, err := traceinstrument.MeterProvider(ctx, traceRes)
	if err != nil {
		log.Logger.Fatal("failed to create meter provider", zap.Error(err))
	}

	defer func(onShutdown func(ctx context.Context) error) {
		if errr := onShutdown(ctx); errr != nil {
			log.Logger.Error("error shutting down metrics provider", zap.Error(errr))
		}
	}(shutdownTraceMetrics)

Note. We are also using the otelhttp.NewTransport to wrap the default logging transport:

http.DefaultTransport = otelhttp.NewTransport(http.DefaultTransport)

If we remove tracing setup, memory usage goes back to normal. So the leak is definitely in our tracing setup.

Expected behavior

Memory does not continuously increase over time.

@tjsampson tjsampson added the bug Something isn't working label Nov 8, 2024
@MrAlias MrAlias added this to Go: Bugs Nov 9, 2024
@github-project-automation github-project-automation bot moved this to Needs triage in Go: Bugs Nov 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Needs triage
Development

No branches or pull requests

1 participant