Steps to Reproduce
1. In a Next.js 14.1.3 project, migrate from dd-trace to OpenTelemetry.
2. Configure OpenTelemetry using NodeSDK with various instrumentations (Http, DNS, Net, Undici, etc.).
3. Use a configuration that includes both metricReader and resourceDetectors.
4. Run the application, which eventually triggers an Out Of Memory (OOM) error.
Expected Result
OpenTelemetry should collect traces and metrics without causing the application to run out of memory.
Actual Result
The application process terminates abnormally due to an OOM error during runtime.
Additional Details
The issue occurs in a Next.js environment after switching from dd-trace to OpenTelemetry.
The OpenTelemetry configuration includes a NodeSDK setup with W3CTraceContextPropagator, OTLPTraceExporter, and OTLPMetricExporter.
The HttpInstrumentation's requestHook is used to set the HTTP route for spans.
Investigation Findings: When both metricReader and resourceDetectors are removed from the OpenTelemetry configuration, the OOM error no longer occurs. This indicates that these configurations might be contributing to the memory issue.
Memory trends for the datadog-agent and my application are shown below:
datadog-agent's memory usage:
application's memory usage:
OpenTelemetry Setup Code
// instrumentation.node.ts
import { IncomingMessage } from "node:http";
import { context } from "@opentelemetry/api";
import { W3CTraceContextPropagator } from "@opentelemetry/core";
import { RPCType, getRPCMetadata, setRPCMetadata } from "@opentelemetry/core";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-grpc";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";
import { DnsInstrumentation } from "@opentelemetry/instrumentation-dns";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { NetInstrumentation } from "@opentelemetry/instrumentation-net";
import { UndiciInstrumentation } from "@opentelemetry/instrumentation-undici";
import { awsEcsDetector } from "@opentelemetry/resource-detector-aws";
import {
  Resource,
  envDetector,
  hostDetector,
  osDetector,
  processDetector,
} from "@opentelemetry/resources";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { NodeSDK } from "@opentelemetry/sdk-node";
import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from "@opentelemetry/semantic-conventions";
import {
  ATTR_CONTAINER_IMAGE_TAGS,
  ATTR_DEPLOYMENT_ENVIRONMENT,
} from "@opentelemetry/semantic-conventions/incubating";

const sdk = new NodeSDK({
  textMapPropagator: new W3CTraceContextPropagator(),
  traceExporter: new OTLPTraceExporter(),
  resource: new Resource({
    [ATTR_SERVICE_NAME]: process.env.SERVICE_NAME,
    [ATTR_DEPLOYMENT_ENVIRONMENT]: process.env.NEXT_PUBLIC_DEPLOYMENT_ENV,
    [ATTR_SERVICE_VERSION]: process.env.SERVICE_VERSION,
    [ATTR_CONTAINER_IMAGE_TAGS]: process.env.SERVICE_VERSION,
  }),
  instrumentations: [
    new HttpInstrumentation({
      requestHook: (span, request) => {
        const route = (request as IncomingMessage)?.url;
        if (route) {
          if (route && (route.endsWith(".json") || !route.includes("."))) {
            // Try to apply the route only for pages and client side fetches
            const rpcMetadata = getRPCMetadata(context.active()); // retrieve rpc metadata from the active context
            if (rpcMetadata) {
              if (rpcMetadata?.type === RPCType.HTTP) {
                rpcMetadata.route = route;
              }
            } else {
              setRPCMetadata(context.active(), {
                type: RPCType.HTTP,
                route,
                span,
              });
            }
          }
        }
      },
    }),
    new DnsInstrumentation(),
    new NetInstrumentation(),
    new UndiciInstrumentation(),
  ],
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
  }),
  spanProcessors: [new BatchSpanProcessor(new OTLPTraceExporter())],
  resourceDetectors: [
    awsEcsDetector,
    envDetector,
    processDetector,
    hostDetector,
    osDetector,
  ],
});

sdk.start();
That request hook looks unsafe, especially the route metadata it sets: route must be just that, a route - not a URL, since a URL is likely to be high-cardinality (think query strings, etc.).
Every unique set of attributes (including http.route) creates its own metrics stream. By default, OTel (regardless of language implementation) sets no limit on how many metric streams can be allocated, so without one your app will run out of memory if the SDK is fed high-cardinality metrics: it has to keep the attribute set of every stream that was ever created in memory.
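To illustrate, here is a hedged sketch using the public @opentelemetry/api metrics API (the counter name is made up; this is not the instrumentation's actual metric) showing how raw URLs multiply streams while a real route does not:

import { metrics } from "@opentelemetry/api";

const meter = metrics.getMeter("example-meter");
// Hypothetical counter, only to show how attribute sets map to streams.
const counter = meter.createCounter("example.http.requests");

// With the raw URL as an attribute, every unique query string becomes a new
// stream the SDK must keep in memory for the lifetime of the process:
counter.add(1, { "http.route": "/search?q=foo" });
counter.add(1, { "http.route": "/search?q=bar" }); // second stream
counter.add(1, { "http.route": "/search?q=baz" }); // third stream

// With a low-cardinality route, all of the above collapse into one stream:
counter.add(1, { "http.route": "/search" });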
I think the fix would be to adapt the hook so it supplies low-cardinality data to route. As a safeguard against this happening again in the future, you can set a cardinality limit through a View configuration in your NodeSDK constructor:
// View is exported from @opentelemetry/sdk-metrics
import { View } from "@opentelemetry/sdk-metrics";

new NodeSDK({
  ..., // your config
  views: [
    new View({
      instrumentName: '*', // wildcard selector, means "apply this view to every instrument"
      aggregationCardinalityLimit: 2000, // limits cardinality to 2000 streams per metric
    }),
  ],
})
This will limit cardinality by introducing an overflow metric stream. You will lose data on your metric when hitting the limit, though, so the recommendation to adapt the data passed to route is still valid even when using such a cardinality limit.
(a way to test my above theory would be to run without the request hook for a while and see if that makes a difference. If it still runs OOM then something else may be the culprit)
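For reference, a minimal sketch of a lower-cardinality requestHook, assuming the route has to be derived from the URL because no framework route pattern is available; the normalizeRoute helper and its regexes are hypothetical, and mapping the request to the real Next.js route pattern (e.g. /users/[id]) would be the better option where possible:

import { IncomingMessage } from "node:http";
import { Span, context } from "@opentelemetry/api";
import { RPCType, getRPCMetadata } from "@opentelemetry/core";

// Hypothetical helper: drops the query string and collapses numeric or
// UUID-like path segments so the resulting value stays low-cardinality.
function normalizeRoute(url: string): string {
  const path = url.split("?")[0];
  return path
    .split("/")
    .map((segment) =>
      /^\d+$/.test(segment) || /^[0-9a-f-]{16,}$/i.test(segment) ? ":id" : segment
    )
    .join("/");
}

const requestHook = (span: Span, request: unknown): void => {
  const url = (request as IncomingMessage)?.url;
  if (!url) return;

  // Only update the route metadata that the instrumentation already tracks
  // for this request; spans and metrics then see a bounded set of values.
  const rpcMetadata = getRPCMetadata(context.active());
  if (rpcMetadata?.type === RPCType.HTTP) {
    rpcMetadata.route = normalizeRoute(url);
  }
};

This would be wired in the same place as the original hook, i.e. new HttpInstrumentation({ requestHook }).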
Operating System and Version
Docker containers
Runtime and Version
datadog-agent: 7.50.3
Node.js: 20.15.1