
[detectors-aws] incomplete containerId detected for AWS ECS Fargate #2455

Open
Steffen911 opened this issue Sep 27, 2024 · 5 comments
Labels: bug, good first issue, pkg:resource-detector-aws, priority:p2, up-for-grabs

@Steffen911

What version of OpenTelemetry are you using?

    "@opentelemetry/api": "^1.9.0",
    "@opentelemetry/core": "^1.26.0",
    "@opentelemetry/exporter-trace-otlp-proto": "^0.53.0",
    "@opentelemetry/instrumentation": "^0.53.0",
    "@opentelemetry/instrumentation-aws-sdk": "^0.44.0",
    "@opentelemetry/instrumentation-http": "^0.53.0",
    "@opentelemetry/instrumentation-ioredis": "^0.43.0",
    "@opentelemetry/instrumentation-winston": "^0.40.0",
    "@opentelemetry/resource-detector-aws": "^1.6.1",
    "@opentelemetry/resource-detector-container": "^0.4.1",
    "@opentelemetry/resources": "^1.26.0",
    "@opentelemetry/sdk-node": "^0.53.0",
    "@opentelemetry/sdk-trace-base": "^1.26.0",
    "@opentelemetry/sdk-trace-node": "^1.26.0",
    "@opentelemetry/winston-transport": "^0.6.0",

What version of Node are you using?

20

What did you do?

I initialize my tracing with a setup similar to the one below as one of the first actions within my Node.js Express server. I bundle the whole application via Docker and deploy it to AWS ECS on Fargate.

import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { AwsInstrumentation } from "@opentelemetry/instrumentation-aws-sdk";
import {
  envDetector,
  processDetector,
  Resource,
} from "@opentelemetry/resources";
import { awsEcsDetectorSync } from "@opentelemetry/resource-detector-aws";
import { containerDetector } from "@opentelemetry/resource-detector-container";
import { env } from "@/src/env.mjs";

const sdk = new NodeSDK({
  resource: new Resource({
    "service.name": env.OTEL_SERVICE_NAME,
  }),
  traceExporter: new OTLPTraceExporter({
    url: `${env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`,
  }),
  instrumentations: [new AwsInstrumentation()],
  resourceDetectors: [
    envDetector,
    processDetector,
    awsEcsDetectorSync,
    containerDetector,
  ],
});

sdk.start();

What did you expect to see?

I would expect to see something like <taskId>-<containerId> as the container.id in my span attributes. This would match the convention that observability vendors like Datadog use. For a task with ID c23e5f76c09d438aa1824ca4058bdcab, I'd expect to see something like c23e5f76c09d438aa1824ca4058bdcab-1234678 for a single container.

What did you see instead?

As part of aws/amazon-ecs-agent#1119, AWS apparently shifted their cgroup naming convention to something like /ecs/<taskId>/<taskId>-<containerId>. Given the 64-character limit on the extracted value, it usually cuts off in the middle of the taskId. For the example above, this yields a container.id value like 438aa1824ca4058bdcab/c23e5f76c09d438aa1824ca4058bdcab-1234678.
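For reference, my understanding (an assumption on my part from the observed values, not a quote of the detector's source) is that the detector keeps only the trailing 64 characters of the matching cgroup line, which is why the value starts in the middle of the taskId:

// Assumed behavior, for illustration only: keep the trailing 64 characters of
// the cgroup line. With the /ecs/<taskId>/<taskId>-<containerId> layout the
// line is longer than 64 characters, so the kept suffix begins mid-taskId.
const CONTAINER_ID_LENGTH = 64;
const truncated = (cgroupLine: string) =>
  cgroupLine.length > CONTAINER_ID_LENGTH
    ? cgroupLine.slice(-CONTAINER_ID_LENGTH)
    : cgroupLine;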

Is there some way to consistently receive only the part after the / in the cgroup name, i.e. the last chunk? Happy to contribute in case this seems desirable. It should probably follow a regex based approach like in DataDog/dd-trace-js#1176.
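To make this concrete, here is a rough sketch of how the last path segment could be pulled out of a cgroup line. This is only an illustration of the regex idea, not the detector's actual code, and the function name is made up:

// Sketch only: extract the last path segment of a cgroup line, e.g.
// "/ecs/<taskId>/<taskId>-<containerId>" -> "<taskId>-<containerId>".
// The regex matches everything after the last "/".
function extractContainerId(cgroupLine: string): string | undefined {
  const match = cgroupLine.trim().match(/[^/]+$/);
  return match ? match[0] : undefined;
}

The same could also be done with split("/").pop(); the regex variant just stays closer to the regex-based idea referenced above.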

Steffen911 added the bug label Sep 27, 2024
pichlermarc added the pkg:resource-detector-aws, priority:p2, up-for-grabs, and good first issue labels Oct 2, 2024
@Victorsesan

Victorsesan commented Oct 10, 2024

@pichlermarc Please assign this to me and let me give it a try.

@Annosha

Annosha commented Oct 12, 2024

@Victorsesan are you actively working on it?

@Victorsesan

> @Victorsesan are you actively working on it?

@Annosha Nope, it has not been assigned to anyone yet. @pichlermarc

@pichlermarc
Member

I'll assign @Victorsesan as they were first.

@Victorsesan

@pichlermarc Thanks for the opportunity, I appreciate it. I have gone through the Datadog trace issue but couldn't get my head around the regex-based approach, and pardon my ignorance; I'm a complete novice at this, but I still want to fix it. I'm not 100% sure whether the incomplete containerId comes from the setup provided above or from one of the AWS detector files in the repo. If it comes from the setup above, I have made a few changes which might help fix the issue: I have created a custom resource detector that processes the cgroup name and extracts the last segment. For the regex approach (which I'm not sure matches the Datadog one), I have used a pattern like [^/]+$ to match everything after the last /; a rough sketch is below. @Steffen911 any pointers on how to implement this in a way that serves your need would be very much appreciated. TIA
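For discussion, this is roughly what I have in mind. It is just a sketch of a custom sync detector that post-processes the cgroup file with the [^/]+$ regex; the detector name and the cgroup path are my own assumptions, not anything taken from the existing AWS detector:

import * as fs from "fs";
import { Resource } from "@opentelemetry/resources";
import type { DetectorSync } from "@opentelemetry/resources";

// Sketch of a custom sync detector (names and path are assumptions): read
// /proc/self/cgroup and keep only the last path segment of the first
// non-empty line as container.id.
export const cgroupLastSegmentDetector: DetectorSync = {
  detect(): Resource {
    try {
      const lines = fs
        .readFileSync("/proc/self/cgroup", "utf8")
        .split("\n")
        .filter((l) => l.trim().length > 0);
      const match = lines[0]?.match(/[^/]+$/);
      return match
        ? new Resource({ "container.id": match[0] })
        : Resource.empty();
    } catch {
      return Resource.empty();
    }
  },
};

I would then pass it in resourceDetectors after awsEcsDetectorSync; my assumption (not verified) is that later detectors win when the resources are merged, so the cleaned-up value would override the truncated one.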
