Allow X-Ray and Application Signals Port Compatibility #280

Merged: 6 commits, Jan 10, 2025

musa-asad (Contributor) commented Dec 26, 2024

Description of the issue

AWS X-Ray can be used in two distinct configurations within the Amazon CloudWatch Agent: Application Signals X-Ray, which uses the AWS Distro for OpenTelemetry (ADOT) SDK for instrumentation, and the traditional X-Ray SDK configuration. To collect Application Signals traces, specify application_signals under traces_collected in the traces section. To collect X-Ray SDK traces, specify xray in the same place.

Currently, users of the Amazon CloudWatch Agent Operator who try to use Application Signals together with X-Ray (via the X-Ray SDK) run into the following issues:

  • The same port number can't be configured twice on the cloudwatch-agent service, even when the protocols differ. As a result, when Application Signals is enabled, TCP port 2000 (Application Signals X-Ray sampling rules) overrides UDP port 2000 (X-Ray trace segment collection). Users therefore can't collect X-Ray SDK traces when Application Signals is enabled, and this holds even when Application Signals is enabled only for metrics.
  • TCP port 2000 isn't set up when Application Signals is enabled only for traces, but it is set up when Application Signals is enabled for metrics.
  • TCP port 2000 isn't set up with the default X-Ray configuration, because the port is only added when a tcp_proxy is configured in the CloudWatch Agent.
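The first issue above can be illustrated with a minimal Go sketch. It is a simplified model of the collision, not the operator's actual code: ServicePort, portKey, both dedupe functions, and the port names "aws-traces" and "aws-proxy" are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// ServicePort is a simplified, hypothetical stand-in for a Kubernetes
// Service port entry. It is not the operator's real type.
type ServicePort struct {
	Name     string
	Port     int32
	Protocol string // "TCP" or "UDP"
}

// portKey identifies a port by number AND protocol.
type portKey struct {
	Port     int32
	Protocol string
}

// dedupeByNumber keeps one entry per port number. A later entry replaces
// an earlier one with the same number, regardless of protocol.
func dedupeByNumber(ports []ServicePort) []ServicePort {
	index := map[int32]int{}
	var out []ServicePort
	for _, p := range ports {
		if i, ok := index[p.Port]; ok {
			out[i] = p
			continue
		}
		index[p.Port] = len(out)
		out = append(out, p)
	}
	return out
}

// dedupeByNumberAndProtocol keeps one entry per (number, protocol) pair,
// so UDP 2000 and TCP 2000 can coexist.
func dedupeByNumberAndProtocol(ports []ServicePort) []ServicePort {
	index := map[portKey]int{}
	var out []ServicePort
	for _, p := range ports {
		k := portKey{Port: p.Port, Protocol: p.Protocol}
		if i, ok := index[k]; ok {
			out[i] = p
			continue
		}
		index[k] = len(out)
		out = append(out, p)
	}
	return out
}

func main() {
	ports := []ServicePort{
		{Name: "aws-traces", Port: 2000, Protocol: "UDP"}, // hypothetical name
		{Name: "aws-proxy", Port: 2000, Protocol: "TCP"},  // hypothetical name
	}
	fmt.Println(len(dedupeByNumber(ports)))            // prints 1
	fmt.Println(len(dedupeByNumberAndProtocol(ports))) // prints 2
}
```

Keying on the (number, protocol) pair is the general idea behind allowing the same port for different protocols. The real change lives in getReceiverServicePort() and may differ in detail.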

Description of changes

  • Re-arranged constants and receiverDefaultPortsMap to make them consistent.
  • Added isAppSignalEnabledTraces() and GetApplicationSignalsTracesConfig() to check whether traces are enabled under application_signals.
  • Updated the getReceiverServicePort() logic to allow the same port number for different protocols.
  • Always call getReceiverServicePort() for XrayProxy, even if it isn't passed in the agent configuration.
  • Updated getApplicationSignalsReceiversServicePorts() to enable TCP port 2000 only when traces are enabled for application_signals. This isn't a breaking change: trace collection doesn't work when Application Signals is enabled under metrics only, because the X-Ray exporter isn't configured in that case: https://github.com/aws/amazon-cloudwatch-agent/blob/b462f7ff9a6cfd9845c49da1602b110875ac169a/translator/translate/otel/pipeline/applicationsignals/translator.go#L75.
  • Added unit tests to verify Application Signals and X-Ray work together as expected.
  • Fixed imports.
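As a rough sketch of the traces-only check described in the list above, the following Go program parses an agent configuration and reports whether application_signals appears under traces_collected. The agentConfig struct and the function names appSignalsTracesEnabled and appSignalsMetricsEnabled are assumptions made for this example. They are not the operator's isAppSignalEnabledTraces() or GetApplicationSignalsTracesConfig() implementations.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentConfig models only the parts of the agent JSON needed here.
// The struct is an assumption for this sketch, not the operator's type.
type agentConfig struct {
	Logs struct {
		MetricsCollected map[string]json.RawMessage `json:"metrics_collected"`
	} `json:"logs"`
	Traces struct {
		TracesCollected map[string]json.RawMessage `json:"traces_collected"`
	} `json:"traces"`
}

// appSignalsTracesEnabled reports whether application_signals is present
// under traces.traces_collected.
func appSignalsTracesEnabled(raw string) (bool, error) {
	var cfg agentConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return false, err
	}
	_, ok := cfg.Traces.TracesCollected["application_signals"]
	return ok, nil
}

// appSignalsMetricsEnabled reports whether application_signals is present
// under logs.metrics_collected.
func appSignalsMetricsEnabled(raw string) (bool, error) {
	var cfg agentConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return false, err
	}
	_, ok := cfg.Logs.MetricsCollected["application_signals"]
	return ok, nil
}

func main() {
	tracesOnly := `{"agent":{"region":"us-west-2"},"traces":{"traces_collected":{"application_signals":{}}}}`
	metricsOnly := `{"agent":{"region":"us-west-2"},"logs":{"metrics_collected":{"application_signals":{}}},"traces":{"traces_collected":{}}}`

	// Under the behavior described above, TCP port 2000 would be exposed
	// only when the traces check is true.
	t1, _ := appSignalsTracesEnabled(tracesOnly)
	t2, _ := appSignalsTracesEnabled(metricsOnly)
	fmt.Println(t1, t2) // prints: true false
}
```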

Testing

In my test setup, I created an EKS cluster that deploys the Amazon CloudWatch Observability EKS Add-On. For the "before" tests, I used the operator image that was deployed with the add-on. For the "after" tests, I used the image built from this appsig-xray branch. I deployed two sample applications. The first is instrumented with an ADOT SDK and was set up by following https://aws-otel.github.io/docs/getting-started/adot-eks-add-on/sample-app-deprecated and the second is instrumented with an X-Ray SDK. The second runs as a K8s deployment whose Docker image contains a basic Node.js Express app that integrates the AWS X-Ray SDK to trace HTTP requests and create dummy segments. I created this app myself, but a similar application is https://github.com/aws-samples/aws-xray-sdk-node-sample/blob/master/index.js for reference.

The ADOT SDK sample application is configured with the following environment variables:

Environment:
  AWS_REGION:                   us-west-2
  LISTEN_ADDRESS:               0.0.0.0:4567
  OTEL_EXPORTER_OTLP_ENDPOINT:  http://cloudwatch-agent.amazon-cloudwatch:4315
  OTEL_RESOURCE_ATTRIBUTES:     service.namespace=GettingStarted,service.name=GettingStartedService
  OTEL_TRACES_SAMPLER:          xray
  OTEL_TRACES_SAMPLER_ARG:      endpoint=http://cloudwatch-agent.amazon-cloudwatch:2000

The X-Ray SDK sample application is configured with the following environment variables:

Environment:
  AWS_XRAY_DAEMON_ADDRESS:  cloudwatch-agent.amazon-cloudwatch.svc.cluster.local:2000
  AWS_XRAY_DEBUG_MODE:      1

Relevant parts of the translated OTEL configuration:

awsproxy/application_signals:
    aws_endpoint: ""
    certificate_file_path: ""
    dialer:
        timeout: 0s
    endpoint: 0.0.0.0:2000
    imds_retries: 1
    local_mode: false
    profile: ""
    proxy_address: ""
    region: us-west-2
    role_arn: ""
    service_name: ""
...
awsxray:
    dialer:
        timeout: 0s
    endpoint: 0.0.0.0:2000
    proxy_server:
        aws_endpoint: ""
        certificate_file_path: ""
        dialer:
            timeout: 0s
        endpoint: 0.0.0.0:2000
        imds_retries: 1
        local_mode: false
        profile: ""
        proxy_address: ""
        region: us-west-2
        role_arn: ""
        service_name: xray
    transport: udp
otlp/application_signals:
    protocols:
        grpc:
            dialer:
                timeout: 0s
            endpoint: 0.0.0.0:4315
            include_metadata: false
            max_concurrent_streams: 0
            max_recv_msg_size_mib: 0
            read_buffer_size: 524288
            transport: tcp
            write_buffer_size: 0
        http:
            endpoint: 0.0.0.0:4316
            include_metadata: false
            logs_url_path: /v1/logs
            max_request_body_size: 0
            metrics_url_path: /v1/metrics
            traces_url_path: /v1/traces
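The translated configuration above binds both a UDP receiver and a TCP proxy to port 2000. The following Go sketch shows that a UDP socket and a TCP listener can share one port number in the same process, since the two protocols have separate port spaces. The function name bindSamePort is an assumption for this example. It uses a loopback port chosen by the OS rather than 2000 so that it can run anywhere, and it retries a few times in case the matching TCP port is already taken by another process.

```go
package main

import (
	"fmt"
	"net"
)

// bindSamePort opens a UDP socket on an OS-chosen loopback port, then a
// TCP listener on the same port number, and returns that number.
func bindSamePort() (int, error) {
	var lastErr error
	for attempt := 0; attempt < 5; attempt++ {
		udp, err := net.ListenPacket("udp", "127.0.0.1:0")
		if err != nil {
			return 0, err
		}
		port := udp.LocalAddr().(*net.UDPAddr).Port

		// Try to listen on TCP with the same port number while the UDP
		// socket is still open.
		tcp, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err != nil {
			udp.Close()
			lastErr = err
			continue
		}
		tcp.Close()
		udp.Close()
		return port, nil
	}
	return 0, lastErr
}

func main() {
	port, err := bindSamePort()
	if err != nil {
		fmt.Println("could not bind both protocols:", err)
		return
	}
	fmt.Printf("UDP and TCP both bound to port %d\n", port)
}
```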

Enable Ports for Traces (Application Signals)

config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{"application_signals":{}}}}'

Before

No service deployed.

After

Screenshot 2024-12-29 at 7 37 39 PM

Don't Enable TCP 2000 for Metrics (Application Signals)

config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"application_signals":{},"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{}}}'

Before

Screenshot 2024-12-29 at 6 45 49 PM

After

Screenshot 2024-12-29 at 7 41 27 PM

Set Up TCP Port 2000 when X-Ray is Enabled by Default

config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{"xray":{"bind_address":"0.0.0.0:2001"}}}}'

Before

Screenshot 2024-12-29 at 6 50 13 PM

After

Screenshot 2024-12-29 at 7 44 42 PM

X-Ray and Application Signals Emit Together

config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"application_signals":{},"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{"application_signals":{},"xray":{"bind_address":"0.0.0.0:2000","tcp_proxy":{"bind_address":"0.0.0.0:2000"}}}}}'

Before

Screenshot 2024-12-29 at 8 08 36 PM

As seen below, we are only able to collect traces from an application instrumented with an ADOT SDK, and not an application instrumented with the X-Ray SDK:
Screenshot 2024-12-29 at 8 11 20 PM

After

Screenshot 2024-12-29 at 7 58 41 PM

As seen below, we're able to collect traces from an application instrumented with an ADOT SDK and an application instrumented with the X-Ray SDK:
Screenshot 2024-12-29 at 8 03 25 PM

Additionally, when we apply the following sampling rule options:

Priority: 1
Fixed Rate: 0.0
Reservoir Size: 0

Then neither the X-Ray SDK application nor the ADOT SDK application sends traces to the console. Both sample applications fetch sampling rules from an endpoint on port 2000, and both correctly honor this rule, so there is no port conflict on 2000 when fetching sampling rules.
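To see why the rule above suppresses all traces, here is a deliberately simplified Go model of a reservoir-plus-fixed-rate sampling decision. SamplingRule and shouldSample are hypothetical names, and real X-Ray SDKs and the ADOT X-Ray sampler do more (for example, coordinating reservoir quotas with the service), so treat this only as an illustration of the arithmetic.

```go
package main

import "fmt"

// SamplingRule holds the three options used in the test above.
// This is a simplified, hypothetical model.
type SamplingRule struct {
	Priority      int
	FixedRate     float64 // fraction of requests sampled after the reservoir is used up
	ReservoirSize int     // requests per second sampled before the fixed rate applies
}

// shouldSample decides whether a request is traced.
// usedThisSecond is how many reservoir slots were already consumed in the
// current second. roll is a random value in [0, 1).
func shouldSample(rule SamplingRule, usedThisSecond int, roll float64) bool {
	if usedThisSecond < rule.ReservoirSize {
		return true // still within the per-second reservoir
	}
	return roll < rule.FixedRate
}

func main() {
	rule := SamplingRule{Priority: 1, FixedRate: 0.0, ReservoirSize: 0}
	// With an empty reservoir and a zero rate, no roll can pass.
	fmt.Println(shouldSample(rule, 0, 0.0)) // prints false
}
```

If either application had failed to fetch the rule, it would have fallen back to its own default sampling behavior and some traces would still have appeared, which is why the absence of traces from both is evidence that both reached the sampling endpoint.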


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


  if isApplicationSignalsEnabled {
-     if agentConfig.GetApplicationSignalsConfig().TLS != nil {
+     if agentConfig.GetApplicationSignalsMetricsConfig().TLS != nil {
          exporterPrefix = https
musa-asad (Contributor Author) commented Dec 26, 2024

This should be done in a separate PR, but we should also set exporterPrefix = https when agentConfig.GetApplicationSignalsTracesConfig().TLS != nil, since TLS can also be set up in the traces section for the agent: https://github.com/aws/amazon-cloudwatch-agent/blob/03cbd6b0b46837a8025807bbff2254c4a0e1e3c0/translator/translate/otel/receiver/otlp/translator.go#L94-L98. Also, for auto-instrumentation, we should use the GetApplicationSignalsTracesConfig() function to pass in only the environment variables that the Application Signals configuration needs (e.g. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT should be set only for traces, not for metrics).

Contributor replied:

Is there going to be a follow-up PR for this change?

Contributor Author replied:

Yes.
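The follow-up discussed in this thread could be sketched roughly as follows. This is only an illustration of the suggested condition, not code from the PR or from any follow-up: TLSConfig, signalConfig, and exporterPrefix as a function are hypothetical, and the "http" default is an assumption.

```go
package main

import "fmt"

// TLSConfig and signalConfig are hypothetical stand-ins for the agent's
// Application Signals metrics and traces configuration types.
type TLSConfig struct {
	CertFile string
	KeyFile  string
}

type signalConfig struct {
	TLS *TLSConfig
}

// exporterPrefix returns "https" when TLS is configured in either the
// metrics or the traces section, and "http" otherwise (assumed default).
func exporterPrefix(metrics, traces *signalConfig) string {
	if metrics != nil && metrics.TLS != nil {
		return "https"
	}
	if traces != nil && traces.TLS != nil {
		return "https"
	}
	return "http"
}

func main() {
	tls := &TLSConfig{CertFile: "cert.pem", KeyFile: "key.pem"}
	// TLS configured only under traces still selects https.
	fmt.Println(exporterPrefix(&signalConfig{}, &signalConfig{TLS: tls})) // prints https
	fmt.Println(exporterPrefix(&signalConfig{}, nil))                     // prints http
}
```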

@musa-asad musa-asad marked this pull request as ready for review December 26, 2024 21:44
@musa-asad musa-asad requested review from sky333999 and removed request for lisguo and mitali-salvi December 27, 2024 20:27
nathalapooja previously approved these changes Jan 2, 2025
jefchien previously approved these changes Jan 9, 2025
jefchien (Contributor) left a comment:

Looks like there are some minor lint check failures from previous commits. Not sure why the checks pass.



internal/manifests/collector/ports.go (outdated review thread, resolved)
@musa-asad musa-asad dismissed stale reviews from jefchien and nathalapooja via 8d9576e January 10, 2025 08:57
@musa-asad musa-asad merged commit a4cec37 into main Jan 10, 2025
9 checks passed
@musa-asad musa-asad deleted the appsig-xray branch January 10, 2025 16:37

4 participants