Allow X-Ray and Application Signals Port Compatibility #280
```diff
 if isApplicationSignalsEnabled {
-    if agentConfig.GetApplicationSignalsConfig().TLS != nil {
+    if agentConfig.GetApplicationSignalsMetricsConfig().TLS != nil {
         exporterPrefix = https
```
This should be done in a separate PR, but we should also set `exporterPrefix = https` when `agentConfig.GetApplicationSignalsTracesConfig().TLS != nil`, since TLS can also be set up in the traces section for the agent: https://github.com/aws/amazon-cloudwatch-agent/blob/03cbd6b0b46837a8025807bbff2254c4a0e1e3c0/translator/translate/otel/receiver/otlp/translator.go#L94-L98. Also, for auto-instrumentation, we should use the `GetApplicationSignalsTracesConfig()` function to pass in only the environment variables needed for the Application Signals configuration (e.g. `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` should only be set for traces, not metrics).
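A minimal sketch of the suggestion above, in Go like the diff in this thread. Every type and function here is a simplified, hypothetical stand-in for the operator's real code: `CwaConfig`, `AppSignalsSection`, `TLSConfig`, `exporterPrefix`, and `tracesEnv` are illustrative names, the `agentHost` value is a placeholder, and `/v1/traces` is the default OTLP/HTTP traces path rather than something taken from this PR. It shows `https` being chosen when TLS is set in either the metrics or the traces `application_signals` section, and `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` only being emitted when traces are configured.

```go
package main

import "fmt"

// TLSConfig is a hypothetical, simplified stand-in for the agent's TLS settings.
type TLSConfig struct {
	CertFile string
	KeyFile  string
}

// AppSignalsSection is a hypothetical stand-in for an application_signals
// section under either "metrics" or "traces".
type AppSignalsSection struct {
	TLS *TLSConfig
}

// CwaConfig is a hypothetical stand-in for the operator's parsed agent config.
type CwaConfig struct {
	AppSignalsMetrics *AppSignalsSection
	AppSignalsTraces  *AppSignalsSection
}

func (c *CwaConfig) GetApplicationSignalsMetricsConfig() *AppSignalsSection {
	return c.AppSignalsMetrics
}

func (c *CwaConfig) GetApplicationSignalsTracesConfig() *AppSignalsSection {
	return c.AppSignalsTraces
}

// exporterPrefix returns "https" when TLS is configured in either the metrics
// or the traces application_signals section, and "http" otherwise.
func exporterPrefix(cfg *CwaConfig) string {
	if cfg == nil {
		return "http"
	}
	sections := []*AppSignalsSection{
		cfg.GetApplicationSignalsMetricsConfig(),
		cfg.GetApplicationSignalsTracesConfig(),
	}
	for _, s := range sections {
		if s != nil && s.TLS != nil {
			return "https"
		}
	}
	return "http"
}

// tracesEnv sets the traces endpoint variable only when application_signals
// traces are configured; a metrics-only config gets no traces endpoint.
func tracesEnv(cfg *CwaConfig, agentHost string) map[string]string {
	env := map[string]string{}
	if cfg != nil && cfg.GetApplicationSignalsTracesConfig() != nil {
		env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] =
			fmt.Sprintf("%s://%s/v1/traces", exporterPrefix(cfg), agentHost)
	}
	return env
}

func main() {
	metricsOnly := &CwaConfig{AppSignalsMetrics: &AppSignalsSection{}}
	tracesTLS := &CwaConfig{
		AppSignalsMetrics: &AppSignalsSection{},
		AppSignalsTraces:  &AppSignalsSection{TLS: &TLSConfig{CertFile: "cert.pem", KeyFile: "key.pem"}},
	}
	fmt.Println(exporterPrefix(metricsOnly), len(tracesEnv(metricsOnly, "agent:4316"))) // http 0
	fmt.Println(tracesEnv(tracesTLS, "agent:4316")["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"])  // https://agent:4316/v1/traces
}
```

Looping over both sections keeps the TLS decision in one place, so the metrics and traces paths can't drift apart.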
Is there going to be a follow-up PR for this change?
Yes.
Looks like there are some minor lint check failures from previous commits. Not sure why the checks pass.
Description of the issue
AWS X-Ray can be used in two distinct configurations within the Amazon CloudWatch Agent: Application Signals X-Ray, which uses the AWS Distro for OpenTelemetry (ADOT) SDK for instrumentation, and the traditional X-Ray SDK configuration. To collect Application Signals traces, we specify `application_signals` under the `traces` section, and to collect X-Ray SDK traces, we specify `xray` under the `traces` section.

Currently, when users of the Amazon CloudWatch Agent Operator try to use Application Signals and X-Ray (via the X-Ray SDK) together, they run into the following issues:
- `cloudwatch-agent` service. As a result, when Application Signals is enabled, TCP port 2000 for Application Signals X-Ray sampling rules overrides UDP port 2000 for X-Ray trace segment collection. Users therefore can't collect X-Ray SDK traces when Application Signals is enabled, even when Application Signals is only enabled for metrics.
- `tcp_proxy` to be configured in the CloudWatch Agent.

Description of changes
- `receiverDefaultPortsMap` to make them consistent.
- `isAppSignalEnabledTraces()` and `GetApplicationSignalsTracesConfig()` to be able to check when traces are enabled under `application_signals`.
- `getReceiverServicePort()` logic to allow the same port for different protocols.
- `getReceiverServicePort()` for `XrayProxy`, even if it isn't passed in the agent configuration.
- `getApplicationSignalsReceiversServicePorts()` to only enable TCP port 2000 when traces are enabled for `application_signals`. This isn't a breaking change: trace collection wouldn't work when Application Signals is enabled for metrics only, because the X-Ray exporter isn't configured in that case: https://github.com/aws/amazon-cloudwatch-agent/blob/b462f7ff9a6cfd9845c49da1602b110875ac169a/translator/translate/otel/pipeline/applicationsignals/translator.go#L75.

Testing
In my test setup, I created an EKS cluster that deploys the Amazon CloudWatch Observability EKS add-on. For `before` testing, I used the operator image deployed with the add-on. For `after` testing, I used the image built from this `appsig-xray` branch. For the sample applications, I deployed one instrumented with an ADOT SDK by following https://aws-otel.github.io/docs/getting-started/adot-eks-add-on/sample-app-deprecated, and one instrumented with an X-Ray SDK: a Kubernetes Deployment running a Docker image of a basic Node.js Express app that uses the AWS X-Ray SDK to trace HTTP requests and create dummy segments. I created this app myself, but https://github.com/aws-samples/aws-xray-sdk-node-sample/blob/master/index.js is a similar application for reference.

The ADOT SDK sample application is configured with the following environment variables:
The X-Ray SDK sample application is configured with the following environment variables:
Relevant parts of the translated OTEL configuration:
Enable Ports for Traces (Application Signals)
config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{"application_signals":{}}}}'
Before
No service deployed.
After
Don't Enable TCP 2000 for Metrics (Application Signals)
config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"application_signals":{},"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{}}}'
Before
After
Set Up TCP Port 2000 when X-Ray is Enabled by Default
config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{"xray":{"bind_address":"0.0.0.0:2001"}}}}'
Before
After
X-Ray and Application Signals Emit Together
config: '{"agent":{"debug":true,"region":"us-west-2"},"logs":{"metrics_collected":{"application_signals":{},"kubernetes":{"cluster_name":"xray-appsig","enhanced_container_insights":true}}},"traces":{"traces_collected":{"application_signals":{},"xray":{"bind_address":"0.0.0.0:2000","tcp_proxy":{"bind_address":"0.0.0.0:2000"}}}}}'
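The port-2000 behavior these test configurations exercise can be modeled with a short sketch. This is a simplified model under stated assumptions, not the operator's implementation: `servicePort`, `agentConfig`, `portOf`, and `expectedPorts` are hypothetical, only the `traces.traces_collected` section is read, and it assumes the X-Ray UDP port comes from `xray.bind_address` and the TCP proxy port from `xray.tcp_proxy.bind_address`, each defaulting to 2000.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"sort"
	"strconv"
)

// servicePort is a hypothetical, simplified stand-in for a Kubernetes ServicePort.
type servicePort struct {
	Port     int
	Protocol string // "TCP" or "UDP"
}

// agentConfig decodes only the traces section of the agent JSON config.
type agentConfig struct {
	Traces struct {
		TracesCollected struct {
			ApplicationSignals *struct{} `json:"application_signals"`
			XRay               *struct {
				BindAddress string `json:"bind_address"`
				TCPProxy    *struct {
					BindAddress string `json:"bind_address"`
				} `json:"tcp_proxy"`
			} `json:"xray"`
		} `json:"traces_collected"`
	} `json:"traces"`
}

// portOf extracts the port from a "host:port" bind address, falling back to def.
func portOf(addr string, def int) int {
	if addr == "" {
		return def
	}
	_, p, err := net.SplitHostPort(addr)
	if err != nil {
		return def
	}
	n, err := strconv.Atoi(p)
	if err != nil {
		return def
	}
	return n
}

// expectedPorts models the port-2000-family entries described in this PR.
func expectedPorts(raw string) ([]servicePort, error) {
	var cfg agentConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, err
	}
	// Keyed by (port, protocol), so UDP 2000 and TCP 2000 can coexist.
	set := map[servicePort]bool{}
	tc := cfg.Traces.TracesCollected
	if tc.ApplicationSignals != nil {
		// Application Signals X-Ray sampling rules: TCP 2000, traces only.
		set[servicePort{2000, "TCP"}] = true
	}
	if tc.XRay != nil {
		set[servicePort{portOf(tc.XRay.BindAddress, 2000), "UDP"}] = true
		proxy := ""
		if tc.XRay.TCPProxy != nil {
			proxy = tc.XRay.TCPProxy.BindAddress
		}
		// The X-Ray TCP proxy port is added even when tcp_proxy is not configured.
		set[servicePort{portOf(proxy, 2000), "TCP"}] = true
	}
	ports := make([]servicePort, 0, len(set))
	for p := range set {
		ports = append(ports, p)
	}
	sort.Slice(ports, func(i, j int) bool {
		if ports[i].Port != ports[j].Port {
			return ports[i].Port < ports[j].Port
		}
		return ports[i].Protocol < ports[j].Protocol
	})
	return ports, nil
}

func main() {
	both := `{"traces":{"traces_collected":{"application_signals":{},"xray":{"bind_address":"0.0.0.0:2000","tcp_proxy":{"bind_address":"0.0.0.0:2000"}}}}}`
	ports, err := expectedPorts(both)
	if err != nil {
		panic(err)
	}
	fmt.Println(ports) // [{2000 TCP} {2000 UDP}]
}
```

Under this model, the metrics-only configuration yields no port-2000 entries, and the `xray` configuration with `bind_address` `0.0.0.0:2001` yields UDP 2001 plus the default TCP 2000 proxy port.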
Before
As seen below, we are only able to collect traces from the application instrumented with the ADOT SDK, not from the application instrumented with the X-Ray SDK:
After
As seen below, we're able to collect traces from both the application instrumented with the ADOT SDK and the application instrumented with the X-Ray SDK:
Additionally, when we apply the following sampling rule options:
Then traces from neither the X-Ray SDK application nor the ADOT SDK application appear in the console, which shows there is no collision on port 2000 when fetching sampling rules: both sample applications are configured with a sampling rule endpoint on port 2000, and both correctly stop sending traces under this rule.
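One plausible way a TCP port can override a UDP port with the same number is de-duplicating service ports by port number alone. A minimal sketch, assuming that mechanism (`portEntry`, `portKey`, `mergeByPort`, and `mergeByPortAndProtocol` are hypothetical names, not taken from the operator): keying by port drops UDP 2000 when TCP 2000 arrives, while keying by (port, protocol) keeps both, which is the property the `getReceiverServicePort()` change is described as providing.

```go
package main

import "fmt"

// portEntry is a hypothetical stand-in for a Kubernetes ServicePort.
type portEntry struct {
	Name     string
	Port     int32
	Protocol string
}

// mergeByPort keys entries by port number only, so a later entry with the
// same port replaces an earlier one even if the protocol differs.
func mergeByPort(entries []portEntry) map[int32]portEntry {
	out := map[int32]portEntry{}
	for _, e := range entries {
		out[e.Port] = e
	}
	return out
}

// portKey identifies a service port by both number and protocol.
type portKey struct {
	Port     int32
	Protocol string
}

// mergeByPortAndProtocol keys entries by (port, protocol), so UDP 2000 and
// TCP 2000 are kept as separate entries.
func mergeByPortAndProtocol(entries []portEntry) map[portKey]portEntry {
	out := map[portKey]portEntry{}
	for _, e := range entries {
		out[portKey{e.Port, e.Protocol}] = e
	}
	return out
}

func main() {
	entries := []portEntry{
		{Name: "xray-udp", Port: 2000, Protocol: "UDP"},            // X-Ray trace segments
		{Name: "appsignals-sampling", Port: 2000, Protocol: "TCP"}, // sampling rules
	}
	fmt.Println(len(mergeByPort(entries)))            // 1: UDP 2000 was overwritten
	fmt.Println(len(mergeByPortAndProtocol(entries))) // 2: both protocols kept
}
```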
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.