commiterate changed the title from "Agent fails to start if log and trace configurations are omitted" to "Agent fails to start if log and trace configurations are omitted." on Aug 29, 2024
Describe the bug
The CloudWatch Agent fails to start if log and trace configurations are omitted. It's assumed that at least one of the two exists.
Details
When config-translator is given an amazon-cloudwatch-agent.json without any tracing configurations, it first generates an amazon-cloudwatch-agent.yaml that contains null and then deletes it.
https://github.com/aws/amazon-cloudwatch-agent/blob/v1.300045.0/cmd/config-translator/translator.go#L130
https://github.com/aws/amazon-cloudwatch-agent/blob/v1.300045.0/translator/cmdutil/translatorutil.go#L237

start-amazon-cloudwatch-agent does not check whether amazon-cloudwatch-agent.yaml has been deleted before calling amazon-cloudwatch-agent ... -otelconfig {...}/amazon-cloudwatch-agent.yaml.
https://github.com/aws/amazon-cloudwatch-agent/blob/v1.300045.0/cmd/start-amazon-cloudwatch-agent/path.go#L68-L74

When the CloudWatch Agent attempts to read the various config files, it assumes amazon-cloudwatch-agent.yaml will always exist if no logging configurations are specified.
https://github.com/aws/amazon-cloudwatch-agent/blob/v1.300045.0/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go#L309-L332

The path for amazon-cloudwatch-agent.yaml is then passed to an OpenTelemetry configuration provider. When the provider attempts to read the file, it fails with a file-not-found error:
2024-08-28T06:06:41Z E! [telegraf] Error running agent: cannot resolve the configuration: cannot retrieve the configuration: unable to read the file file:/run/amazon-cloudwatch-agent/amazon-cloudwatch-agent.yaml: open /run/amazon-cloudwatch-agent/amazon-cloudwatch-agent.yaml: no such file or directory

Steps to reproduce
Start the CloudWatch Agent with log and trace configurations omitted.
What did you expect to see?
The agent doesn't crash.
What did you see instead?
The agent crashes.
What version did you use?
v1.300045.0
What config did you use?
amazon-cloudwatch-agent.json
Environment
OS: NixOS
Additional context
NixOS/nixpkgs#337212 (comment)
We're currently trying to add amazon-cloudwatch-agent to the Nix package manager and a systemd unit to NixOS.

This involves rewriting the systemd configuration provided in this repository. The provided configuration can't be used on NixOS because it runs start-amazon-cloudwatch-agent, which hardcodes the agent installation directory.
https://github.com/aws/amazon-cloudwatch-agent/blob/v1.300045.0/packaging/dependencies/amazon-cloudwatch-agent.service
#1319

The resulting systemd configuration effectively does the same thing as start-amazon-cloudwatch-agent but without the path hardcoding.

Like start-amazon-cloudwatch-agent, it always passes the -otelconfig option to amazon-cloudwatch-agent, even if config-translator deletes the expected amazon-cloudwatch-agent.yaml file.

This was uncovered when running a NixOS test for this systemd unit, which:
- Starts a VM running NixOS with the agent as a systemd service. The agent is in onPremise mode without any log, metric, or trace configurations.
- Waits for the agent service to be active.
- Checks for the configuration files generated by config-translator and the PID file generated by the agent.

We noticed the agent was repeatedly crashing right after systemd started it. Checking the agent logs revealed the file-not-found error.