amazon-cloudwatch-agent doesn't respect SIGTERM to flush metrics #961
Comments
Here are the shutdown logs. I would have expected the flush to happen between the Stopping and Stopped states of the CloudWatch output plugin.
Hello, is it possible for you to provide us with cloudwatch-agent logs from
The logs posted above should be the same as those written to this log file. Since I'm running the agent inside my own process, its logs are part of my process's output.
Here are the steps to reproduce it locally
Shutdown sequence
Also note that the behavior is the same if I set the Telegraf intervals to 1s (keeping the OTel flush interval at 10s). Sometimes:
Also, we can't afford 3 seconds of flush time, because AWS Lambda shuts down a Lambda extension after a maximum of 2 seconds.
Hello, I am working on recreating the issue and wanted to double-check a few points:
I built the agent on darwin arm64 and linux x86_64 as well, both with the same result.
Hello @okankoAMZ, were you able to reproduce the issue?
Hi!
Describe the bug
I am trying to run the amazon-cloudwatch-agent as part of a Lambda extension, so that metrics sent by Lambda executions are batched.
Basically, I have a Go program that runs this command under the hood, with some static configuration files that are pasted below:
amazon-cloudwatch-agent -config config.toml -otelconfig config.yaml
So far, the bootup works. However, the problem is that the agent doesn't seem to flush the metrics when it receives a SIGTERM signal. It just seems to close all extensions and shut down without flushing any metrics to CloudWatch. This is really problematic inside a Lambda extension: if the flush interval has not elapsed by the time the Lambda environment shuts down, the metrics are lost completely.
Apparently, the Telegraf agent understands SIGTERM or SIGUSR1 and flushes its cache before shutting down.
This issue already goes in this direction.
What did you expect to see?
The metrics should be flushed during shutdown
What did you see instead?
The metrics are lost
What config did you use?
config.yaml