Aws::CloudWatch::Client doesn't automatically refresh credentials #3162
Comments
You are correct that the … We added "static stability" logic a while back that allows expired credentials to be used past their expiration when the underlying credentials service isn't refreshing credentials on the host (e.g., during an outage). Are you seeing any logs like …? Have you tested your PR that does the explicit credentials refresh? Does that fix the issue for you?
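To make the "static stability" idea concrete, here is a minimal sketch of the behavior as the comment above describes it: if a refresh fails but stale credentials exist, keep serving them and push the next refresh attempt a few minutes out. This is not the SDK's implementation; `StaticallyStableCredentials`, the fetcher lambda, and the 5-minute extension are hypothetical.

```ruby
# Hypothetical illustration of "static stability": when a refresh attempt
# fails, keep serving the last known credentials and move their effective
# expiration a few minutes into the future instead of failing outright.
class StaticallyStableCredentials
  EXTENSION_SECONDS = 5 * 60 # assumption: retry the refresh roughly 5 minutes later

  attr_reader :expiration, :extended

  def initialize(fetcher, clock: -> { Time.now })
    @fetcher = fetcher # callable returning { value:, expiration: }; raises on outage
    @clock = clock
    @value = nil
    @expiration = nil
    @extended = false
  end

  def value
    refresh if @expiration.nil? || @clock.call >= @expiration
    @value
  end

  private

  def refresh
    fresh = @fetcher.call
    @value = fresh[:value]
    @expiration = fresh[:expiration]
    @extended = false
  rescue StandardError
    raise if @value.nil? # nothing cached yet, so there is nothing to fall back on

    # Fall back to the stale value and schedule another attempt later.
    @expiration = @clock.call + EXTENSION_SECONDS
    @extended = true
  end
end
```

A side effect worth noting: while the credentials service is down, callers keep getting the old token, which is exactly the situation in which a service would answer with ExpiredToken.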
Hi @alextwoods, I’m unable to reproduce the error, so I can’t verify whether the issue has been resolved. Even if my PR makes it to production, I won’t be able to confirm whether it fixes the issue for me, as the error occurs randomly across different production environments.
No, I didn't find any logs like the one you suggested.
This may be a naive question, but are you certain that the other gem is resolving instance profile credentials and that credentials are not set globally or in some other way? Reading your other thread, it looks like you tried creating a client on your own and the credentials defaulted to instance profile credentials (which are last in the chain), but it's possible that in your production app they are set to something else. I would actually log your credentials provider instance somehow after deploying.
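One way to output the credentials provider after deploying is a small helper like the one below. It is a hypothetical sketch: `describe_credentials` is not an SDK method, and it only assumes that the client exposes `config.credentials` (the same path the patch's log line later in this thread uses) and that refreshing providers respond to `expiration`.

```ruby
# Hypothetical diagnostic helper: summarize which credentials provider a
# client actually resolved. Assumes the client exposes config.credentials.
def describe_credentials(client)
  provider = client.config.credentials
  return "no credentials resolved" if provider.nil?

  parts = ["provider=#{provider.class}"]
  # Refreshing providers such as instance profile credentials expose an expiration.
  parts << "expiration=#{provider.expiration.inspect}" if provider.respond_to?(:expiration)
  parts.join(" ")
end
```

In the Sidekiq initializer, something like `Rails.logger.info(describe_credentials(client))` would record the resolved provider class on every boot.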
Hi @mullermp,
The other gem's CloudWatch publisher requires a client argument, and users decide how that client is configured. In my case, I explicitly pass …
I shared this in the other thread as well, but here’s the truncated output of …
This confirms that the client resolves to …
Yes, I saw that. Try passing an instance of instance profile credentials explicitly to the client in your production code. Looking at the gem source, I don't think any patch is necessary. The SDK will already refresh and retry if necessary.
@mullermp How come it doesn't already refresh and retry if it does use an instance of instance profile credentials? |
It does and should. You can verify that by fetching instance profile credentials from irb or pry on your host and inspecting them when you access them shortly before expiration. How did you verify the credentials class in production? Using a Rails console or a print statement? Or through local testing? By passing in credentials explicitly, we can be sure that is what is used. What versions of aws-sdk-cloudwatch and aws-sdk-core are you using?
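When inspecting credentials by hand in irb or pry, it helps to compute how close they are to expiring. The helper below is a hypothetical sketch; the 5-minute refresh window mirrors the figure quoted in the issue description and is an assumption, not a constant read from the SDK.

```ruby
# Hypothetical helper for checking credentials by hand: how long until they
# expire, and whether they should already be inside the refresh window.
REFRESH_WINDOW_SECONDS = 5 * 60 # assumption taken from the issue description

def expiry_report(expiration, now: Time.now)
  remaining = (expiration - now).round
  {
    seconds_remaining: remaining,
    expired: remaining <= 0,
    should_have_refreshed: remaining <= REFRESH_WINDOW_SECONDS
  }
end
```

For example, on the host one might call `expiry_report(creds.expiration)` on an `Aws::InstanceProfileCredentials` instance right after accessing its credentials; a provider that reports `should_have_refreshed: true` at that point would be worth capturing.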
Rails console. I initialized an …
aws-sdk-cloudwatch (1.108.0)
They shouldn't differ. Trust but verify: did you set the Rails env to production in your console? Are you setting global configuration anywhere, such as Aws.config[key]? Otherwise, I'm not sure what may be going on. I'm inclined to believe it's something in your environment, because if those credentials didn't refresh, there would be a lot more noise and our internal usage would stop working.
We use a setup where the Rails environment name matches the server name (e.g., …).
No, we are not setting …
Interestingly, the only component affected is the …
Do you use other AWS services in your application? If so, it's interesting that you would get expired token issues only in CloudWatch. Do you know whether you are using IMDS V1 or V2? And do you know whether you've disabled V1 through configuration (for example, the …)?
Yes, S3, SSM, STS...
We're using IMDS V2.
Yes, IMDS V1 is disabled.
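Since IMDS V1 is disabled, every credential fetch goes through the IMDSv2 session-token handshake: a PUT to obtain a token, then GETs that carry it. The sketch below builds those two requests with Ruby's standard `Net::HTTP` to show the protocol; it is an illustration for manual debugging on the host, not the SDK's own code, and the role name is a placeholder you would substitute.

```ruby
require "net/http"

# Sketch of the IMDSv2 session-token flow against the link-local IMDS address.
IMDS_HOST = "169.254.169.254"
CREDENTIALS_PATH = "/latest/meta-data/iam/security-credentials/"

def token_request(ttl_seconds = 21_600)
  req = Net::HTTP::Put.new("/latest/api/token")
  req["X-aws-ec2-metadata-token-ttl-seconds"] = ttl_seconds.to_s
  req
end

def credentials_request(token, role_name = "")
  # An empty role_name lists the role attached to the instance profile.
  req = Net::HTTP::Get.new(CREDENTIALS_PATH + role_name)
  req["X-aws-ec2-metadata-token"] = token
  req
end

# Only works on an EC2 host; not exercised by the tests.
def fetch_credentials_json(role_name)
  Net::HTTP.start(IMDS_HOST, 80, open_timeout: 1, read_timeout: 1) do |http|
    token = http.request(token_request).body
    http.request(credentials_request(token, role_name)).body
  end
end
```

On the instance, the returned JSON includes an Expiration field that can be compared with what the SDK reports. It also contains live credentials, so avoid logging it.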
I believe the backtrace of the error could provide valuable insights for the investigation:
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/plugins/raise_response_errors.rb:17 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/invocation_id.rb:16 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/param_converter.rb:26 in call
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/plugins/request_callback.rb:89 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/response_paging.rb:12 in call
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/plugins/response_target.rb:24 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in block in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/telemetry/no_op.rb:29 in in_span
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/telemetry.rb:53 in span_wrapper
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in call
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/request.rb:72 in send_request
/gems/aws-sdk-cloudwatch-1.108.0/lib/aws-sdk-cloudwatch/client.rb:3865 in put_metric_data
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:235 in block in publish
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:234 in each
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:234 in each_slice
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:234 in publish
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:77 in run
/gems/sidekiq-6.5.12/lib/sidekiq/component.rb:8 in watchdog
/gems/sidekiq-6.5.12/lib/sidekiq/component.rb:17 in block in safe_thread
I don't think that backtrace is helpful. One thing you can try is to add …
Yeah, if possible, adding http_debug_output to an instance of the instance profile credentials would be most useful, since otherwise there's no logging from the credential refresh. My best guess is that something unusual is going on with the refresh (maybe an error from the instance that's getting ignored?). If you can't enable …
before_refresh = proc do |creds| # called with self
  # do some logging....
end
credentials = Aws::InstanceProfileCredentials.new(before_refresh: before_refresh)
client = Aws::CloudWatch::Client.new(credentials: credentials)
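A minimal logging hook for the before_refresh option above might look like this. It is a sketch: `build_before_refresh` is hypothetical, and it assumes only what the snippet's comment says, namely that the proc is called with the credentials provider itself, plus any logger that responds to `info`.

```ruby
# Hypothetical logging hook for before_refresh: records which provider is
# refreshing and what its expiration was just before the refresh.
def build_before_refresh(logger)
  proc do |creds|
    expiration = creds.respond_to?(:expiration) ? creds.expiration : nil
    logger.info("refreshing #{creds.class} (current expiration: #{expiration.inspect})")
  end
end
```

Usage would be along the lines of `Aws::InstanceProfileCredentials.new(before_refresh: build_before_refresh(Rails.logger))`. Unlike http_debug_output, this never writes the credentials themselves.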
Are the …
You can pass an instance of a logger (or technically anything that responds to …):
credentials = Aws::InstanceProfileCredentials.new(http_debug_output: Rails.logger)
But again, as a general warning: this will log the actual requests and responses, which include credentials.
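Because that debug output includes live credentials, one option is a redacting wrapper between the SDK and the logger. This assumes the SDK hands http_debug_output to `Net::HTTP#set_debug_output`, which writes its trace with `<<` and escapes response bodies (hence the optional backslashes in the pattern). `RedactingDebugOutput` is hypothetical and only masks the SecretAccessKey and Token JSON fields; the IMDS session token in request headers is not masked.

```ruby
# Hypothetical redacting sink for http_debug_output: anything written with <<
# has the SecretAccessKey and Token values masked before reaching the target.
class RedactingDebugOutput
  # Matches "SecretAccessKey" : "..." and "Token" : "...", with or without
  # the backslash-escaped quotes that appear in dumped response bodies.
  PATTERN = /(\\?"(?:SecretAccessKey|Token)\\?"\s*:\s*\\?")[^"\\]*(\\?")/

  def initialize(target)
    @target = target # e.g. Rails.logger, or any object responding to <<
  end

  def <<(chunk)
    @target << chunk.to_s.gsub(PATTERN, '\1[REDACTED]\2')
    self
  end
end
```

Usage would be `Aws::InstanceProfileCredentials.new(http_debug_output: RedactingDebugOutput.new(Rails.logger))`, keeping the timing and status lines of the refresh while hiding the secrets.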
Hey @alextwoods, it seems like the current direction for resolving this issue is using the forked version of … If I find that it does solve the issue, would that prove the need for a fix on your side? Is there anything else I can do to provide more information?
I think we would want proof and a root cause before taking a patch. Your fix in the other gem is a brute-force solution that just retries after refreshing. I am inclined to believe it is something with your setup or environment.
@Roy-Gal-Git, yeah, that makes sense. In that case, I'd recommend at least adding a … I do think it will be useful to know whether your Sidekiq patch addresses the issue. Another possible cause is time drift on your instance: if the system clock is off, the credential refresh on the instance itself will be incorrect. Are you running anything like chrony, Amazon Time Sync Service, or some other NTP setup?
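To rule time drift in or out independently of the NTP daemon's own report, the local clock can be compared with the Date header of any HTTPS response from an AWS endpoint. The helpers below are a hypothetical sketch; fetching the header is left out, and the 60-second tolerance is an arbitrary assumption.

```ruby
require "time"

# Hypothetical clock-skew check: positive means the local clock is ahead of
# the server that produced the Date header.
def clock_skew_seconds(date_header, local_now: Time.now)
  (local_now - Time.httpdate(date_header)).round
end

def skew_acceptable?(skew_seconds, tolerance: 60)
  skew_seconds.abs <= tolerance
end
```

Because the Date header has one-second resolution and includes network latency, this only catches gross drift, which is the kind that would matter for credential expiration.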
I see that the current label is …
No, we don't use those.
@mullermp @alextwoods I have 2 small updates:
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:119 in rescue in initialize
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:108 in initialize
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:33 in new
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:33 in signer_for
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:45 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/transfer_encoding.rb:27 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:12 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/s3_signer.rb:53 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/redirects.rb:20 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:69 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/retry_errors.rb:365 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/retry_errors.rb:385 in with_metric
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/retry_errors.rb:365 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/md5s.rb:32 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/http_checksum.rb:20 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/endpoint_pattern.rb:30 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:137 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/request_compression.rb:94 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/request_compression.rb:104 in with_metric
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/request_compression.rb:94 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/rest/content_type_handler.rb:27 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/express_session_auth.rb:50 in block in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/express_session_auth.rb:56 in with_metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/express_session_auth.rb:50 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/expect_100_continue.rb:23 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/bucket_name_restrictions.rb:21 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/rest/handler.rb:10 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/recursion_detection.rb:18 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/endpoints.rb:52 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/endpoints.rb:66 in with_metrics
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/endpoints.rb:52 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/endpoint_discovery.rb:84 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/endpoint.rb:46 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/param_validator.rb:26 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/raise_response_errors.rb:16 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/sse_cpk.rb:24 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/dualstack.rb:21 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/accelerate.rb:43 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/invocation_id.rb:16 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/param_converter.rb:26 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/request_callback.rb:89 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/response_paging.rb:12 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/response_target.rb:24 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/telemetry/no_op.rb:29 in in_span
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/telemetry.rb:53 in span_wrapper
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/request.rb:72 in send_request
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/client.rb:16571 in put_object
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:66 in block in put_object
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:55 in block in open_file
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:55 in open
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:55 in open_file
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:65 in put_object
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:46 in block in upload
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:40 in upload
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/customizations/object.rb:477 in block in upload_file
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/customizations/object.rb:476 in upload_file
And …
/gems/aws-sigv4-1.9.1/lib/aws-sigv4/signer.rb:717 in extract_credentials_provider
/gems/aws-sigv4-1.9.1/lib/aws-sigv4/signer.rb:147 in initialize
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/plugins/s3_signer.rb:235 in new
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/plugins/s3_signer.rb:235 in build_v4_signer
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/plugins/s3_signer.rb:14 in block in <class:S3Signer>
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:72 in call
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:215 in block in resolve_defaults
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:59 in each
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:59 in each
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:214 in resolve_defaults
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:207 in value_at
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:191 in block in resolve
/usr/local/lib/ruby/3.2.0/set.rb:511 in each_key
/usr/local/lib/ruby/3.2.0/set.rb:511 in each
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:191 in resolve
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:179 in apply_defaults
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:152 in build!
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/base.rb:65 in build_config
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/base.rb:21 in initialize
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/client.rb:435 in initialize
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/base.rb:102 in new
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/resource.rb:28 in initialize
Do you think the three errors are related in any way?
With the test you are doing with the patch, are you also running with the …? If the patch does work, we'll want to understand whether the SDK is already trying to refresh credentials. Those errors could be related: in general, both are caused by an environment setup issue that leaves the SDK unable to find valid credentials during client initialization. Assuming these errors are happening in code running on EC2/ECS, that would indicate a potential issue with IMDS not having valid credentials. This could happen, for example, if there is clock skew on the host or something else is misconfigured.
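Since the thread mentions both EC2 and ECS, it may be worth checking which source the tail of the default credential chain would select inside the container. The function below is a simplified, hypothetical model: to my understanding, the Ruby SDK prefers container (ECS) credentials when AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI is set and otherwise falls back to instance profile credentials, but the real chain consults several other sources first.

```ruby
# Simplified, hypothetical model of the last step of the default chain only.
# The real chain checks static keys, environment keys, shared config, etc. first.
CONTAINER_VARS = %w[
  AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
  AWS_CONTAINER_CREDENTIALS_FULL_URI
].freeze

def final_chain_source(env = ENV)
  if CONTAINER_VARS.any? { |name| env[name] }
    "container credentials (ECS)"
  else
    "instance profile credentials (IMDS)"
  end
end
```

Running `final_chain_source` inside the actual container (rather than on the host) is what matters, since the environment variables differ between the two.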
No, it's just the patch.
It runs on EC2 machines that are managed by an Elastic Beanstalk environment.
Your previous error involved expired credentials (a valid Ruby object with just invalid credentials), but the new error indicates that your entire credentials object has been blown away. Here is where that is raised: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems%2Faws-sdk-core%2Flib%2Faws-sdk-core%2Fplugins%2Fsign.rb. I still think it's something with your setup or environment; no SDK code that I'm aware of removes credentials from configuration.
@mullermp Interesting! Is it possible that initializing
Aws::S3::Resource.new(region: 'REGION_PLACEHOLDER', access_key_id: nil, secret_access_key: nil)
… I also noticed that sometimes we pass a hardcoded region, while other times we use a memoized instance of … For context, we always use the latest AMI for our Elastic Beanstalk environments and don’t make any manual changes to it. In practice, we run everything with Docker on ECS. Could something in this setup be affecting credential persistence?
Why are you setting a fake region and nil credentials in your client/resource? |
@mullermp I just changed the region to the placeholder. It's …
I was wrong when I said we don't use those services. The instance is using Amazon Time Sync Service (169.254.169.123) with Chrony. The system time is accurate, with an offset of only a few nanoseconds, and chronyc tracking confirms synchronization is stable. Time drift is not an issue. |
You should not need to set any keys to nil at all for the credentials provider chain to work. They can be omitted. At some point, your credentials are being blown away. I'm not sure why; perhaps it's usage in your codebase that we cannot see.
Is there anything else I can provide to help you investigate this further?
I can't really think of anything, I'm sorry. The code in the sidekiq-cloudwatchmetrics gem looks fine: it just takes a client instance parameter and calls an operation. Are you always supplying that parameter with the client you created? You should only need to create the client once, perhaps as a singleton early in your app. Confirm that it is actually resolving instance profile credentials and that you can call …
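Following that advice, a small helper can drop nil-valued options before they reach a client constructor, so the default provider chain is used instead of explicit nils. `client_options` is hypothetical.

```ruby
# Hypothetical helper: build client options without explicit nil keys, so
# omitted credentials fall through to the default provider chain.
def client_options(region:, **extra)
  { region: region }.merge(extra).reject { |_key, value| value.nil? }
end
```

Usage would be along the lines of `Aws::S3::Resource.new(**client_options(region: aws_region))`.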
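Creating the client once, early in the app, could be sketched as a lazily built, mutex-guarded holder. `SharedClient` is hypothetical; the factory block is where `Aws::CloudWatch::Client.new(region: aws_region)` would go.

```ruby
# Hypothetical memoized client holder: builds the client once, lazily and
# thread-safely, then hands the same instance to every caller.
class SharedClient
  def initialize(&factory)
    @factory = factory
    @mutex = Mutex.new
    @client = nil
  end

  def get
    return @client if @client

    @mutex.synchronize { @client ||= @factory.call }
  end
end
```

The mutex only guards the first build; since the SDK client is thread-safe (as noted later in the thread), sharing one instance across Sidekiq threads is the intended outcome.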
I already confirmed both, which is why I implemented this patch for the …
Yes, we always supply the client parameter. The AWS CloudWatch client is created once in our Sidekiq initializer and reused within the Sidekiq process.
aws_region = AWSable.get_region # Memoized `Aws::EC2Metadata` object fetches region
Sidekiq::CloudWatchMetrics.enable!(
client: Aws::CloudWatch::Client.new(region: aws_region),
namespace: "FILTERED" # Placeholder value
)
Same here. Since deploying the patched version, we have had zero occurrences of: …
Our application does not explicitly use multithreading. However, some gems may use background threads internally.
No, we pass it directly in the Sidekiq initializer (see the code block above).
Can you try creating an instance of your client outside of …? With that patch, just to confirm, you went from the expired credentials error to the "no credentials are configured" error, correct?
I don’t think it matters whether the client is created inside or outside …
Even if Sidekiq runs multiple threads within a process, the AWS SDK client is thread-safe, so that should not cause any issues.
No, the "no credentials are configured" error is another problem we have, which I thought might result from the same root cause.
It's good to know that the patch is working and seems to have eliminated the issue, but it's also a bit confusing. The only possible reason I can see for the patch to fix the issue is if the credentials have an expiration time in the future, and so don't trigger the refresh logic, but the service considers them expired or invalid. I think there are two things you can do to help us understand what might be going on:
logger.warn("#{@client.class} security token expired (expiration: #{@client.config.credentials.expiration}). Refreshing client and retrying... (attempt #{retry_count})")
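For reference, the brute-force "refresh and retry" approach that the patch takes can be reduced to a generic wrapper like the one below. It is a hypothetical sketch, not the patch itself: `with_expired_token_retry` matches errors by class name, and the `on_expired` callback stands in for whatever refresh the caller chooses. As the maintainers point out earlier in the thread, this masks the symptom; logging `config.credentials.expiration` inside the callback, as in the log line above, is what would help find a root cause.

```ruby
# Hypothetical "refresh and retry" wrapper: if the block fails with an error
# whose class name ends in ExpiredToken, run the refresh callback and retry,
# up to max_attempts total attempts. Other errors are re-raised immediately.
def with_expired_token_retry(max_attempts: 2, on_expired:)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError => e
    raise unless e.class.name.to_s.end_with?("ExpiredToken") && attempt < max_attempts

    on_expired.call(attempt, e) # e.g. log the current expiration, then rebuild the client
    retry
  end
end
```

If the retry succeeds while the logged expiration is still in the future, that would support the theory above that the service rejects credentials the SDK still considers valid.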
Describe the bug
We use the sidekiq-cloudwatchmetrics gem to send Sidekiq-related metrics to CloudWatch. Occasionally, we encounter the following error:
Aws::CloudWatch::Errors::ExpiredToken: The security token included in the request is expired
Upon investigation, we found that the gem initializes an Aws::CloudWatch::Client, which uses Aws::InstanceProfileCredentials. These credentials are supposed to auto-refresh before expiration. However, the error indicates that the token is not being refreshed in time, causing the metrics to fail. Once the error starts, it repeats, and the metrics fail to recover.
Regression Issue
Expected Behavior
The Aws::InstanceProfileCredentials should automatically refresh the security token before expiration (approximately 5 minutes before expiry) and ensure that the put_metric_data API call succeeds without errors. Metrics should continue to be sent to CloudWatch without interruption.
Current Behavior
The Aws::CloudWatch::Client raises Aws::CloudWatch::Errors::ExpiredToken errors when attempting to send metrics. Once this error occurs:
… Aws::InstanceProfileCredentials does not appear to trigger correctly in this context.
Reproduction Steps
Configure the sidekiq-cloudwatchmetrics gem in a Sidekiq setup that uses an IAM role (e.g., on EC2 or ECS).
You can find the configuration I used here: …
Allow the process to run for an extended period, relying on auto-refreshing credentials from Aws::InstanceProfileCredentials.
Observe that after some time, the following error may appear:
Aws::CloudWatch::Errors::ExpiredToken: The security token included in the request is expired
Possible Solution
No response
Additional Information/Context
Here are some useful links for more context regarding my problem:
sidekiq-cloudwatchmetrics repo: Reoccurring Aws::CloudWatch::Errors::ExpiredToken errors (sj26/sidekiq-cloudwatchmetrics#48)
sidekiq-cloudwatchmetrics repo: https://github.com/sj26/sidekiq-cloudwatchmetrics/pull/49/files
Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version
aws-sdk-cloudwatch
Environment details (Version of Ruby, OS environment)
Ruby - 3.2.5, OS - Debian GNU/Linux 12 (bookworm), Docker image - ruby:3.2.5-slim, sidekiq-cloudwatchmetrics - 2.6.0