Handle BatchWriteSpans retries when a retryable error occurs #523
Comments
Triaged
@aabmass @dashpole Would this be related to errors I am seeing in Error Reporter like:
This happened on 5 April 2023, but nothing was reported on the Google Cloud status page for Google Trace. The timing was around 21:07, 21:38, and 16:08-16:14.
Retries could possibly have helped, but it's hard to say. We would eventually drop data even with retries if the backend wasn't responding for long enough.
Is this related to this issue: Please answer the following three questions:
It looks like the comment in that issue is unrelated to that issue.
I don't think that would help. If you're seeing this with Cloud Run, it's probably related to CPU throttling, which we have seen a few times. You can try the "CPU is always allocated" option described on that page (see discussion on #62). If you want to leave CPU throttling as is, the retries could help get your data sent, but you're likely to still see some error logs. open-telemetry/opentelemetry-js#3740 (comment) explains how to silence the logs.
I think our best bet for implementing retries is to "generate a client library and migrate to use it" as described in this issue. However, that depends on another Google team first generating the client library and then this exporter pulling it in. Unfortunately, I don't have a ton of time to work on this.
Hey @aabmass, I'm the commenter from the OT issue linked above. As I mentioned in my comment, we are seeing a bunch of these. We aren't using Cloud Functions, but we are using Kubernetes Engine to run our deployments. I don't mind dropping the logs if they are getting retried, but is there a way for me to verify they are getting retried?
They are not getting retried right now 🙁 This issue is for implementing retries.
Follow up to #181
Trace
Since we are using gRPC, I believe we will get automatic retries for well known retry-able statuses. However, the trace exporter still needs to handle retries for idempotent BatchWriteSpans() calls. Unfortunately, we are not generating a client library into the googleapis/google-cloud-node repo that we can use instead of raw gRPC. I think this is because we have a "handwritten" client lib which is actually the Cloud Trace agent (https://github.com/googleapis/cloud-trace-nodejs). If we had a client library, we would get automatic retries pulled from the service config.
Possible fixes:
@grpc/grpc-js actually supports retry config encoded in the grpc.service_config channel option. I have not tried this; some details here.
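To make the grpc.service_config idea concrete, here is a rough, untested sketch of what such a retry config might look like. The field names follow the gRPC service config JSON format; the service name, the backoff values, and the list of retryable status codes are assumptions chosen for illustration, not values taken from the Cloud Trace service config. The helper name buildChannelOptions is hypothetical.

```typescript
// Shape of a retry policy in the gRPC service config JSON format.
interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first call
  initialBackoff: string; // duration string, e.g. "0.1s"
  maxBackoff: string;
  backoffMultiplier: number;
  retryableStatusCodes: string[];
}

interface ServiceConfig {
  methodConfig: Array<{
    name: Array<{ service: string; method?: string }>;
    retryPolicy: RetryPolicy;
  }>;
}

// Assumed values for illustration only; the real service config may differ.
const batchWriteRetryConfig: ServiceConfig = {
  methodConfig: [
    {
      name: [
        {
          // Assumed fully qualified service name for the Cloud Trace v2 API.
          service: "google.devtools.cloudtrace.v2.TraceService",
          method: "BatchWriteSpans",
        },
      ],
      retryPolicy: {
        maxAttempts: 5,
        initialBackoff: "0.1s",
        maxBackoff: "30s",
        backoffMultiplier: 2,
        retryableStatusCodes: ["UNAVAILABLE", "DEADLINE_EXCEEDED"],
      },
    },
  ],
};

// Hypothetical helper: the channel option takes the config as a JSON string.
function buildChannelOptions(
  config: ServiceConfig
): Record<string, string | number> {
  return { "grpc.service_config": JSON.stringify(config) };
}

const channelOptions = buildChannelOptions(batchWriteRetryConfig);
```

The resulting channelOptions object would then be passed as the options argument when constructing the gRPC client in the exporter. Whether the installed @grpc/grpc-js version honors the retry policy would still need to be verified.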