Retry on etcd too many requests error #132
# tl;dr

A fix for a rare error:

```
Error: finalizing

Caused by:
    0: commiting etcd cache to actual etcd
    1: grpc request error: status: Unknown, message: "etcdserver: too many requests", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
```

# Background

When committing our in-memory etcd representation to actual etcd, we send all delete requests concurrently (we have many).

# Issue

Sometimes this leads to etcd rejecting a request with the error "etcdserver: too many requests". Recert treated this as a hard error and exited.

# Solution

Compare the error string to this exact phrasing (there doesn't seem to be a more robust error code we can check, the code just says `Unknown`), and if we encounter it, retry the request. Eventually all requests should go through.
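The retry described under Solution can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in this PR: the helper names `is_too_many_requests` and `retry_on_too_many_requests` are hypothetical, the request is modelled as a plain closure rather than an async etcd client call, and the `max_attempts` cap is an added assumption (the description only says to repeat the request).

```rust
use std::fmt::Display;

/// The exact phrase etcd puts in the error message (copied from the error above).
const TOO_MANY_REQUESTS: &str = "etcdserver: too many requests";

/// Hypothetical helper: true if the error's text contains the etcd phrase.
/// The gRPC status code is just `Unknown`, so the string is all we can match on.
fn is_too_many_requests<E: Display>(err: &E) -> bool {
    err.to_string().contains(TOO_MANY_REQUESTS)
}

/// Hypothetical helper: re-runs `request` while it fails with "too many requests".
/// Any other error, or running out of attempts, is returned to the caller.
/// `max_attempts` is an assumption added here so the loop cannot spin forever.
fn retry_on_too_many_requests<T, E, F>(mut request: F, max_attempts: usize) -> Result<T, E>
where
    E: Display,
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match request() {
            // Retryable error and attempts left: go around again.
            Err(err) if attempt < max_attempts && is_too_many_requests(&err) => {
                attempt += 1;
            }
            // Success, a different error, or the final attempt: hand it back.
            other => return other,
        }
    }
}

fn main() {
    // Simulated delete request that is rejected twice before succeeding.
    let mut calls = 0;
    let result: Result<&str, String> = retry_on_too_many_requests(
        || {
            calls += 1;
            if calls < 3 {
                Err(format!("status: Unknown, message: \"{}\"", TOO_MANY_REQUESTS))
            } else {
                Ok("deleted")
            }
        },
        10,
    );
    println!("{:?} after {} calls", result, calls);
}
```

Matching on the message text is brittle if etcd ever rewords it, which is the trade-off the description already acknowledges. A real implementation would probably also want a short delay between attempts so the retries do not add to the load that caused the error.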
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: omertuc
/retest
/retest
/test baremetalds-sno-recert-cluster-rename
/lgtm
/retest
/retest
/retest
I'm starting to think it's actually broken
/retest
Checking clean CI in #134
/retest
/retest
/retest
/test baremetalds-sno-recert-cluster-rename
/lgtm
/hold not sure if works
/retest
/test e2e-aws-ovn-single-node-recert-serial
/test e2e-aws-ovn-single-node-recert-serial
/unhold
/test e2e-aws-ovn-single-node-recert-serial
/override ci/prow/e2e-aws-ovn-single-node-recert-parallel
@mresvanis: Overrode contexts on behalf of mresvanis: ci/prow/e2e-aws-ovn-single-node-recert-parallel
Merged commit 3b58208 into rh-ecosystem-edge:main
Solves MGMT-17651