[SNI] Upgrade of K8S nodes hosting haproxy and scylla-operator causes too long connectivity failures #1341
Comments
Faced too long absence of the SNI connectivity also in one more CI job.

Issue description

haproxy pod-1:
haproxy pod-2:
SCT.log:

Installation details

Kernel Version: 5.10.184-175.749.amzn2.x86_64
Operator Image: scylladb/scylla-operator:1.10.0-rc.0
Scylla Nodes used in this run:
OS / Image: `` (k8s-eks: undefined_region)
Test:

Logs and commands

Logs:
With the scylla-operator-1.10 we started getting HA problems [1] with the 'haproxy' service. So, fix it by using the latest available ingress controller version for haproxy. [1] scylladb/scylla-operator#1341
Reported an issue in haproxy ingress controller: haproxytech/kubernetes-ingress#564
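A rough sketch of what the fix referenced above amounts to, i.e. moving to the latest available HAProxy Kubernetes ingress controller release via Helm. The release name, namespace, and chart coordinates below are assumptions for illustration, not taken from this issue; adjust them to the actual deployment.

```bash
# Sketch only: upgrade the HAProxy Kubernetes ingress controller to the latest chart release.
# Release name "haproxy-ingress" and namespace "haproxy" are hypothetical placeholders,
# and the chart repo URL is the one published by haproxytech (assumed unchanged).
helm repo add haproxytech https://haproxytech.github.io/helm-charts
helm repo update
helm upgrade --install haproxy-ingress haproxytech/kubernetes-ingress \
  --namespace haproxy
```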
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
/lifecycle stale
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
/lifecycle rotten
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
/close not-planned
@scylla-operator-bot[bot]: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Issue description
We have a test for upgrading the K8S platform in several steps. The `auxiliary` node pool hosts the `haproxy` and `scylla-operator` pods; each service has 2 pods, provisioned on different nodes.

So, during step 6, where we upgrade the `auxiliary` node pool which hosts the haproxy pods, our loaders lose connectivity for a long time, long enough to fail the load (see the observation commands below).

Notes

With `scylla-operator v1.9.0` and everything else the same, the problem is not observed. Proof: Argus, CI

Impact

Loss of network connectivity to Scylla pods using SNI/haproxy for a significant amount of time.
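Not part of the original report, but a minimal way to observe the behaviour described above while the `auxiliary` pool is drained is to watch where the haproxy pods land and whether any disruption budget guards them. The `haproxy` namespace below is an assumed placeholder.

```bash
# Watch haproxy pod placement while the auxiliary node pool is drained/upgraded.
# The "haproxy" namespace is a hypothetical placeholder for this sketch.
kubectl get pods -n haproxy -o wide -w

# Check whether a PodDisruptionBudget limits voluntary evictions of the haproxy pods.
kubectl get pdb -n haproxy

# Inspect recent events (evictions, scheduling failures) around the upgrade window.
kubectl get events -n haproxy --sort-by=.lastTimestamp
```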
How frequently does it reproduce?
100% using scylla-operator `1.10.0-rc.0`.
Installation details
Kernel Version: 5.10.186-179.751.amzn2.x86_64
Scylla version (or git commit hash): `2023.1.0~rc8-20230731.b6f7c5a6910c` with build-id `f6e718548e76ccf3564ed2387b6582ba8d37793c`
Operator Image: scylladb/scylla-operator:1.10.0-rc.0
Operator Helm Version: 1.10.0-rc.0
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest
Cluster size: 3 pods (i4i.4xlarge)
OS / Image: `` (k8s-eks: eu-north-1)
Test: upgrade-platform-k8s-eks
Test id: 379e7cd3-3b74-4f39-bb7f-b561a8251126
Test name: scylla-operator/operator-1.10/upgrade/upgrade-platform-k8s-eks
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 379e7cd3-3b74-4f39-bb7f-b561a8251126
$ hydra investigate show-logs 379e7cd3-3b74-4f39-bb7f-b561a8251126
Logs:
Jenkins job URL
Argus