High availability loss caused by simultaneous HAProxy configuration changes #564
Comments
Hi @zimnx, the controllers are all independent and have no means to coordinate. It's intended that they react simultaneously, but keep in mind that a transaction is actually a time window for a change set, 5 seconds by default. I guess the instances don't all start at the exact same millisecond, so there will always be a period where one has already started the new configuration while another still has the old one. Maybe you can tweak the …
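For illustration, a rough sketch of such a batching window (the 5-second default mentioned above; the channel-based shape below is illustrative, not the controller's actual code):

```go
// Sketch: collect change events for a fixed window (5 seconds by default, per
// the comment above) and apply them as a single transaction. Illustrative
// shape only, not the controller's actual code.
package txwindow

import "time"

func runTransactionLoop(changes <-chan struct{}, apply func(), window time.Duration) {
	for range changes {
		// A change arrived: keep collecting further changes until the window
		// closes, then apply everything in one reload. Each replica runs this
		// loop independently, so their windows still open at almost the same
		// moment when they all observe the same update.
		deadline := time.After(window)
	collect:
		for {
			select {
			case <-changes:
				// fold additional changes into the same transaction
			case <-deadline:
				break collect
			}
		}
		apply()
	}
}
```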
@zimnx, since reloads are seamless, you could use different …
It seems like all the tips just make it less likely to happen, but how can users make sure it's always HA?
I assume they could sync using the kube API, which all of them are connected to. It kind of seems like the issue stems from running a supervisor that restarts the app inside the container, while other HA apps let Kubernetes roll out the change through a workload controller that respects HA and PodDisruptionBudgets, so they don't have to handle it on their own.
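For what it's worth, a minimal sketch of the kube API sync idea, assuming a coordination.k8s.io Lease used as a mutex so only one replica reloads at a time (the name `haproxy-reload-lock` and the surrounding wiring are hypothetical; the controller does not implement this today):

```go
// Sketch: serialize reloads across controller replicas by using a Lease as a
// crude mutex. The Lease name "haproxy-reload-lock" and the surrounding wiring
// are hypothetical; the ingress controller does not implement this today.
package reloadlock

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const leaseName = "haproxy-reload-lock" // hypothetical lock object

// acquireReloadLock blocks until this replica holds the Lease or ctx is done.
// A caller would reload HAProxy after acquiring and release the lock afterwards.
func acquireReloadLock(ctx context.Context, c kubernetes.Interface, ns, id string) error {
	leases := c.CoordinationV1().Leases(ns)
	ttl := int32(30) // a dead holder frees the lock after 30s
	for {
		now := metav1.NewMicroTime(time.Now())
		lease, err := leases.Get(ctx, leaseName, metav1.GetOptions{})
		switch {
		case apierrors.IsNotFound(err):
			// Nobody holds the lock yet: try to create it with our identity.
			_, err = leases.Create(ctx, &coordinationv1.Lease{
				ObjectMeta: metav1.ObjectMeta{Name: leaseName, Namespace: ns},
				Spec: coordinationv1.LeaseSpec{
					HolderIdentity:       &id,
					LeaseDurationSeconds: &ttl,
					AcquireTime:          &now,
					RenewTime:            &now,
				},
			}, metav1.CreateOptions{})
			if err == nil {
				return nil
			}
		case err == nil:
			expired := lease.Spec.RenewTime == nil ||
				time.Since(lease.Spec.RenewTime.Time) > time.Duration(ttl)*time.Second
			if lease.Spec.HolderIdentity == nil || *lease.Spec.HolderIdentity == "" || expired {
				// Take over a released or expired lock. The Update is rejected on a
				// stale resourceVersion, so two replicas cannot both win.
				lease.Spec.HolderIdentity = &id
				lease.Spec.LeaseDurationSeconds = &ttl
				lease.Spec.AcquireTime = &now
				lease.Spec.RenewTime = &now
				if _, err := leases.Update(ctx, lease, metav1.UpdateOptions{}); err == nil {
					return nil
				}
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(2 * time.Second): // retry until the holder releases or expires
		}
	}
}

// releaseReloadLock clears the holder so the next replica can reload.
func releaseReloadLock(ctx context.Context, c kubernetes.Interface, ns, id string) {
	leases := c.CoordinationV1().Leases(ns)
	lease, err := leases.Get(ctx, leaseName, metav1.GetOptions{})
	if err != nil || lease.Spec.HolderIdentity == nil || *lease.Spec.HolderIdentity != id {
		return
	}
	lease.Spec.HolderIdentity = nil
	_, _ = leases.Update(ctx, lease, metav1.UpdateOptions{})
}
```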
@fabianonunes How would you do this in production? Given you had 1 Deployment with 5 replicas, wouldn't that mean 5 Deployments with 1 replica each in order to have separate configs?
What happens if it is disabled? Will the old processes pile up and OOM at some point? If you use …
Hi @zimnx,
That would not solve everything, but it would help, and it would allow having only 1 Deployment with several replicas but different values. How would you like this option?
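If the proposed option is a randomized delay before applying a detected change (my reading of the comment; the name maxReloadJitter below is hypothetical), a sketch could look like this:

```go
// Sketch of a per-replica random delay before reloading, assuming the proposed
// option is a duration (the name maxReloadJitter is hypothetical).
package reloadjitter

import (
	"math/rand"
	"time"
)

// waitJitter sleeps for a random fraction of maxReloadJitter so that replicas
// which observed the same change do not reload at the same instant. It only
// lowers the probability of overlap; it does not serialize reloads, which is
// the concern raised in the reply below.
func waitJitter(maxReloadJitter time.Duration) {
	if maxReloadJitter <= 0 {
		return
	}
	time.Sleep(time.Duration(rand.Int63n(int64(maxReloadJitter))))
}
```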
This would be a workaround that brings non-deterministic behavior, with a chance that all nodes will restart at once even when this parameter is set. In the long run, the chance of hitting it might be significant. I think we could live with it temporarily, but the issue still persists: config shouldn't be rolled out simultaneously on all nodes.
Hi @zimnx,
If I'm not mistaken, the deterministic behavior we have with the ingress is not good, but at the same time the proposed non-deterministic one is also not good. And there is no third option: you either are or are not deterministic. In general, if you have 5 replicas then they are exactly that, replicas: duplicates of each other that behave and act on changes in the same manner.
Yes, but the answer is leading me to point you to our products on the Enterprise side, specifically Fusion, which can solve that issue. Following this conversation and the proposed solution, it seems there is room for improvement, but if we bring randomness to the table, more people might be confused, so for now I'm going to put this on hold; in the future, with potentially new features added to HAProxy, we might even have different options.
When an HAProxy configuration change is required and there are active connections, HAProxy waits for those connections to close before completing the reload, bounded by the hard-stop-after timeout. This behavior becomes problematic in environments where multiple HAProxy ingress controllers watch the same Kubernetes resources, such as Services and Ingresses. When an update to these resources is observed, all instances of the HAProxy ingress controller detect the configuration change simultaneously.
This leads to a situation where the hard-stop-after timeout is effectively triggered by every instance at the same time. Consequently, all active connections are closed simultaneously, causing significant availability issues.
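To make the failure mode concrete, here is a minimal client-go sketch of the pattern described above: each replica runs its own watch on the same Ingress objects and reloads on every change, with no cross-replica coordination (triggerReload is a placeholder, not the controller's actual code):

```go
// Sketch: each controller replica watches the same Ingress objects and reloads
// on every change, with no cross-replica coordination. triggerReload stands in
// for whatever reload mechanism the controller actually uses.
package watchsketch

import (
	"time"

	networkingv1 "k8s.io/api/networking/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func watchIngresses(clientset kubernetes.Interface, stopCh <-chan struct{}, triggerReload func(*networkingv1.Ingress)) {
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	informer := factory.Networking().V1().Ingresses().Informer()

	// Every replica receives the same update event and reloads independently,
	// which is what lines up the hard-stop-after windows across instances.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			if ing, ok := newObj.(*networkingv1.Ingress); ok {
				triggerReload(ing)
			}
		},
	})

	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, informer.HasSynced)
}
```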
Steps to reproduce:
Expected behavior:
The HAProxy ingress controllers should coordinate the configuration rollout in a way that prevents simultaneous restarts by multiple instances. This coordination should ensure a smooth transition without abruptly closing all active connections at once.
Actual behavior:
The current behavior leads to simultaneous HAProxy restarts because all instances detect the configuration change at once. As a result, the hard-stop-after timeout is reached by each instance around the same time, causing all active connections to be closed concurrently.