-
The difficulty of handling this with a stateless controller feels a bit similar to the Route -> Gateway attachment with invalid backends. I think I would lean towards the admission webhook, as it could be a fairly simple check within the bounds of a single resource (don't allow application of a Gateway if the critical fields of its listeners aren't unique). Unlike deploying routes before backends exist, I can't really think of a compelling case to allow conflicting listeners, and I think this could be evaluated statelessly rather than by comparing against a current configuration. I had initially considered whether the timestamp precedence used for Route matching could be a solution, but I don't think that's an option given that all listeners are specified together within the Gateway (and so would be modified simultaneously) rather than as separate objects.
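A minimal sketch of what such a stateless check could look like, assuming a simplified conflict rule: two listeners conflict when they share a port and either protocol cannot route on hostname, or their hostnames are equal. The `Listener` type, the `HOSTNAME_ROUTED` set, and `find_conflicts` are hypothetical names for illustration, not part of Gateway API or any webhook library, and the rule in the spec is stricter than this one.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple

# Assumption: protocols where a hostname can be used to select a listener.
HOSTNAME_ROUTED = {"HTTP", "HTTPS", "TLS"}


@dataclass(frozen=True)
class Listener:
    name: str
    port: int
    protocol: str
    hostname: Optional[str] = None  # None means no hostname was specified


def conflicts(a: Listener, b: Listener) -> bool:
    """Simplified rule: same port and nothing left to tell the two apart."""
    if a.port != b.port:
        return False
    if a.protocol not in HOSTNAME_ROUTED or b.protocol not in HOSTNAME_ROUTED:
        return True  # e.g. two UDP listeners on one port
    return a.hostname == b.hostname


def find_conflicts(listeners: List[Listener]) -> List[Tuple[str, str]]:
    """Return name pairs of conflicting listeners; an empty list means admit."""
    return [
        (a.name, b.name)
        for a, b in combinations(listeners, 2)
        if conflicts(a, b)
    ]


gateway_listeners = [
    Listener("web", 80, "HTTP", "a.example.com"),
    Listener("web-b", 80, "HTTP", "b.example.com"),
    Listener("dns", 53, "UDP"),
    Listener("dns-2", 53, "UDP"),
]
print(find_conflicts(gateway_listeners))  # [('dns', 'dns-2')]
```

The check reads only the listeners in the submitted Gateway, so it needs no knowledge of what was previously applied.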
-
I agree with the second broad strategy. In general, if you have valid configuration which has yielded some state, and you then change it to objectively invalid configuration, it seems sensible not to change that state if at all possible until the config is corrected. I also think it is sensible to check it in an admission webhook: if it can be statically caught (which it can via an admission controller), that again seems the most sensible strategy. Having said that, is it acceptable to assume that the admission webhook is always running and configured? Would it be a good idea to define some controller error handling in the event that a cluster admin has decided not to install the webhook or has otherwise borked it, or is it reasonable to state that such errors are too low level and thus out of scope for this spec? Finally, and this suggestion might be a little too far for this discussion, I wonder if it might be a good idea to put at least the host and port in the listener status anyway. Is it conceivable that an implementation may perform some sort of normalisation on a hostname, or perhaps add extra canonical hosts (e.g. a listener without a hostname on it could list the hostnames of any HTTPRoutes attached to it)?
-
True--we can't assume the admission webhook is functioning properly, and controllers will need to handle the case where it isn't and a conflicted Listener sneaks into configuration. If we do have the admission rule in place though and can expect that it's normally running, I am more comfortable with the controller-side implementation marking all conflicted Listeners as such and taking them offline in the absence of information it can use to determine which Listeners were previously Ready--the worst case (a Listener in use gets taken offline) shouldn't happen under normal circumstances, and the abnormal circumstances are possible to detect (whichever Deployment handles admission requests being unready isn't perfect, but it's close) and alert on.
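As a rough sketch of that controller-side fallback, assuming for simplicity that two Listeners conflict when their (port, protocol, hostname) tuples are identical: every Listener in a contested group is marked, with no attempt to guess which one was Ready first. `Listener` and `mark_conflicted` are hypothetical names, not taken from any implementation.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Listener(NamedTuple):
    name: str
    port: int
    protocol: str
    hostname: Optional[str] = None


def mark_conflicted(listeners: List[Listener]) -> Dict[str, bool]:
    """Map each listener name to True if it should be marked Conflicted.

    Only the current spec is consulted, so the result is the same no
    matter what the controller has seen before.
    """
    groups = defaultdict(list)
    for listener in listeners:
        # Assumption: this tuple is the full set of critical fields.
        key = (listener.port, listener.protocol, listener.hostname)
        groups[key].append(listener.name)

    status = {}
    for names in groups.values():
        for name in names:
            status[name] = len(names) > 1
    return status


print(mark_conflicted([
    Listener("a", 80, "HTTP", "example.com"),
    Listener("b", 80, "HTTP", "example.com"),
    Listener("c", 443, "HTTPS", "example.com"),
]))  # {'a': True, 'b': True, 'c': False}
```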
-
The conflicts guidance might help choose between different ways to handle conflicts.
-
Background
https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.GatewaySpec states
where "compatible" is, briefly, using a distinct Hostname if the protocol provides a means for per-Hostname routing, e.g. the Host header in HTTP. Two UDP Listeners sharing the same Port are not compatible because the Port is the only aspect of the connection that we can use to route UDP packets to the appropriate backend.
Problem
This is perhaps a bit ambiguous about how Gateways should handle conflicts, however. I'm aware of two broad strategies:
From some discussion with the community, I know of one implementation that does keep state and preserves existing not-Conflicted Listeners, and one that marks all conflicting Listeners Conflicted and takes them offline. I originally intended to follow the second approach for our implementation, but changed to the first for our initial implementation following discussion with my team. Should there be standard behavior in the spec or a soft recommendation to use one or the other, or should this be entirely up to the individual implementations?
Analysis
IMO while the first is simpler, the second is desirable as a footgun prevention mechanism: if your Gateway has a Listener that has become Ready and likely is actively serving traffic, you want to avoid accidentally causing a service outage by adding a conflicting Listener. Careful review can avoid this, but I prefer to err on the side of believing that admins are fallible humans and will make mistakes, especially in larger organizations where multiple admins can modify a Gateway.
My initial approach to preferring existing not-Conflicted Listeners was to rely on their previous status: if a Listener is already not Conflicted, assume it holds the Port and Hostname it says it does and set other Listeners requesting the same to Conflicted. This doesn't quite work, however, since Port, Hostname, and Protocol are not immutable: changing these means you can no longer rely on the old status and must treat it as a new Listener--the change may have brought it into conflict with some other existing Listener.
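A small illustration of that weakness, under the simplifying assumption that a Listener's claim is its (port, protocol, hostname) tuple. `resolve_by_previous_status` and the `previously_not_conflicted` flag are hypothetical; the flag stands in for whatever the old status recorded.

```python
from typing import List, NamedTuple, Optional, Set


class Listener(NamedTuple):
    name: str
    port: int
    protocol: str
    hostname: Optional[str]
    previously_not_conflicted: bool  # read from the old status, not the spec


def resolve_by_previous_status(listeners: List[Listener]) -> Set[str]:
    """Return the names of listeners to mark Conflicted.

    Within each group sharing a claim, a listener whose old status said
    "not Conflicted" keeps the claim and the others are marked.
    """
    by_claim = {}
    for listener in listeners:
        key = (listener.port, listener.protocol, listener.hostname)
        by_claim.setdefault(key, []).append(listener)

    conflicted = set()
    for group in by_claim.values():
        if len(group) < 2:
            continue
        holders = [l for l in group if l.previously_not_conflicted]
        if len(holders) == 1:
            conflicted.update(l.name for l in group if l.name != holders[0].name)
        else:
            # Zero or several holders: the old status cannot pick a winner.
            conflicted.update(l.name for l in group)
    return conflicted


# Works: "web" was serving on port 80 and "new" was just added there.
added = [
    Listener("web", 80, "HTTP", "example.com", True),
    Listener("new", 80, "HTTP", "example.com", False),
]
print(resolve_by_previous_status(added))  # {'new'}

# Breaks down: "moved" was not Conflicted on port 8080 and was then edited
# to port 80. Its old status describes a claim it no longer makes, so it is
# indistinguishable from the listener that really holds port 80.
edited = [
    Listener("web", 80, "HTTP", "example.com", True),
    Listener("moved", 80, "HTTP", "example.com", True),
]
print(sorted(resolve_by_previous_status(edited)))  # ['moved', 'web']
```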
Unfortunately, tracking Listener changes requires keeping internal controller state about its history, whereas we want to keep controllers stateless and able to perform their duties based on the current state of a resource only. I can think of a few approaches that might remove the need to track state: