Should there be a standard behavior for handling conflicted Listeners? #1218

rainest · 2022-06-15T21:38:51Z

Background

https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.GatewaySpec states

If this field specifies multiple Listeners that have the same Port value but are not compatible, the implementation must raise a “Conflicted” condition in the Listener status.

where "compatible" is, briefly, using a distinct Hostname if the protocol provides a means for per-Hostname routing, e.g. the Host header in HTTP. Two UDP Listeners sharing the same Port are not compatible because the Port is the only aspect of the connection that we can use to route UDP packets to the appropriate backend.
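A minimal sketch of that compatibility rule, as summarized above. The `Listener` class, the `compatible` function, and the `HOSTNAME_AWARE` set are hypothetical names for illustration, and the rule is deliberately simplified (it ignores, for instance, any restrictions on mixing protocols on one Port):

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: protocols that expose a hostname usable for routing
# (HTTP Host header, TLS SNI). Adjust for your implementation.
HOSTNAME_AWARE = {"HTTP", "HTTPS", "TLS"}


@dataclass(frozen=True)
class Listener:
    """Hypothetical stand-in for a Gateway Listener's critical fields."""
    name: str
    port: int
    protocol: str
    hostname: Optional[str] = None


def compatible(a: Listener, b: Listener) -> bool:
    """Return True if two Listeners can coexist without a conflict."""
    if a.port != b.port:
        return True  # different Ports never collide
    if a.protocol not in HOSTNAME_AWARE or b.protocol not in HOSTNAME_AWARE:
        return False  # Port is the only routing key (e.g. UDP)
    return a.hostname != b.hostname  # need distinct Hostnames on a shared Port
```

Under these assumptions, two UDP Listeners on Port 53 are incompatible, while two HTTP Listeners on Port 80 with different Hostnames are compatible.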

Problem

This is perhaps a bit ambiguous about how Gateways should handle conflicts, however. I'm aware of two broad strategies:

1. Once a conflicting Listener is added for some Port or Hostname, all Listeners on that Port or Port+Hostname combination are marked Conflicted and furthermore become not Ready. The Gateway will not route traffic for any of them.
2. The first Listener to grab a critical field wins. So long as the controller has not found and Ready-ed a Listener for a Port or Port+Hostname, it will mark a Listener not Conflicted and Ready, barring other issues. After that, any other Listener added for the same Port or Port+Hostname is marked Conflicted. The first Listener does route traffic; the others do not. This option may alternatively mark the first Listener Conflicted but still Ready and continue to route traffic--the spec is arguably unambiguous about marking incompatible Listeners on the same Port Conflicted, but does not state whether this must also make them not Ready.
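The two strategies can be contrasted in a rough sketch. Everything here is hypothetical: the `Listener` class, the `(port, hostname)` conflict key, the `holders` set standing in for whatever state the controller remembers, and the fallback to spec order when nothing is remembered.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Listener:
    name: str
    port: int
    hostname: Optional[str] = None  # None when the protocol has no hostname routing


def _groups(listeners):
    """Group Listeners by the key they compete for (assumed: port + hostname)."""
    groups = defaultdict(list)
    for lis in listeners:
        groups[(lis.port, lis.hostname)].append(lis)
    return groups


def fail_all(listeners):
    """Strategy 1: every Listener in a contested group is Conflicted and not Ready."""
    status = {}
    for group in _groups(listeners).values():
        contested = len(group) > 1
        for lis in group:
            status[lis.name] = {"Conflicted": contested, "Ready": not contested}
    return status


def first_wins(listeners, holders):
    """Strategy 2: a Listener already holding the key keeps it.

    holders is the set of Listener names the controller remembers as Ready.
    If none of a group is remembered, the first in spec order wins (assumption).
    """
    status = {}
    for group in _groups(listeners).values():
        winner = next((lis for lis in group if lis.name in holders), group[0])
        for lis in group:
            won = lis is winner
            status[lis.name] = {"Conflicted": not won, "Ready": won}
    return status
```

Note that `first_wins` needs the extra `holders` input, which is exactly the state the stateless variant does without.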

From some discussion with the community, I know of one implementation that does keep state and preserves existing not-Conflicted Listeners, and one that marks all Conflicted and takes them offline. I originally intended to follow the second approach for our implementation, but changed to the first for our initial implementation following discussion with my team. Should there be standard behavior in the spec or a soft recommendation to use one or the other, or should this be entirely up to the individual implementations?

Analysis

IMO while the first is simpler, the second is desirable as a footgun prevention mechanism: if your Gateway has a Listener that has become Ready and likely is actively serving traffic, you want to avoid accidentally causing a service outage by adding a conflicting Listener. Careful review can avoid this, but I prefer to err on the side of believing that admins are fallible humans and will make mistakes, especially in larger organizations where multiple admins can modify a Gateway.

My initial approach to preferring existing not-Conflicted Listeners was to rely on their previous status: if a Listener is already not Conflicted, assume it holds the Port and Hostname it says it does and set other Listeners requesting the same to Conflicted. This doesn't quite work, however, since Port, Hostname, and Protocol are not immutable: changing these means you can no longer rely on the old status and must treat it as a new Listener--the change may have brought it into conflict with some other existing Listener.

Unfortunately, tracking Listener changes requires keeping internal controller state about its history, whereas we want to keep controllers stateless and able to perform their duties based on the current state of a resource only. I can think of a few approaches that might remove the need to track state:

1. Make critical fields immutable, allowing controllers to assume that existing status is valid: you cannot change a Listener's Hostname, Port, or Protocol. You must delete your existing Listener and add a new one to do this. There was some concern in discussion over whether we can positively ID a particular Listener, but I believe the proposed 0.5.0 change to affirm that Name must be unique more or less indicates that we can use it as an identifier. This approach is, however, a bit annoying for users making edits, and is furthermore a late-breaking change that the community felt would be too significant to add after entering beta (which, due to timing, it would have to).
2. Add critical fields as optional ListenerStatus fields. Controllers that wish to preserve existing Listeners would be able to determine whether the Listener they're currently seeing has the same Hostname, Port, and Protocol as the same-Name Listener listed as not-Conflicted in status, and invalidate that status if it does not. This approach does run the risk that other, potentially malicious actors could modify the status and cause unexpected behavior, though I'm unsure whether this would be a novel problem--I expect that they could cause the same unexpected behavior by modifying the Condition set or deleting the ListenerStatus entirely.
3. Enforce compatibility in an admission webhook and just tell users that the Listener they're trying to create will not work until they make some other change first. The compatibility rules are standardized and should not need to be reimplemented by each implementation. This is also a significant change, however. We currently permit creating conflicted Listeners and there may be some valid use case for doing so. Implementations still have the option of adding their own validation webhooks if they wish to enforce additional constraints.
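To make the ListenerStatus idea concrete, here is a rough sketch of the check a stateless controller could run. The `port`, `protocol`, and `hostname` fields on `ListenerStatus` are hypothetical (they are the proposal, not part of the current API), as are the class and function names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ListenerSpec:
    name: str
    port: int
    protocol: str
    hostname: Optional[str] = None


@dataclass(frozen=True)
class ListenerStatus:
    name: str
    conflicted: bool
    # Hypothetical fields recording what the Listener held when status was written.
    port: Optional[int] = None
    protocol: Optional[str] = None
    hostname: Optional[str] = None


def status_still_valid(spec: ListenerSpec, status: Optional[ListenerStatus]) -> bool:
    """Return True if the old not-Conflicted status can still be trusted.

    A missing status, a Conflicted status, or any change to a critical field
    means the Listener must be re-evaluated as if it were new.
    """
    if status is None or status.conflicted:
        return False
    recorded = (status.name, status.port, status.protocol, status.hostname)
    current = (spec.name, spec.port, spec.protocol, spec.hostname)
    return recorded == current
```

A Listener whose Hostname was edited since the status was written fails this check and loses its claim on the old Port and Hostname, which avoids tracking history inside the controller.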

mikemorris · 2022-06-15T22:19:38Z

The difficulty handling this with a stateless controller feels a bit similar to the Route -> Gateway attachment with invalid backends.

I think I would lean towards the admission webhook as it could be a fairly simple check within the bounds of a single resource (don't allow application of a Gateway if listener critical fields aren't unique). Unlike deploying routes before backends exist, I can't really think of a compelling case to allow conflicting listeners, and I think this could be evaluated statelessly rather than dependent on comparing against a current configuration.
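As a rough illustration of how small that stateless check could be, the sketch below validates the Listeners of a single Gateway. The function name, the dict shape, and the choice of `(port, hostname)` as the uniqueness key are assumptions for the example, not the behavior of any actual webhook.

```python
from collections import Counter


def validate_listeners(listeners):
    """Return a list of error strings; an empty list means the Gateway is admitted.

    listeners: list of dicts with a "port" key and an optional "hostname" key.
    """
    counts = Counter((lis["port"], lis.get("hostname")) for lis in listeners)
    errors = []
    for (port, hostname), n in counts.items():
        if n > 1:
            errors.append(
                f"{n} listeners share port {port} and hostname {hostname!r}"
            )
    return errors
```

The check only looks at the object being admitted, so it never has to compare against the currently applied configuration.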

I had initially considered if the timestamp precedence for Route matching could be a solution, but I don't think that's an option given that all listeners are specified together within the Gateway (and so would be modified simultaneously) rather than as separate objects.

EmilyShepherd · 2022-06-16T20:00:47Z

I agree with the second broad strategy - in general if you have valid configuration which has yielded some state, and then you change that to objectively invalid configuration, it seems sensible not to change that state if at all possible until the config is corrected.

I also think it is sensible to check it in an admission webhook - if it can be statically caught (which it can via an admission controller) that again seems the most sensible strategy.

Having said that, is it acceptable to assume that the admission webhook is always running / configured? Would it be a good idea to define some controller error handling in the event that a cluster admin has decided not to install the webhook or has otherwise borked it, or is it reasonable to state that such errors are too low-level and thus out of scope for this spec?

Finally, and this suggestion might go a little too far for this discussion, but I wonder if it might be a good idea to put at least the host and port in the listener status anyway. Is it conceivable that an implementation may perform some sort of normalisation on a hostname, or perhaps add extra canonical hosts (e.g. a listener without a hostname could list the hostnames of any HTTPRoutes attached to it)?

rainest (Author) · 2022-06-16T20:38:07Z

True--we can't assume the admission webhook is functioning properly, and controllers will need to handle the case where it isn't and a conflicted Listener sneaks into configuration.

If we do have the admission rule in place though and can expect that it's normally running, I am more comfortable with the controller-side implementation marking all conflicted Listeners as such and taking them offline in the absence of information it can use to determine which Listeners were previously Ready--the worst case (a Listener in use gets taken offline) shouldn't happen under normal circumstances, and the abnormal circumstances are possible to detect (whichever Deployment handles admission requests being unready isn't perfect, but it's close) and alert on.

2 replies

EmilyShepherd Jun 16, 2022

Sounds fair. If the admission webhook isn't working, a full-on "fail all conflicting listeners" might be the best approach anyway since, as you say, it would be a relatively abnormal situation, and potentially part of a larger problem.

youngnick Jun 20, 2022
Maintainer

I also agree that the admission webhook approach is good, and we are mandating that the admission webhook be running for the implementation to be conformant, so I think that having the behavior be sub-optimal in the case that the webhook isn't running is fine.

The other thing to consider though is that implementations are free to merge Gateways into one data plane config, in which case conflict detection needs to be done across the whole set of Gateways. I think that the conflicts guidance that @jpeach mentioned is a little more useful here, because the creation timestamps can be different.
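One way this could look, as a sketch only: when Listeners from several Gateways are merged, the entry from the oldest Gateway keeps a contested Port and Hostname. The dict shape, the function name, and the tie-break on the Gateway's "namespace/name" string are assumptions made for this example.

```python
from datetime import datetime, timezone


def pick_winners(entries):
    """Map each (port, hostname) key to the entry that keeps it.

    entries: dicts with "gateway" ("namespace/name"), "created" (datetime),
    "port", and optional "hostname". Oldest Gateway wins; ties are broken
    alphabetically by the gateway string (assumption).
    """
    winners = {}
    for entry in sorted(entries, key=lambda e: (e["created"], e["gateway"])):
        # setdefault keeps the first (oldest) claimant for each key.
        winners.setdefault((entry["port"], entry.get("hostname")), entry)
    return winners


# Example input with two Gateways contending for the same Port and Hostname.
entries = [
    {"gateway": "prod/new-gw", "created": datetime(2022, 6, 20, tzinfo=timezone.utc),
     "port": 443, "hostname": "a.example.com"},
    {"gateway": "prod/old-gw", "created": datetime(2022, 6, 1, tzinfo=timezone.utc),
     "port": 443, "hostname": "a.example.com"},
]
winners = pick_winners(entries)
```

Because the ordering comes from fields on the Gateway objects themselves, this needs no controller-side history, unlike ordering Listeners within a single Gateway.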

jpeach · 2022-06-17T03:57:21Z

The conflicts guidance might help choose between different ways to handle conflicts.
