KEP-4858: IP/CIDR Validation Improvements #4899
Conversation
danwinship commented Oct 2, 2024
- One-line PR description: Initial draft of KEP-4858 IP/CIDR Validation Improvements
- Issue link: IP/CIDR validation improvements #4858
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
<<[UNRESOLVED non-special-ip ]>>
- MAYBE revisit the use of `ValidateNonSpecialIP`: certain kinds of
Most of them come from CVEs; I mean, that function is used to prevent things like the apiserver proxy redirecting traffic to the node where the apiserver is running... things like that.
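For context, here is a hedged sketch of the kind of check that `ValidateNonSpecialIP`-style validation performs. The helper name and exact rule set are illustrative stand-ins, not the real k8s.io/kubernetes validation code:

```go
package main

import (
	"fmt"
	"net"
)

// isSpecialIP is a hypothetical stand-in for ValidateNonSpecialIP-style
// checks: reject addresses that could redirect traffic somewhere
// privileged or nonsensical.
func isSpecialIP(s string) bool {
	ip := net.ParseIP(s)
	if ip == nil {
		return true // unparseable counts as rejected
	}
	return ip.IsUnspecified() || // 0.0.0.0, ::
		ip.IsLoopback() || // 127.0.0.0/8, ::1
		ip.IsLinkLocalUnicast() || // 169.254.0.0/16, fe80::/10
		ip.IsLinkLocalMulticast()
}

func main() {
	fmt.Println(isSpecialIP("127.0.0.1")) // true: loopback is rejected
	fmt.Println(isSpecialIP("10.1.2.3"))  // false: ordinary unicast is fine
}
```

This is why relaxing it needs care: the rejections exist to close concrete redirect attacks, not just to enforce tidiness.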
##### e2e tests
No new tests, and we will remove `test/e2e/network/funny_ips.go`,
But if I use cilium, or a Service implementation that is not kube-proxy, how do I test that I don't break those legacy Services?
You'll need your own unit tests I guess?
I don't really like losing the e2e test, but I wasn't sure what else we could do unless we added some sort of terrible "don't validate the service fully if it has this annotation" hack.
We can add one of those Feature tags, so we skip it in our CI but keep it available for other projects to consume; having the code there will not hurt.
No, the problem is that in order to test what the service proxy does with invalid IPs, you need to feed it a Service/EndpointSlice that contains invalid IPs. But the apiserver won't let us create one now...
However, `Endpoints` and `EndpointSlice` will be treated differently, because (a) they are large enough that doing additional validation on them could be noticeably slow, (b) in most cases, they are generated by controllers that only write out valid IPs anyway, and (c) in the case of `EndpointSlice`, if we were going to allow moving bad IPs around within a slice without revalidation, then we ought to allow moving them between related slices too, which we can't really do.
So for `Endpoints` and `EndpointSlice`, the rule will be that invalid IPs are only allowed to remain unfixed if the update leaves the entire `.subsets` / `.addresses` unchanged. So you can edit the labels or annotations without fixing invalid endpoint IPs, but you can't add a new IP while leaving existing invalid IPs in place.
EndpointSlice's list of Endpoints is atomic:

```go
// endpoints is a list of unique endpoints in this slice. Each slice may
// include a maximum of 1000 endpoints.
// +listType=atomic
Endpoints []Endpoint `json:"endpoints" protobuf:"bytes,2,rep,name=endpoints"`
```

and Endpoints' subsets too:

```go
// The set of all endpoints is the union of all subsets. Addresses are placed into
// subsets according to the IPs they share. A single address with multiple ports,
// some of which are ready and some of which are not (because they come from
// different containers) will result in the address being displayed in different
// subsets for the different ports. No address will appear in both Addresses and
// NotReadyAddresses in the same subset.
// Sets of addresses and ports that comprise a service.
// +optional
// +listType=atomic
Subsets []EndpointSubset `json:"subsets,omitempty" protobuf:"bytes,2,rep,name=subsets"`
```
It's not about the mechanics of the update, it's about the result. The validation code can't tell whether you did an Update or a Patch anyway.

E.g., if you have a Service with `externalIPs: ["001.002.003.004"]`, and your controller tries to reconcile it to `["001.002.003.004", "5.6.7.8"]`, then the proposed validation rule will allow it. Regardless of whether you actually did it with a Patch or not, the change is semantically interpreted as "the controller appended a new value and the old value didn't change", and the validation allows that for `externalIPs`.

But if you have an EndpointSlice with `endpoints: [ { addresses: ["001.002.003.004"] } ]` and your controller tries to reconcile it to `endpoints: [ { addresses: ["001.002.003.004"] }, { addresses: ["5.6.7.8"] } ]`, the validation will not allow it, regardless of whether you did an Update or a Patch, because the proposed validation rule for Endpoints/EndpointSlice doesn't have that "and the old value didn't change" exception, for efficiency.
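The Service-side "old value didn't change" exception can be sketched as follows. The helper name is hypothetical; the real KEP validation would live in apiserver validation code, but the logic it describes is this:

```go
package main

import (
	"fmt"
	"net"
)

// validateExternalIPs sketches the proposed Service rule: new values must
// pass strict (Go 1.17+) parsing, but values carried over unchanged from
// the old object are grandfathered. Returns the invalid new values.
func validateExternalIPs(oldIPs, newIPs []string) []string {
	oldSet := make(map[string]bool, len(oldIPs))
	for _, ip := range oldIPs {
		oldSet[ip] = true
	}
	var invalid []string
	for _, ip := range newIPs {
		if oldSet[ip] {
			continue // already present before the update: allowed to stay
		}
		if net.ParseIP(ip) == nil { // strict: rejects "001.002.003.004"
			invalid = append(invalid, ip)
		}
	}
	return invalid
}

func main() {
	old := []string{"001.002.003.004"}
	// Appending a valid IP while keeping the old (invalid) one: allowed.
	fmt.Println(validateExternalIPs(old, []string{"001.002.003.004", "5.6.7.8"}))
	// Introducing a *new* sloppy IP: rejected.
	fmt.Println(validateExternalIPs(old, []string{"001.002.003.004", "010.1.1.1"}))
}
```

Endpoints/EndpointSlice skip the per-value `oldSet` lookup entirely, which is exactly the efficiency trade-off described above.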
It looks simpler to avoid these problems: if the update of the field has to be atomic, then it must pass the new validation. I can't see why we need to complicate validation to support these cases.
The question isn't whether the update is an atomic replacement or an incremental patch from the apiserver's perspective, it's whether the update is an atomic replacement or an incremental patch from the client's perspective.

That is, if a Service has `externalIPs: ["001.002.003.004"]`, and a client does:

```go
svc, _ := client.CoreV1().Services(ns).Get(ctx, name, getOptions)
svc.Spec.ExternalIPs = append(svc.Spec.ExternalIPs, "5.6.7.8")
_, err := client.CoreV1().Services(ns).Update(ctx, svc, updateOptions)
```

then does that validate or not?

(And the KEP currently says, for the case of everything except Endpoints and EndpointSlice, yes, you can do that, and it will still validate. But for Endpoints and EndpointSlice you can't, because it would be annoying to make it work and it's probably not needed.)
(And maybe we don't want to do it that way, I'm just trying to clarify what the issue is.)
What I'm trying to articulate is that if we apply a single rule, "every field that is modified is validated with the new parsers", we simplify the problem a lot.

And we roll this out across different releases with a feature gate, so we can get, for example, two or three releases enabled by default in beta, so we can be completely sure at GA that removing the old parsers is safe.
So first off, that's exactly what I'm proposing for Endpoints and EndpointSlice; if `endpoints.subsets` or `endpointslice.endpoints` changes at all, it must be fully valid according to the new rules.

But what about array-valued fields which are explicitly not `listType=atomic`? `pod.status.podIPs` is type `map`, and `node.spec.podCIDRs` is type `set`. So should you be allowed to modify one element of those arrays without touching the other (possibly-invalid) elements?

> so we can be completely sure at GA that removing the old parsers is safe

("The old parsers" (the sloppy parsers in netutils) don't get removed as part of this KEP; they just stop being the sole definition of "valid".)
> if endpoints.subsets or endpointslice.endpoints changes at all, it must be fully valid according to the new rules.

Yeah, just applying the same logic everywhere; I'm especially worried about Services.

> pod.status.podIPs

Aren't those normalized somewhere along the kubelet path? Besides, they are not immutable, are only written by the kubelet, and never updated afterwards, so this does not look like a risk.

> node.spec.podCIDRs

This is immutable; it cannot be updated.

> ("the old parsers" (the sloppy parsers in netutils) don't get removed as part of this KEP; they just stop being the sole definition of "valid".)

Yeah, my bad; it's just "consign those parsers to oblivion".
The new validation should increase the security of Kubernetes by making it impossible to have ambiguously-interpretable IPs/CIDRs in the future.
I think we already evaluated this and determined that the CVE with the golang parsers does not impact Kubernetes; the main goal is to get rid of the forked parsers IMHO. kubernetes/kubernetes#108074
That is incorrect. The CVE is that if you have IP/CIDR values that are interpreted differently by different components, then you can potentially leverage that difference to bypass security/sanity checks.

E.g., something might validate that all pods are using a DNS resolver inside `10.0.0.0/8`, for security reasons, and then someone specifies `010.010.010.010` as their DNS resolver, which `ParseIPSloppy` interprets as being `10.10.10.10`, which is within `10.0.0.0/8` and thus allowed; but then if that string gets read by a non-golang component, it will interpret the leading-zero components as octal and treat it as `8.8.8.8`.
Likewise, if you have an admission controller that ensures that new Services don't try to reuse the `externalIPs` already used by existing Services, you might be able to trick it by writing an IPv4 IP in IPv4-mapped IPv6 format, causing it to get compared against the already-used IPv6 addresses instead of the already-used IPv4 addresses, allowing you to sneak a duplicate IPv4 address past the controller.

What we did do in Kubernetes was try to make sure that in all cases where we pass an IP/CIDR to an external API, we round-trip it from string to `net.IP` to string again, to ensure that we aren't passing an ambiguous value to the external API. But that doesn't solve the problem for third-party service proxies, as you noted above.
When I was at Red Hat and we met with the security team, the conclusion was that CVE-2021-29923 didn't apply to Kubernetes, and AFAIK nothing indicates that Kubernetes is impacted by it.
Then they didn't understand the CVE. It absolutely applies to Kubernetes and always has. (It's possible that at that time there was no code shipped in the OCP release image which could be exploited in this way, but the vulnerability potentially applies to every piece of software that looks at Kubernetes API objects that contain IP addresses or CIDRs.)
Only if those pieces disagree on the interpretation; you need to find a piece that disagrees, and then that is the project that is impacted...
All the network implementations that I surveyed that use golang are using the new parsers; that means that kube-proxy, cilium, antrea, ... are not impacted.
> when I was at redhat and we met with the security team the conclusion was that CVE-2021-29923 didn't apply

At that time, the missing normalization in endpointslicecache.go meant that it was possible to use IPs with leading 0s to bypass OpenShift's Endpoints admission controller. (The admission controller would prevent you from manually creating Endpoints with IPs inside the pod network, e.g. `10.128.0.0/14`, but if you created an endpoint with the IP `012.128.2.5`, the admission controller would use `ParseIPSloppy` to parse that and then say "`12.128.2.5` is not inside `10.128.0.0/14`" and allow it, and then kube-proxy would create an iptables rule saying "`... -j DNAT --to-destination 012.128.2.5`", which iptables would parse as `10.128.2.5`, allowing you to do exactly the thing the admission controller had been trying to prevent.) Boom.
(Also FTR, kube-network-policies passes `PodIPs` values directly to `nft` without re-validating or normalizing them. This is probably not an exploitable security hole, but it means it will create incorrect nftables rules if a CNI plugin outputs non-normalized IPs.)
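A defensive fix for a consumer like that is to round-trip every IP through a strict parser before emitting it, rather than trusting the stored string. A sketch with a hypothetical helper name, not actual kube-network-policies code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// normalizeIP returns the canonical form of an IP string, or an error for
// anything strict parsing rejects. Emitting only the normalized form
// ensures nft and Go agree on what the generated rule means.
func normalizeIP(s string) (string, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", fmt.Errorf("refusing to emit rule for invalid IP %q: %w", s, err)
	}
	return addr.String(), nil
}

func main() {
	ip, _ := normalizeIP("10.1.2.3")
	fmt.Println(ip) // already canonical, passes through unchanged
	// A sloppy leading-zero IP from a misbehaving CNI plugin is refused
	// instead of silently becoming a wrong nftables rule.
	if _, err := normalizeIP("012.128.2.5"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The point is to fail closed: a rule that is never emitted is a visible bug, while a rule that nft interprets differently than Go did is a silent one.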
> the main goal is to get rid of the forked parsers IMHO

It's true that if we remove the land mine, then we won't have to maintain the "Please do not step on the land mine" sign any more. But I think a better reason to remove the land mine is because someone might step on it and blow up. 🙂
> all the network implementations that I surveyed use golang are using the new parsers, that means that neither kube-proxy, cilium, antrea, ... are not impacted

Assuming by "new parsers" you mean the golang 1.18+ parsers, kube-proxy does not...

And the fact that cilium and antrea use golang 1.18+ `net.ParseIP` while API fields are validated against golang 1.17 `net.ParseIP` means that you can put IPs into Pods/Services/EndpointSlices/NetworkPolicies that cilium and antrea will be unable to parse. This opens up a separate (albeit much smaller) set of possible exploitable situations. Possibly some DoS attacks where you can crash the network plugin due to an "impossible" `nil` result. Or perhaps, you can have NetworkPolicies in the cluster that look like they have a certain effect, but which don't actually have that effect in cilium/antrea because it can't parse the CIDR blocks in them.
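The "impossible nil" crash scenario looks like this in practice (an illustrative sketch, not cilium or antrea code): a plugin that assumes API-validated strings always parse can be crashed by a value the legacy validator accepted.

```go
package main

import (
	"fmt"
	"net"
)

// cidrFirstIP is an illustrative helper: return the network address of a
// policy CIDR. Without the error check, the ipnet.IP access below would
// be a nil dereference for "010.0.0.0/8", which strict ParseCIDR rejects
// even though legacy validation allowed it into the API object.
func cidrFirstIP(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", fmt.Errorf("unparseable CIDR in policy: %w", err)
	}
	return ipnet.IP.String(), nil
}

func main() {
	ip, _ := cidrFirstIP("10.0.0.0/8")
	fmt.Println(ip) // 10.0.0.0
	// Legacy-valid but strict-invalid input: guarded instead of panicking.
	if _, err := cidrFirstIP("010.0.0.0/8"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Tightening apiserver validation removes the mismatch at the source, so plugins no longer need to defend against values their own parser cannot read.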
We are running in circles; I summarized here: #4899 (comment)

I think we can simplify and just do something similar to what was planned in kubernetes/kubernetes#108074.

WDYT @thockin? You were suggesting in kubernetes/kubernetes#108074 relaxing the immutable fields for "fixing" the objects, but is that worth the effort and the risks? My thinking is that complicating validation can impact ALL users, and will most surely make the code more complex to maintain. If we scope the work in a way that only the users with weird IPs can be impacted (and I still think that population is close to zero), this is much simpler.
Note that this KEP does not currently propose ever dropping backward compatibility entirely. There had been some discussion in the earlier issues about eventually force-fixing any remaining invalid data in etcd (maybe fix-on-Get), but this KEP does not currently propose doing that. FWIW, if we really care,
(and we only need to try that on strings that fail "normal" parsing)
Let me be more specific, because I'm not expressing myself well and I'm causing confusion. I'm +1 on this. @danwinship, the only point I disagree on is kubernetes/kubernetes@94ddc45: I don't think we need validation to be smart. There is no argument on my side that, theoretically, both the security problem and the persisted-data problem exist, but I'm really being pragmatic here, and I doubt that it is realistic. Putting it in perspective, if we need to fail on one side, better to avoid introducing risks to all users than to this small population that already has "funny IPs". I'm +1 on kubernetes/kubernetes#122550 if we remove 94ddc457d6a9e20d037f2163c38a28c8cf3e7e5a and we roll it out with a feature gate in Deprecated state so we can roll back if there are problems. That is what I'm trying to express in my comments.
I'm fine removing the "not quite immutable" change if that's what we want. No strong opinions on any of the UNRESOLVEDs other than the feature gate?
While filling out the PRR, it became clear that adding warnings for a few releases before changing validation is probably a good idea. This would not be feature-gated, so it doesn't require any KEP buy-in; I'm just going to do that for 1.32 (already in progress), and then we can get the KEP merged for 1.33.