KEP-4858: IP/CIDR Validation Improvements #4899
Conversation
danwinship commented Oct 2, 2024
- One-line PR description: Initial draft of KEP-4858 IP/CIDR Validation Improvements
- Issue link: IP/CIDR validation improvements #4858
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
<<[UNRESOLVED non-special-ip ]>>
- MAYBE revisit the use of `ValidateNonSpecialIP`: certain kinds of
Most of them come from CVEs; I mean, that function is used to prevent things like the apiserver proxy redirecting traffic to the node where the apiserver is running... things like that.
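For context, here is a hedged sketch of the kind of check that `ValidateNonSpecialIP`-style validation performs. The helper name and exact rule set are illustrative stand-ins, not the real k8s.io/kubernetes validation code:

```go
package main

import (
	"fmt"
	"net"
)

// isSpecialIP is a hypothetical stand-in for ValidateNonSpecialIP-style
// checks: reject addresses that could redirect traffic somewhere
// privileged or nonsensical.
func isSpecialIP(s string) bool {
	ip := net.ParseIP(s)
	if ip == nil {
		return true // unparseable counts as rejected
	}
	return ip.IsUnspecified() || // 0.0.0.0, ::
		ip.IsLoopback() || // 127.0.0.0/8, ::1
		ip.IsLinkLocalUnicast() || // 169.254.0.0/16, fe80::/10
		ip.IsLinkLocalMulticast()
}

func main() {
	fmt.Println(isSpecialIP("127.0.0.1")) // true: loopback is rejected
	fmt.Println(isSpecialIP("10.1.2.3"))  // false: ordinary unicast is fine
}
```

This is why relaxing it needs care: the rejections exist to close concrete redirect attacks, not just to enforce tidiness.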
##### e2e tests
No new tests, and we will remove `test/e2e/network/funny_ips.go`,
But if I use cilium, or a Service implementation that is not kube-proxy, how do I test that I don't break those legacy Services?
You'll need your own unit tests I guess?
I don't really like losing the e2e test, but I wasn't sure what else we could do unless we added some sort of terrible "don't validate the service fully if it has this annotation" hack.
We can add one of those Feature tags, so we skip it in our CI but keep it available for other projects to consume; having the code there will not hurt.
No, the problem is that in order to test what the service proxy does with invalid IPs, you need to feed it a Service/EndpointSlice that contains invalid IPs. But the apiserver won't let us create one now...
However, `Endpoints` and `EndpointSlice` will be treated differently, because (a) they are large enough that doing additional validation on them could be noticeably slow, (b) in most cases, they are generated by controllers that only write out valid IPs anyway, and (c) in the case of `EndpointSlice`, if we were going to allow moving bad IPs around within a slice without revalidation, then we ought to allow moving them between related slices too, which we can't really do.
So for `Endpoints` and `EndpointSlice`, the rule will be that invalid IPs are only allowed to remain unfixed if the update leaves the entire `.subsets` / `.addresses` unchanged. So you can edit the labels or annotations without fixing invalid endpoint IPs, but you can't add a new IP while leaving existing invalid IPs in place.
EndpointSlice's list of Endpoints is atomic:

```go
// endpoints is a list of unique endpoints in this slice. Each slice may
// include a maximum of 1000 endpoints.
// +listType=atomic
Endpoints []Endpoint `json:"endpoints" protobuf:"bytes,2,rep,name=endpoints"`
```

and Endpoints' subsets too:

```go
// The set of all endpoints is the union of all subsets. Addresses are placed into
// subsets according to the IPs they share. A single address with multiple ports,
// some of which are ready and some of which are not (because they come from
// different containers) will result in the address being displayed in different
// subsets for the different ports. No address will appear in both Addresses and
// NotReadyAddresses in the same subset.
// Sets of addresses and ports that comprise a service.
// +optional
// +listType=atomic
Subsets []EndpointSubset `json:"subsets,omitempty" protobuf:"bytes,2,rep,name=subsets"`
```
It's not about the mechanics of the update, it's about the result. The validation code can't tell whether you did an Update or a Patch anyway.

E.g., if you have a Service with `externalIPs: ["001.002.003.004"]`, and your controller tries to reconcile it to `["001.002.003.004", "5.6.7.8"]`, then the proposed validation rule will allow it. Regardless of whether you actually did it with a Patch or not, the change is semantically interpreted as "the controller appended a new value and the old value didn't change", and the validation allows that for `externalIPs`.

But if you have an EndpointSlice with `endpoints: [ { addresses: ["001.002.003.004"] } ]` and your controller tries to reconcile it to `endpoints: [ { addresses: ["001.002.003.004"] }, { addresses: ["5.6.7.8"] } ]`, the validation will not allow it, regardless of whether you did an Update or a Patch, because the proposed validation rule for Endpoints/EndpointSlice doesn't have that "and the old value didn't change" exception, for efficiency.
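The Service-side "old value didn't change" exception can be sketched as follows. The helper name is hypothetical; the real KEP validation would live in apiserver validation code, but the logic it describes is this:

```go
package main

import (
	"fmt"
	"net"
)

// validateExternalIPs sketches the proposed Service rule: new values must
// pass strict (Go 1.17+) parsing, but values carried over unchanged from
// the old object are grandfathered. Returns the invalid new values.
func validateExternalIPs(oldIPs, newIPs []string) []string {
	oldSet := make(map[string]bool, len(oldIPs))
	for _, ip := range oldIPs {
		oldSet[ip] = true
	}
	var invalid []string
	for _, ip := range newIPs {
		if oldSet[ip] {
			continue // already present before the update: allowed to stay
		}
		if net.ParseIP(ip) == nil { // strict: rejects "001.002.003.004"
			invalid = append(invalid, ip)
		}
	}
	return invalid
}

func main() {
	old := []string{"001.002.003.004"}
	// Appending a valid IP while keeping the old (invalid) one: allowed.
	fmt.Println(validateExternalIPs(old, []string{"001.002.003.004", "5.6.7.8"}))
	// Introducing a *new* sloppy IP: rejected.
	fmt.Println(validateExternalIPs(old, []string{"001.002.003.004", "010.1.1.1"}))
}
```

Endpoints/EndpointSlice skip the per-value `oldSet` lookup entirely, which is exactly the efficiency trade-off described above.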
It looks simpler to avoid these problems: if the update of the field has to be atomic, then it must pass the new validation. I can't see why we need to complicate validation to support these cases.
The question isn't whether the update is an atomic replacement or an incremental patch from the apiserver's perspective, it's whether the update is an atomic replacement or an incremental patch from the client's perspective.

That is, if a Service has `externalIPs: ["001.002.003.004"]`, and a client does:

```go
svc, _ := client.CoreV1().Services(ns).Get(ctx, name, getOptions)
svc.Spec.ExternalIPs = append(svc.Spec.ExternalIPs, "5.6.7.8")
_, err := client.CoreV1().Services(ns).Update(ctx, svc, updateOptions)
```

then does that validate or not?

(And the KEP currently says, for the case of everything except Endpoints and EndpointSlice, yes, you can do that, and it will still validate. But for Endpoints and EndpointSlice you can't, because it would be annoying to make it work and it's probably not needed.)
(And maybe we don't want to do it that way, I'm just trying to clarify what the issue is.)
What I'm trying to articulate is that if we apply a single rule, "every field that is modified is validated with the new parsers", we simplify the problem a lot.

And we roll this out across different releases with a feature gate, so we can get, for example, two or three releases enabled by default in beta, so we can be completely sure at GA that removing the old parsers is safe.
So first off, that's exactly what I'm proposing for Endpoints and EndpointSlice; if `endpoints.subsets` or `endpointslice.endpoints` changes at all, it must be fully valid according to the new rules.

But what about array-valued fields which are explicitly not `listType=atomic`? `pod.status.podIPs` is type `map`, and `node.spec.podCIDRs` is type `set`. So should you be allowed to modify one element of those arrays without touching the other (possibly-invalid) elements?

> so we can be completely sure at GA that removing the old parsers is safe

("The old parsers" (the sloppy parsers in netutils) don't get removed as part of this KEP; they just stop being the sole definition of "valid".)
> if endpoints.subsets or endpointslice.endpoints changes at all, it must be fully valid according to the new rules.

Yeah, just applying the same logic everywhere; I'm especially worried about Services.

> pod.status.podIPs

Aren't those normalized somewhere along the kubelet path? Besides, they are not immutable, are only written by the kubelet, and never updated afterwards, so this does not look like a risk.

> node.spec.podCIDRs

This is immutable; it cannot be updated.

> ("the old parsers" (the sloppy parsers in netutils) don't get removed as part of this KEP; they just stop being the sole definition of "valid".)

Yeah, my bad; it's just "consign those parsers to oblivion".
The new validation should increase the security of Kubernetes by making it impossible to have ambiguously-interpretable IPs/CIDRs in the future.
I think we already evaluated this and determined that the CVE with the golang parsers does not impact Kubernetes; the main goal is to get rid of the forked parsers IMHO. kubernetes/kubernetes#108074
That is incorrect. The CVE is that if you have IP/CIDR values that are interpreted differently by different components, then you can potentially leverage that difference to bypass security/sanity checks.

E.g., something might validate that all pods are using a DNS resolver inside `10.0.0.0/8`, for security reasons, and then someone specifies `010.010.010.010` as their DNS resolver, which `ParseIPSloppy` interprets as being `10.10.10.10`, which is within `10.0.0.0/8` and thus allowed; but then if that string gets read by a non-golang component, it will interpret the leading-zero components as octal and treat it as `8.8.8.8`.
Likewise, if you have an admission controller that ensures that new Services don't try to reuse the `externalIPs` already used by existing Services, you might be able to trick it by writing an IPv4 IP in IPv4-mapped IPv6 format, causing it to get compared against the already-used IPv6 addresses instead of the already-used IPv4 addresses, allowing you to sneak a duplicate IPv4 address past the controller.

What we did do in Kubernetes was try to make sure that in all cases where we pass an IP/CIDR to an external API, we round-trip it from string to `net.IP` to string again, to ensure that we aren't passing an ambiguous value to the external API. But that doesn't solve the problem for third-party service proxies, as you noted above.
When I was at Red Hat and we met with the security team, the conclusion was that CVE-2021-29923 didn't apply to Kubernetes, and AFAIK nothing indicates that Kubernetes is impacted by it.
Then they didn't understand the CVE. It absolutely applies to Kubernetes and always has. (It's possible that at that time there was no code shipped in the OCP release image which could be exploited in this way, but the vulnerability potentially applies to every piece of software that looks at Kubernetes API objects that contain IP addresses or CIDRs.)
Only if those pieces disagree on the interpretation; you need to find a piece that disagrees, and then that is the project that is impacted...
All the network implementations that I surveyed that use golang are using the new parsers; that means that kube-proxy, cilium, antrea, ... are not impacted.
> when I was at redhat and we met with the security team the conclusion was that CVE-2021-29923 didn't apply

At that time, the missing normalization in endpointslicecache.go meant that it was possible to use IPs with leading 0s to bypass OpenShift's Endpoints admission controller. (The admission controller would prevent you from manually creating Endpoints with IPs inside the pod network, e.g. `10.128.0.0/14`, but if you created an endpoint with the IP `012.128.2.5`, the admission controller would use `ParseIPSloppy` to parse that and then say "`12.128.2.5` is not inside `10.128.0.0/14`" and allow it, and then kube-proxy would create an iptables rule saying "`... -j DNAT --to-destination 012.128.2.5`", which iptables would parse as `10.128.2.5`, allowing you to do exactly the thing the admission controller had been trying to prevent.) Boom.
(Also FTR, kube-network-policies passes `PodIPs` values directly to `nft` without re-validating or normalizing them. This is probably not an exploitable security hole, but it means it will create incorrect nftables rules if a CNI plugin outputs non-normalized IPs.)
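A defensive fix for a consumer like that is to round-trip every IP through a strict parser before emitting it, rather than trusting the stored string. A sketch with a hypothetical helper name, not actual kube-network-policies code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// normalizeIP returns the canonical form of an IP string, or an error for
// anything strict parsing rejects. Emitting only the normalized form
// ensures nft and Go agree on what the generated rule means.
func normalizeIP(s string) (string, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", fmt.Errorf("refusing to emit rule for invalid IP %q: %w", s, err)
	}
	return addr.String(), nil
}

func main() {
	ip, _ := normalizeIP("10.1.2.3")
	fmt.Println(ip) // already canonical, passes through unchanged
	// A sloppy leading-zero IP from a misbehaving CNI plugin is refused
	// instead of silently becoming a wrong nftables rule.
	if _, err := normalizeIP("012.128.2.5"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The point is to fail closed: a rule that is never emitted is a visible bug, while a rule that nft interprets differently than Go did is a silent one.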
> the main goal is to get rid of the forked parsers IMHO

It's true that if we remove the land mine, then we won't have to maintain the "Please do not step on the land mine" sign any more. But I think a better reason to remove the land mine is because someone might step on it and blow up. 🙂
> all the network implementations that I surveyed use golang are using the new parsers, that means that neither kube-proxy, cilium, antrea, ... are not impacted

Assuming by "new parsers" you mean the golang 1.18+ parsers, kube-proxy does not...

And the fact that cilium and antrea use golang 1.18+ `net.ParseIP` while API fields are validated against golang 1.17 `net.ParseIP` means that you can put IPs into Pods/Services/EndpointSlices/NetworkPolicies that cilium and antrea will be unable to parse. This opens up a separate (albeit much smaller) set of possible exploitable situations. Possibly some DoS attacks where you can crash the network plugin due to an "impossible" `nil` result. Or perhaps, you can have NetworkPolicies in the cluster that look like they have a certain effect, but which don't actually have that effect in cilium/antrea because it can't parse the CIDR blocks in them.
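The "impossible nil" crash scenario looks like this in practice (an illustrative sketch, not cilium or antrea code): a plugin that assumes API-validated strings always parse can be crashed by a value the legacy validator accepted.

```go
package main

import (
	"fmt"
	"net"
)

// cidrFirstIP is an illustrative helper: return the network address of a
// policy CIDR. Without the error check, the ipnet.IP access below would
// be a nil dereference for "010.0.0.0/8", which strict ParseCIDR rejects
// even though legacy validation allowed it into the API object.
func cidrFirstIP(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", fmt.Errorf("unparseable CIDR in policy: %w", err)
	}
	return ipnet.IP.String(), nil
}

func main() {
	ip, _ := cidrFirstIP("10.0.0.0/8")
	fmt.Println(ip) // 10.0.0.0
	// Legacy-valid but strict-invalid input: guarded instead of panicking.
	if _, err := cidrFirstIP("010.0.0.0/8"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Tightening apiserver validation removes the mismatch at the source, so plugins no longer need to defend against values their own parser cannot read.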
We are running in circles; I summarized here: #4899 (comment)

I think we can simplify and just do something similar to what was planned in kubernetes/kubernetes#108074.

WDYT @thockin? You were suggesting in kubernetes/kubernetes#108074 relaxing the immutable fields for "fixing" the objects, but is that worth the effort and the risks? My thinking is that complicating validation can impact ALL users, and will most surely make the code more complex to maintain. If we scope the work in a way that only the users with weird IPs can be impacted (and I still think that population is close to zero), this is much simpler.
Note that this KEP does not currently propose ever dropping backward compatibility entirely. There had been some discussion in the earlier issues about eventually force-fixing any remaining invalid data in etcd (maybe fix-on-Get), but this KEP does not currently propose doing that. FWIW, if we really care,
(and we only need to try that on strings that fail "normal" parsing)
Let me be more specific, because I'm not expressing myself well and I'm causing confusion. I'm +1 on this. @danwinship, the only point I disagree on is kubernetes/kubernetes@94ddc45: I don't think we need validation to be smart. There is no argument on my side that, theoretically, both the security problem and the persisted-data problem exist, but I'm really being pragmatic here, and I doubt that it is realistic. Putting it in perspective, if we need to fail on one side, better to avoid introducing risks to all users than to this small population that already has "funny IPs". I'm +1 on kubernetes/kubernetes#122550 if we remove 94ddc457d6a9e20d037f2163c38a28c8cf3e7e5a and we roll it out with a feature gate in Deprecated state so we can roll back if there are problems. That is what I'm trying to express in my comments.
I'm fine removing the "not quite immutable" change if that's what we want. No strong opinions on any of the UNRESOLVEDs other than the feature gate?
While filling out the PRR, it became clear that adding warnings for a few releases before changing validation is probably a good idea. This would not be feature-gated, so it doesn't require any KEP buy-in; I'm just going to do that for 1.32 (already in progress), and then we can get the KEP merged for 1.33.