fix(DSR): setup DSR inside pod on local eps only #1651
Conversation
Only attempt to set up DSR inside containers for local endpoints. Setting up DSR inside the container's network namespace requires local pods / endpoints.
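The fix can be sketched as a guard that filters endpoints before kube-router enters a pod's network namespace. This is a minimal illustration only, not kube-router's actual code; the `endpointInfo` type and function names here are hypothetical stand-ins:

```go
package main

import "fmt"

// endpointInfo is a hypothetical, simplified stand-in for kube-router's
// endpoint representation.
type endpointInfo struct {
	ip      string
	isLocal bool // true if the backing pod runs on this node
}

// localEndpoints returns only the endpoints whose pods run on this node.
// DSR setup has to enter the pod's network namespace, which is only
// possible when the pod is local.
func localEndpoints(eps []endpointInfo) []endpointInfo {
	var local []endpointInfo
	for _, ep := range eps {
		if ep.isLocal {
			local = append(local, ep)
		}
	}
	return local
}

func main() {
	eps := []endpointInfo{
		{ip: "10.0.1.5", isLocal: true},
		{ip: "10.0.2.7", isLocal: false},
	}
	for _, ep := range localEndpoints(eps) {
		// Only for these endpoints would kube-router enter the
		// container netns to finish the DSR setup.
		fmt.Println("setting up DSR for local endpoint", ep.ip)
	}
}
```

Remote endpoints are simply skipped, which removes the error from attempting netns entry for pods that live on other nodes.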
Thank you @aauren for the great work! I just tested with the PR-1651 image in my environment and I can confirm the DSR feature now works correctly! I also did a full environment restart to be sure this change fixes DSR.
@opipenbe - Thanks for testing! Just to be double-sure, you also tested that the service behind the DSR still continued to work in different scenarios, right? (e.g. from a worker that didn't contain the endpoint, from outside the cluster, etc.)
@aauren, I tested DSR from outside of the cluster, since my service's external IP is exposed from all worker nodes by BGP. I also didn't find anything suspicious in the kube-router logs. I don't know a good method to test DSR inside the cluster. As I understand it, is it by design that packets from inside the cluster to external IPs are dropped with the DSR feature? If it is somehow possible to test DSR inside the cluster, then let me know and I can test that too :)
Hmm... In my cluster, the external IPs route within the cluster just fine, DSR or no DSR. Inside the cluster, for non-service-local VIPs every node should have an IPVS definition for the service, so I would imagine that it should "just work".
Thanks @aauren for the clarification! I figured out why DSR packets are dropped in my environment: on RHEL / Rocky Linux, multihoming is not enabled by default. DSR makes routing asymmetric, so rp_filter (default value 1) will drop DSR traffic: https://access.redhat.com/solutions/53031 . So I can also confirm that DSR works inside the cluster after configuring the kernel parameter.
@opipenbe Thanks for taking the deep dive on this! I have added a commit to this PR that sets rp_filter to 2 during the startup of kube-router, similar to other sysctl values that we do sanity checks on.
rp_filter on RedHat-based OSes is often set to 1 instead of 2, which is more permissive and allows the outbound route for traffic to differ from the route of incoming traffic.
Force-pushed from 854e55a to e3c0f3b
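The behavior the added commit describes can be sketched roughly as follows. This is an illustration only; the helper names are hypothetical and the real kube-router code handles this in its sysctl sanity-check path:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rpFilterPath returns the procfs path for an interface's rp_filter knob.
// (Hypothetical helper, for illustration.)
func rpFilterPath(iface string) string {
	return filepath.Join("/proc/sys/net/ipv4/conf", iface, "rp_filter")
}

// setRPFilterLoose writes "2" (loose mode), which permits the asymmetric
// return path that DSR creates instead of dropping it as strict mode does.
func setRPFilterLoose(iface string) error {
	return os.WriteFile(rpFilterPath(iface), []byte("2"), 0644)
}

func main() {
	// Requires root on a Linux host; the real code's error handling
	// during startup differs from this sketch.
	if err := setRPFilterLoose("all"); err != nil {
		fmt.Fprintln(os.Stderr, "could not set rp_filter:", err)
	}
}
```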
Wouldn't setting it to 2 only if it is already 1 be better? kube-router sets this sysctl to 2 unconditionally (while the typical kernel default is 0), which breaks our KubeSpan (siderolabs/talos#9814), even though 0 is more permissive than 2 and should allow kube-router to operate with our default settings. So I believe checking the existing value would be better, so that kube-router doesn't tighten the filtering policy and potentially break other networking. I am also investigating potential improvements on the Talos side so that KubeSpan would work even with rp_filter=2 or 1.
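The proposed change amounts to only loosening strict mode, never touching an already-permissive setting. A sketch, under the assumption that 0 (off) and 2 (loose) both work for DSR while 1 (strict) does not; the function name is hypothetical:

```go
package main

import "fmt"

// desiredRPFilter decides whether rp_filter needs changing: only the
// strict value 1 breaks DSR's asymmetric routing, so only 1 is rewritten
// to the loose value 2; 0 (off) and 2 (loose) are left untouched.
func desiredRPFilter(current int) (value int, change bool) {
	if current == 1 {
		return 2, true
	}
	return current, false
}

func main() {
	for _, cur := range []int{0, 1, 2} {
		v, ch := desiredRPFilter(cur)
		fmt.Printf("current=%d -> value=%d change=%v\n", cur, v, ch)
	}
}
```

With this check, a host whose default is 0 keeps its more permissive policy, avoiding the KubeSpan breakage described above.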
@dsseng - This seems reasonable to me. Do you have time to create a PR for this change? |
Thank you for your response. I will perhaps contribute this change after completing the Talos part of the work (since that should give us other benefits). There's one caveat; see the different options:
What specific interfaces should have rp_filter in weak mode? |
That's some good foresight. Right now we most likely only need to worry about the However, it wouldn't hurt to try to cover our bases. I would recommend that we check the primary interface (as retrieved by: |
Well, as the docs state, the effective value at the moment the filter runs is the maximum of the "all" and interface-specific values. So if all is 0 and the interface-specific value is 1, the behavior is 1; but if you set all to 2, the behavior for every interface becomes 2. So I'm not really sure about a viable and fully-featured solution from the kube-router side, yet thankfully I have isolated the problem from the KubeSpan & Talos side, so CNI behavior cannot disrupt the connectivity.
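Per the kernel's ip-sysctl documentation, the effective rp_filter for an interface is the maximum of `conf/all/rp_filter` and `conf/<iface>/rp_filter`, which is why raising "all" to 2 tightens every interface regardless of per-interface settings. A small model of that rule:

```go
package main

import "fmt"

// effectiveRPFilter models the kernel rule from
// Documentation/networking/ip-sysctl.rst: the highest of the "all" and
// per-interface values wins when source validation runs.
func effectiveRPFilter(all, iface int) int {
	if all > iface {
		return all
	}
	return iface
}

func main() {
	fmt.Println(effectiveRPFilter(0, 1)) // per-interface strict wins: 1
	fmt.Println(effectiveRPFilter(2, 0)) // all=2 forces loose everywhere: 2
}
```

So an interface-specific 0 cannot opt out once "all" has been raised; the only ways to relax a single interface are to keep "all" low or to handle the asymmetric traffic another way (e.g. marking and accepting it, as discussed below).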
Okay, marking packets incoming from the KubeSpan WireGuard interface and then setting the |
Fixes #1640
@opipenbe - I tested this locally on my smaller development cluster and it appeared to work correctly; however, I would be very appreciative if you would test it in your cluster as well. A container should be built as part of the GitHub Actions CI on this PR; you can look in the CI log for a link to kube-router-git for the newly built kube-router.
One thing that I still don't quite have an answer for, though, is why you didn't also hit this same error under kube-router v1.6.0, as the code path was essentially the same: https://github.com/cloudnativelabs/kube-router/blob/v1.6/pkg/controllers/proxy/service_endpoints_sync.go#L444-L447
Maybe something further up in the calling tree filtered it out?
Anyway, since this seems to me like it changes DSR logic that has existed for a long time, I'd appreciate the second test to ensure that it not only removes the error but also still allows you to route to your service correctly.