
fix(DSR): setup DSR inside pod on local eps only #1651

Merged
3 commits merged into master on Apr 24, 2024

Conversation

@aauren (Collaborator) commented Apr 22, 2024

Only attempt to set up DSR inside containers for local endpoints. Setting up DSR inside a container's network namespace requires local pods / endpoints.

Fixes #1640

@opipenbe - I tested this locally on my smaller development cluster and it appeared to work correctly; however, I would be very appreciative if you would test this in your cluster as well. A container should be built as part of the GitHub Actions CI on this PR; you can look in the CI log for a link to kube-router-git for the newly built kube-router image.

One thing that I still don't quite have an answer for, though, is why you didn't also see this same error under kube-router v1.6.0, as the code path was essentially the same: https://github.com/cloudnativelabs/kube-router/blob/v1.6/pkg/controllers/proxy/service_endpoints_sync.go#L444-L447

Maybe something further up in the calling tree filtered it out?

Anyway, since this seems to change DSR logic that has existed for a long time, I'd appreciate a second test to ensure that it not only removes the error but also still allows you to route to your service correctly.
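For illustration, here's a minimal sketch of the idea behind the fix (the endpoint type and field names are hypothetical, not kube-router's actual ones): skip the in-container DSR setup whenever the endpoint's pod is not on this node.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for kube-router's endpoint information;
// the real proxy controller tracks considerably more state.
type endpoint struct {
	ip      string
	isLocal bool // true when the backing pod runs on this node
}

func main() {
	endpoints := []endpoint{
		{ip: "10.244.0.5", isLocal: true},
		{ip: "10.244.1.9", isLocal: false},
	}
	for _, ep := range endpoints {
		if !ep.isLocal {
			// Remote pod: there is no local container network namespace to
			// enter, so attempting DSR setup here is what produced the
			// "container ID should not be empty" error.
			fmt.Printf("skipping in-container DSR setup for remote endpoint %s\n", ep.ip)
			continue
		}
		fmt.Printf("setting up DSR inside the container netns for local endpoint %s\n", ep.ip)
	}
}
```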

Only attempt to set up DSR inside containers for local endpoints. Setting up DSR inside a container's network namespace requires local pods / endpoints.
@opipenbe

Thank you @aauren for the great work!

I just tested with the PR-1651 image in my environment and I can confirm the DSR feature now works correctly! I also did a full environment restart in order to be sure this change does fix DSR.

@aauren (Collaborator, Author) commented Apr 22, 2024

@opipenbe - Thanks for testing! Just to be double sure: you also tested that the service behind DSR still continued to work in different scenarios, right? (e.g. from a worker that didn't contain the endpoint, from outside the cluster, etc.)

@opipenbe

@aauren, I tested DSR from outside of the cluster, since my service's external IP is exposed from all worker nodes via BGP.

I also didn't find anything suspicious in the kube-router logs.

I don't know a good method to test DSR inside the cluster. As I understand it, is it by design that packets from inside the cluster to external IPs are dropped with the DSR feature? If it is somehow possible to test DSR inside the cluster, then let me know and I can test that also :)

@aauren (Collaborator, Author) commented Apr 22, 2024

Hmm... In my cluster, the external IPs route within the cluster just fine, DSR or no DSR.

Inside the cluster, for non-service-local VIPs, every node should have an IPVS definition for the service, so I would imagine that it should "just work".

@opipenbe

Thanks @aauren for the clarification! I figured out why DSR packets are dropped in my environment: on RHEL / Rocky Linux, multihoming is not enabled by default. DSR makes routing asymmetric, so rp_filter (default value 1) will drop DSR traffic: https://access.redhat.com/solutions/53031

So I can also confirm that DSR works successfully inside the cluster after configuring the kernel parameter rp_filter = 2 on all nodes :)

@aauren (Collaborator, Author) commented Apr 24, 2024

@opipenbe Thanks for taking the deep dive on this!

I have added a commit to this PR that sets rp_filter to 2 during kube-router startup, similar to other sysctl values that we sanity-check.

rp_filter on RedHat-based OSes is often set to 1 instead of 2; 2 is more permissive and allows the outbound route for traffic to differ from the route of incoming traffic.
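For reference, a minimal sketch in Go of what writing a sysctl like this at startup can look like (the setSysctl helper is hypothetical, not kube-router's actual code; only the /proc/sys path is real):

```go
package main

import (
	"fmt"
	"os"
)

// setSysctl is a hypothetical helper: it writes a value to a /proc/sys path,
// the same mechanism `sysctl -w` uses under the hood.
func setSysctl(path string, value int) error {
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", value)), 0o644)
}

func main() {
	// rp_filter=2 ("loose" reverse-path filtering) tolerates the asymmetric
	// return path that DSR relies on, whereas rp_filter=1 ("strict") drops it.
	if err := setSysctl("/proc/sys/net/ipv4/conf/all/rp_filter", 2); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set rp_filter:", err)
		os.Exit(1)
	}
}
```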
@aauren force-pushed the fix_dsr_container_enter_for_remote_endpoints branch from 854e55a to e3c0f3b on April 24, 2024 22:49
@aauren merged commit b423b1f into master on Apr 24, 2024
7 checks passed
@aauren deleted the fix_dsr_container_enter_for_remote_endpoints branch on April 24, 2024 23:13
@dsseng (Contributor) commented Dec 24, 2024

Wouldn't it be better to set it to 2 only if it is already 1? kube-router sets this sysctl to 2 unconditionally (while the typical kernel default is 0), which breaks our KubeSpan (siderolabs/talos#9814), even though 0 is more permissive than 2 and should allow kube-router to operate on our default settings.

So I believe checking the existing values would be better, so that kube-router doesn't tighten up the filtering policy and potentially break other networking. I am also investigating potential improvements on the Talos side so that KubeSpan would work even with rp_filter=2 or 1.
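A minimal sketch of the conditional approach proposed here (the ensureLooseRPFilter helper is hypothetical, not kube-router's implementation): rewrite rp_filter to 2 only when the current value is 1, and leave 0 alone.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ensureLooseRPFilter rewrites rp_filter to 2 (loose) only when the current
// value is 1 (strict); it leaves 0 (off) untouched, since 0 is already
// permissive enough for DSR's asymmetric return path.
func ensureLooseRPFilter(iface string) error {
	path := fmt.Sprintf("/proc/sys/net/ipv4/conf/%s/rp_filter", iface)
	raw, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	current, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return err
	}
	if current != 1 {
		return nil // 0 or 2 already allow asymmetric routing; don't tighten it
	}
	return os.WriteFile(path, []byte("2"), 0o644)
}

func main() {
	if err := ensureLooseRPFilter("all"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```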

@aauren (Collaborator, Author) commented Dec 24, 2024

@dsseng - This seems reasonable to me.

Do you have time to create a PR for this change?

@dsseng (Contributor) commented Dec 24, 2024

Thank you for your response. I will perhaps contribute this change after completing the Talos part of the work (since that should give us other benefits).

There's one catch; see the different options:

  1. all = 0 and the desired interface is also 0 - no need for an override
  2. all = 1 - override anyway
  3. all = 2 - no need for an override
  4. all = 0, but some of the specific interface values are 1 - we can't check every interface, so we miss setting the override.

What specific interfaces should have rp_filter in weak mode?

@aauren (Collaborator, Author) commented Dec 24, 2024

That's some good foresight. Right now we most likely only need to worry about the all setting, as that's the way the functionality works now and we haven't gotten any bug reports.

However, it wouldn't hurt to try to cover our bases.

I would recommend that we check at least the primary interface (as retrieved by NodeInterfaceAware.GetNodeInterfaceName(): https://github.com/cloudnativelabs/kube-router/blob/master/pkg/utils/node.go#L55) and kube-bridge, and ensure that they are set the same way all is set.
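Extending that idea to multiple interface scopes might look roughly like the sketch below; the interface names are illustrative only ("eth0" stands in for whatever the node's primary interface is), and this is not kube-router code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rpFilterValue reads the current rp_filter setting for one interface scope
// ("all", "eth0", "kube-bridge", ...).
func rpFilterValue(iface string) (int, error) {
	raw, err := os.ReadFile(fmt.Sprintf("/proc/sys/net/ipv4/conf/%s/rp_filter", iface))
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(raw)))
}

func main() {
	for _, iface := range []string{"all", "eth0", "kube-bridge"} {
		v, err := rpFilterValue(iface)
		if err != nil {
			fmt.Fprintf(os.Stderr, "could not read rp_filter for %s: %v\n", iface, err)
			continue
		}
		if v == 1 {
			fmt.Printf("%s has strict rp_filter (1); it would need to be relaxed for DSR\n", iface)
		}
	}
}
```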

@dsseng (Contributor) commented Dec 24, 2024

Well, as the docs state, the value in effect when the filter runs is the maximum of the all and interface-specific values. So if all is 0 and the interface-specific value is 1, then the behavior is 1; but if you set all to 2, the behavior for all interfaces becomes 2.

So I'm not really sure there's a viable, fully featured solution from the kube-router side, but thankfully I have isolated the problem on the KubeSpan & Talos side, so CNI behavior cannot disrupt connectivity.
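To make that kernel rule concrete, here is a tiny illustrative sketch of the effective value calculation:

```go
package main

import "fmt"

// effectiveRPFilter mirrors the kernel behavior described above: the value
// actually used for source validation on an interface is the maximum of
// conf/all/rp_filter and conf/<iface>/rp_filter.
func effectiveRPFilter(all, perIface int) int {
	if all > perIface {
		return all
	}
	return perIface
}

func main() {
	fmt.Println(effectiveRPFilter(0, 1)) // 1: strict filtering still applies
	fmt.Println(effectiveRPFilter(2, 1)) // 2: setting all=2 loosens every interface
}
```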

@dsseng (Contributor) commented Dec 30, 2024

Okay, marking packets incoming from the KubeSpan WireGuard interface and then setting the conf/kubespan/src_valid_mark sysctl to 1 helped resolve the issue on Talos. siderolabs/talos#10074 fixes this on our side and seems to work even with strict filtering. I will do a fix on the kube-router side to ensure such disruptions don't happen on other systems.

Successfully merging this pull request may close these issues.

DSR mode: "rpc error: code = NotFound desc = could not find container "": container ID should not be empty"