
fix(DSR): setup DSR inside pod on local eps only #1651

Merged
3 commits merged into master on Apr 24, 2024

Conversation

@aauren (Collaborator) commented Apr 22, 2024

Only attempt to set up DSR inside containers for local endpoints. Setting up DSR inside a container's network namespace requires local pods / endpoints.

Fixes #1640

@opipenbe - I tested this locally on my smaller development cluster and it appeared to work correctly; however, I would be very appreciative if you would test this in your cluster as well. A container should be built as part of the GitHub Actions CI on this PR; you can look in the CI log for a link to kube-router-git for the newly built kube-router image.

One thing that I still don't quite have an answer for, though, is why you didn't also see this same error under kube-router v1.6.0, as the code path was essentially the same: https://github.com/cloudnativelabs/kube-router/blob/v1.6/pkg/controllers/proxy/service_endpoints_sync.go#L444-L447

Maybe something further up in the calling tree filtered it out?

Anyway, since this seems to change DSR logic that has existed for a long time, I'd appreciate a second test to ensure that it not only removes the error but also still allows you to route to your service correctly.
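For illustration, here's a minimal sketch of the idea behind the fix (the endpoint type and field names are hypothetical, not kube-router's actual ones): skip the in-container DSR setup whenever the endpoint's pod is not on this node.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for kube-router's endpoint information;
// the real proxy controller tracks considerably more state.
type endpoint struct {
	ip      string
	isLocal bool // true when the backing pod runs on this node
}

func main() {
	endpoints := []endpoint{
		{ip: "10.244.0.5", isLocal: true},
		{ip: "10.244.1.9", isLocal: false},
	}
	for _, ep := range endpoints {
		if !ep.isLocal {
			// Remote pod: there is no local container network namespace to
			// enter, so attempting DSR setup here is what produced the
			// "container ID should not be empty" error.
			fmt.Printf("skipping in-container DSR setup for remote endpoint %s\n", ep.ip)
			continue
		}
		fmt.Printf("setting up DSR inside the container netns for local endpoint %s\n", ep.ip)
	}
}
```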

Only attempt to set up DSR inside containers for local endpoints. Setting up DSR inside a container's network namespace requires local pods / endpoints.
@opipenbe

Thank you @aauren for the great work!

I just tested with the PR-1651 image in my environment and I can confirm the DSR feature now works correctly! I also did a full environment restart in order to be sure this change does fix DSR.

@aauren (Collaborator, Author) commented Apr 22, 2024

@opipenbe - Thanks for testing! Just to be double sure: you also tested that the service behind DSR still continued to work in different scenarios, right? (e.g. from a worker that didn't contain the endpoint, from outside the cluster, etc.)

@opipenbe

@aauren, I tested DSR from outside of the cluster, since my service's external IP is exposed from all worker nodes via BGP.

I also didn't find anything suspicious in the kube-router logs.

I don't know a good method to test DSR inside the cluster. As I understand it, is it by design that packets from inside the cluster to external IPs are dropped with the DSR feature? If it is somehow possible to test DSR inside the cluster, then let me know and I can test that also :)

@aauren (Collaborator, Author) commented Apr 22, 2024

Hmm... In my cluster, the external IPs route within the cluster just fine, DSR or no DSR.

Inside the cluster, for non-service-local VIPs, every node should have an IPVS definition for the service, so I would imagine that it should "just work".

@opipenbe

Thanks @aauren for the clarification! I figured out why DSR packets are dropped in my environment: on RHEL / Rocky Linux, multihoming is not enabled by default. DSR makes routing asymmetric, so rp_filter (default value 1) will drop DSR traffic: https://access.redhat.com/solutions/53031

So I can also confirm that DSR works successfully inside the cluster after configuring the kernel parameter rp_filter = 2 on all nodes :)

@aauren (Collaborator, Author) commented Apr 24, 2024

@opipenbe Thanks for taking the deep dive on this!

I have added a commit to this PR that sets rp_filter to 2 during kube-router startup, similar to other sysctl values that we sanity-check.

rp_filter on RedHat-based OSes is often set to 1 instead of 2; 2 is more permissive and allows the outbound route for traffic to differ from the route of incoming traffic.
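For reference, a minimal sketch in Go of what writing a sysctl like this at startup can look like (the setSysctl helper is hypothetical, not kube-router's actual code; only the /proc/sys path is real):

```go
package main

import (
	"fmt"
	"os"
)

// setSysctl is a hypothetical helper: it writes a value to a /proc/sys path,
// the same mechanism `sysctl -w` uses under the hood.
func setSysctl(path string, value int) error {
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", value)), 0o644)
}

func main() {
	// rp_filter=2 ("loose" reverse-path filtering) tolerates the asymmetric
	// return path that DSR relies on, whereas rp_filter=1 ("strict") drops it.
	if err := setSysctl("/proc/sys/net/ipv4/conf/all/rp_filter", 2); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set rp_filter:", err)
		os.Exit(1)
	}
}
```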
@aauren force-pushed the fix_dsr_container_enter_for_remote_endpoints branch from 854e55a to e3c0f3b on April 24, 2024 22:49
@aauren merged commit b423b1f into master on Apr 24, 2024
7 checks passed
@aauren deleted the fix_dsr_container_enter_for_remote_endpoints branch on April 24, 2024 23:13
@dsseng (Contributor) commented Dec 24, 2024

Wouldn't it be better to set it to 2 only if it is already 1? kube-router sets this sysctl to 2 unconditionally (while the typical kernel default is 0), which breaks our KubeSpan (siderolabs/talos#9814), even though 0 is more permissive than 2 and should allow kube-router to operate on our default settings.

So I believe checking the existing values would be better, so that kube-router doesn't tighten up the filtering policy and potentially break other networking. I am also investigating potential improvements on the Talos side so that KubeSpan would work even with rp_filter=2 or 1.
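A minimal sketch of the conditional approach proposed here (the ensureLooseRPFilter helper is hypothetical, not kube-router's implementation): rewrite rp_filter to 2 only when the current value is 1, and leave 0 alone.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ensureLooseRPFilter rewrites rp_filter to 2 (loose) only when the current
// value is 1 (strict); it leaves 0 (off) untouched, since 0 is already
// permissive enough for DSR's asymmetric return path.
func ensureLooseRPFilter(iface string) error {
	path := fmt.Sprintf("/proc/sys/net/ipv4/conf/%s/rp_filter", iface)
	raw, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	current, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return err
	}
	if current != 1 {
		return nil // 0 or 2 already allow asymmetric routing; don't tighten it
	}
	return os.WriteFile(path, []byte("2"), 0o644)
}

func main() {
	if err := ensureLooseRPFilter("all"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```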

@aauren (Collaborator, Author) commented Dec 24, 2024

@dsseng - This seems reasonable to me.

Do you have time to create a PR for this change?

@dsseng (Contributor) commented Dec 24, 2024

Thank you for your response. I will perhaps contribute this change after completing the Talos part of the work (since that should give us other benefits).

There's one catch; see the different options:

  1. all = 0 and the desired interface is also 0 - no need for an override
  2. all = 1 - override anyway
  3. all = 2 - no need for an override
  4. all = 0, but some of the specific interface values are 1 - we can't check every interface, so we miss setting the override.

What specific interfaces should have rp_filter in weak mode?

@aauren (Collaborator, Author) commented Dec 24, 2024

That's some good foresight. Right now we most likely only need to worry about the all setting, as that's the way the functionality works now and we haven't gotten any bug reports.

However, it wouldn't hurt to try to cover our bases.

I would recommend that we check at least the primary interface (as retrieved by NodeInterfaceAware.GetNodeInterfaceName(): https://github.com/cloudnativelabs/kube-router/blob/master/pkg/utils/node.go#L55) and kube-bridge, and ensure that they are set the same way all is set.
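Extending that idea to multiple interface scopes might look roughly like the sketch below; the interface names are illustrative only ("eth0" stands in for whatever the node's primary interface is), and this is not kube-router code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rpFilterValue reads the current rp_filter setting for one interface scope
// ("all", "eth0", "kube-bridge", ...).
func rpFilterValue(iface string) (int, error) {
	raw, err := os.ReadFile(fmt.Sprintf("/proc/sys/net/ipv4/conf/%s/rp_filter", iface))
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(raw)))
}

func main() {
	for _, iface := range []string{"all", "eth0", "kube-bridge"} {
		v, err := rpFilterValue(iface)
		if err != nil {
			fmt.Fprintf(os.Stderr, "could not read rp_filter for %s: %v\n", iface, err)
			continue
		}
		if v == 1 {
			fmt.Printf("%s has strict rp_filter (1); it would need to be relaxed for DSR\n", iface)
		}
	}
}
```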

@dsseng (Contributor) commented Dec 24, 2024

Well, as the docs state, the value in effect when the filter runs is the maximum of the all and interface-specific values. So if all is 0 and the interface-specific value is 1, then the behavior is 1; but if you set all to 2, the behavior for all interfaces becomes 2.

So I'm not really sure there's a viable, fully featured solution from the kube-router side, but thankfully I have isolated the problem on the KubeSpan & Talos side, so CNI behavior cannot disrupt connectivity.
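To make that kernel rule concrete, here is a tiny illustrative sketch of the effective value calculation:

```go
package main

import "fmt"

// effectiveRPFilter mirrors the kernel behavior described above: the value
// actually used for source validation on an interface is the maximum of
// conf/all/rp_filter and conf/<iface>/rp_filter.
func effectiveRPFilter(all, perIface int) int {
	if all > perIface {
		return all
	}
	return perIface
}

func main() {
	fmt.Println(effectiveRPFilter(0, 1)) // 1: strict filtering still applies
	fmt.Println(effectiveRPFilter(2, 1)) // 2: setting all=2 loosens every interface
}
```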

@dsseng (Contributor) commented Dec 30, 2024

Okay, marking packets incoming from the KubeSpan WireGuard interface and then setting the conf/kubespan/src_valid_mark sysctl to 1 helped resolve the issue on Talos. siderolabs/talos#10074 fixes this on our side and seems to work even with strict filtering. I will do a fix on the kube-router side to ensure such disruptions don't happen on other systems.

Successfully merging this pull request may close these issues.

DSR mode: "rpc error: code = NotFound desc = could not find container "": container ID should not be empty"