Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(hairpin): set hairpin_mode for veth iface #1582

Merged
merged 1 commit into from
Dec 7, 2023

Conversation

aauren
Copy link
Collaborator

@aauren aauren commented Dec 3, 2023

It used to be that the kubelet handled setting hairpin mode for us: kubernetes/kubernetes#13628

Then this functionality moved to the dockershim:
kubernetes/kubernetes#62212

Then the functionality was removed entirely:
kubernetes/kubernetes@83265c9171f

Unfortunately, it was lost that we ever depended on this in order for our hairpin implementation to work, if we ever knew it at all. Additionally, I suspect that containerd and cri-o implementations never worked correctly with hairpinning.

Without this, the NAT rules that we implement for hairpinning don't work correctly. Because hairpin_mode isn't implemented on the virtual interface of the container on the host, the packet bubbles up to the kube-bridge. At some point in the traffic flow, the route back to the pod gets resolved to the mac address inside the container, at that point, the packet's source mac and destination mac don't match the kube-bridge interface and the packet is black-holed.

This can also be fixed by putting the kube-bridge interface into promiscuous mode so that it accepts all mac addresses, but I think that going back to the original functionality of enabling hairpin_mode on the veth interface of the container is likely the lesser of two evils here as putting the kube-bridge interface into promiscuous mode will likely have unintentional consequences.

With this fix, kube-router now passes the Kubernetes end-to-end conformance test for hairpinning traffic.

It used to be that the kubelet handled setting hairpin mode for us:
kubernetes/kubernetes#13628

Then this functionality moved to the dockershim:
kubernetes/kubernetes#62212

Then the functionality was removed entirely:
kubernetes/kubernetes@83265c9171f

Unfortunately, it was lost that we ever depended on this in order for
our hairpin implementation to work, if we ever knew it at all.
Additionally, I suspect that containerd and cri-o implementations never
worked correctly with hairpinning.

Without this, the NAT rules that we implement for hairpinning don't work
correctly. Because hairpin_mode isn't implemented on the virtual
interface of the container on the host, the packet bubbles up to the
kube-bridge. At some point in the traffic flow, the route back to the
pod gets resolved to the mac address inside the container, at that
point, the packet's source mac and destination mac don't match the
kube-bridge interface and the packet is black-holed.

This can also be fixed by putting the kube-bridge interface into
promiscuous mode so that it accepts all mac addresses, but I think that
going back to the original functionality of enabling hairpin_mode on the
veth interface of the container is likely the lesser of two evils here
as putting the kube-bridge interface into promiscuous mode will likely
have unintentional consequences.
@aauren aauren merged commit 0f3714b into cloudnativelabs:master Dec 7, 2023
7 checks passed
// ugly workaround, refactoring of pkg/Proxy is required
pid, err = hpc.nsc.ln.getContainerPidWithCRI(hpc.nsc.dsr.runtimeEndpoint, containerID)
if err != nil {
return fmt.Errorf("failed to prepare endpoint %s to do DSR due to: %v", endpointIP, err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should use fmt.Errorf("...: %w", ..., err) instead of %v for error type propagation that you can later check with errors.Is or convert with errors.As

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably at some point we're going to have to convert the entire code base. For now I've just been continuing on the %v wagon to stay consistent within the code base. So far we haven't really needed the functionality to look at error types or wrapped errors, so it has worked, but we should probably upgrade anyway.

aauren added a commit to aauren/kube-router that referenced this pull request Dec 8, 2023
This is needed because cloudnativelabs#1582 which was recently merged relies upon
finding the correct veth interface via /proc/<pid> which isn't available
unless kube-router is in the same process namespace.

hostPID and hostIPC was always required for DSR functionality, but now
hostPID is needed for hairpin to be available.
aauren added a commit that referenced this pull request Dec 8, 2023
This is needed because #1582 which was recently merged relies upon
finding the correct veth interface via /proc/<pid> which isn't available
unless kube-router is in the same process namespace.

hostPID and hostIPC was always required for DSR functionality, but now
hostPID is needed for hairpin to be available.
@aauren
Copy link
Collaborator Author

aauren commented Jan 6, 2024

Progress towards: #1596

@aauren aauren added this to the v2.1.0 milestone Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants