fix(hairpin): set hairpin_mode for veth iface #1582
Conversation
It used to be that the kubelet handled setting hairpin mode for us: kubernetes/kubernetes#13628. Then this functionality moved to the dockershim: kubernetes/kubernetes#62212. Then the functionality was removed entirely: kubernetes/kubernetes@83265c9171f.

Unfortunately, the fact that our hairpin implementation depended on this was lost, if we ever knew it at all. Additionally, I suspect that the containerd and CRI-O implementations never worked correctly with hairpinning.

Without this, the NAT rules that we implement for hairpinning don't work correctly. Because hairpin_mode isn't enabled on the container's virtual interface on the host, the packet bubbles up to the kube-bridge. At some point in the traffic flow, the route back to the pod resolves to the MAC address inside the container; at that point, the packet's source and destination MACs don't match the kube-bridge interface and the packet is black-holed.

This could also be fixed by putting the kube-bridge interface into promiscuous mode so that it accepts all MAC addresses, but going back to the original behavior of enabling hairpin_mode on the container's veth interface is the lesser of two evils, as putting kube-bridge into promiscuous mode would likely have unintended consequences.

With this fix, kube-router now passes the Kubernetes end-to-end conformance test for hairpinning traffic.
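For illustration, here is a minimal sketch of what enabling hairpin_mode on the host-side veth can look like via the kernel's sysfs interface (setHairpin is a hypothetical helper, not necessarily the code in this PR; it assumes the veth is already attached to a bridge such as kube-bridge, since the brport directory only exists for bridge ports):

package main

import (
	"fmt"
	"os"
)

// setHairpin enables hairpin_mode on a bridge port, e.g. the host-side
// veth of a container. The brport directory only appears once the
// interface has been attached to a bridge such as kube-bridge.
func setHairpin(ifaceName string) error {
	path := fmt.Sprintf("/sys/class/net/%s/brport/hairpin_mode", ifaceName)
	if err := os.WriteFile(path, []byte("1"), 0644); err != nil {
		return fmt.Errorf("failed to enable hairpin_mode on %s: %v", ifaceName, err)
	}
	return nil
}

With hairpin_mode set, the bridge will forward a packet back out the same port it arrived on, which is exactly what the hairpin NAT rules rely on.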
// ugly workaround, refactoring of pkg/Proxy is required
pid, err = hpc.nsc.ln.getContainerPidWithCRI(hpc.nsc.dsr.runtimeEndpoint, containerID)
if err != nil {
	return fmt.Errorf("failed to prepare endpoint %s to do DSR due to: %v", endpointIP, err)
You should use fmt.Errorf("...: %w", ..., err) instead of %v, so the error is wrapped for propagation and can later be checked with errors.Is or converted with errors.As.
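To illustrate the suggested pattern, here is a minimal, self-contained sketch (loadConfig and the path are hypothetical, used only to show %w wrapping with errors.Is and errors.As):

package main

import (
	"errors"
	"fmt"
	"os"
)

// loadConfig is a hypothetical function illustrating %w error wrapping.
func loadConfig(path string) error {
	f, err := os.Open(path)
	if err != nil {
		// %w wraps err so callers can still inspect the original error.
		return fmt.Errorf("failed to load config %s: %w", path, err)
	}
	defer f.Close()
	return nil
}

func main() {
	err := loadConfig("/nonexistent")

	// errors.Is walks the wrap chain and matches the sentinel error.
	fmt.Println(errors.Is(err, os.ErrNotExist)) // true

	// errors.As extracts a concrete error type from the chain.
	var pathErr *os.PathError
	if errors.As(err, &pathErr) {
		fmt.Println("op:", pathErr.Op, "path:", pathErr.Path)
	}
}

With %v, the formatted error is flattened to a string and both checks above would fail; %w preserves the chain.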
Yeah, probably at some point we're going to have to convert the entire code base. For now I've just been continuing on the %v wagon to stay consistent within the code base. So far we haven't really needed the functionality to look at error types or wrapped errors, so it has worked, but we should probably upgrade anyway.
This is needed because #1582, which was recently merged, relies upon finding the correct veth interface via /proc/<pid>, which isn't visible unless kube-router shares the host PID namespace. hostPID and hostIPC were always required for DSR functionality, but now hostPID is also needed for hairpin mode to be available.
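As context, a rough sketch of what that lookup can look like with hostPID, using the vishvananda/netns and vishvananda/netlink libraries (findHostVeth and the eth0 name are illustrative assumptions, not the merged code):

package main

import (
	"fmt"
	"runtime"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

// findHostVeth is a hypothetical helper illustrating why hostPID is
// needed: /proc/<pid>/ns/net only resolves when kube-router shares the
// host PID namespace with the container's processes.
func findHostVeth(containerPID int) (netlink.Link, error) {
	// Namespace switches are per-thread, so pin this goroutine.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	hostNS, err := netns.Get()
	if err != nil {
		return nil, err
	}
	defer hostNS.Close()

	containerNS, err := netns.GetFromPid(containerPID)
	if err != nil {
		return nil, fmt.Errorf("failed to open /proc/%d/ns/net: %v", containerPID, err)
	}
	defer containerNS.Close()

	// Enter the container namespace to read eth0's peer index; for a
	// veth, ParentIndex carries the peer's ifindex (IFLA_LINK).
	if err := netns.Set(containerNS); err != nil {
		return nil, err
	}
	eth0, err := netlink.LinkByName("eth0")
	if err != nil {
		netns.Set(hostNS)
		return nil, err
	}
	peerIndex := eth0.Attrs().ParentIndex

	// Return to the host namespace and look up the veth peer there.
	if err := netns.Set(hostNS); err != nil {
		return nil, err
	}
	return netlink.LinkByIndex(peerIndex)
}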
Progress towards: #1596