
Linkerd #64

Open · ThetaDR opened this issue Dec 7, 2022 · 9 comments
Labels: enhancement (New feature or request)

@ThetaDR commented Dec 7, 2022

We have found that Linkerd uses IP addresses for endpoint identity. For example, the destination IP from the TCP packet (e.g. 10.96.5.1) is used here: https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/service-profiles/src/client.rs#L133
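For context, this is the usual sidecar capture pattern: linkerd-init installs iptables rules that redirect the pod's outbound TCP traffic to the proxy, and the proxy recovers the original destination from the redirected socket. A simplified sketch (not the exact linkerd-init rules; 4140 is linkerd's default outbound proxy port, as used later in this thread):

# Simplified sketch of sidecar-style outbound capture (the real linkerd-init
# rules add exclusions for the proxy's own traffic, ports, UIDs, etc.):
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-port 4140
# The proxy then reads the original destination (e.g. 10.96.5.1) via the
# SO_ORIGINAL_DST socket option and uses that IP for discovery,
# which is exactly the IP-based endpoint identity described above.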

In the extension use-case we have this datapath:

NSC —(1)—> NSM1 —(2)—> NSM2 —(3)—> NSE L7 Proxy —(4)—> linkerd-proxy —(5)—> Workload

At step 4 the outgoing packet carries the destination IP of the NSM interface, and we need to replace it with the Workload IP to make Linkerd work.

Steps:

  1. NSC makes an HTTP GET request to the service ‘Workload’.
  2. NSC does an NSM lookup for ‘Workload’.
  3. ‘Workload’ resolves to the NSE L7 Proxy IP address (via fanout and a caching proxy on the NSE side).
  4. NSC sends a TCP SYN to the NSE L7 Proxy.
  5. The iptables rules in the NSE L7 Proxy redirect all TCP traffic coming from the vWire to the linkerd-proxy.
  6. linkerd-proxy looks at the destination IP address (which is the NSE L7 Proxy IP address) and can't find a suitable endpoint.
@denis-tingaikin (Member) commented:

/cc @edwarnicke

@ThetaDR Could you please double-check our investigation and provide results?

@NikitaSkrynnik (Contributor) commented:

@edwarnicke

Double-checked the problem described above.

linkerd-proxy really does look at the destination IP address of the TCP packet. Here in linkerd-proxy we can see that it makes an API call to the Discovery service with the destination address.

Linkerd's Discovery service tries to find who owns this IP address among Kubernetes Services and Pods HERE.

If Discovery doesn't find anything, linkerd-proxy just forwards the TCP packet to its initial destination.

In our case, when we send a TCP packet from Workload1, the packet has dst ip == 172.16.1.3, and Linkerd simply can't find this IP among the Services and Pods on the cluster.
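To poke Linkerd's Destination service from the CLI, something like the following should show the difference (a sketch; the authority name is illustrative and assumes a reasonably recent linkerd CLI):

# A real Service authority resolves to endpoints:
linkerd diagnostics endpoints workload.default.svc.cluster.local:80
# whereas nothing in the cluster owns 172.16.1.3, so discovery has no
# answer for a packet addressed to the NSM interface IP.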

We decided to make cmd-nse-l7-proxy leave the IP address of a service unchanged. So now, when we make a DNS request for the service on Workload1, we get the service's real IP address.

Then we add a routing rule to Workload1 to route all traffic with a destination IP in 10.96.0.0/16 to the nsm interface:

ip ro add 10.96.0.0/16 via 172.16.1.3
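To verify that the route took effect, ip route get can be used (a sketch; the interface name depends on the setup):

# Ask the kernel how a second-cluster service IP would be routed:
ip route get 10.96.5.1
# expected: 10.96.5.1 via 172.16.1.3 dev <nsm interface> ...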

We also add an iptables rule to cmd-nse-l7-proxy:

iptables -t nat -I PREROUTING 1 -p tcp -i {{ .NsmInterfaceName }} -j DNAT --to-destination 127.0.0.1:4140

This rule redirects all traffic from the nsm interface to the outbound proxy, which looks at the destination IP of each packet and discovers the needed service.
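One caveat worth noting: DNAT-ing packets that arrive on a real interface to 127.0.0.1 is normally rejected by the kernel as a martian destination, so a rule like the one above typically also needs route_localnet enabled on that interface (a sketch, reusing the same templated interface name):

# Allow packets arriving on the NSM interface to be DNAT-ed to loopback;
# without this the kernel drops them as martians.
sysctl -w net.ipv4.conf.{{ .NsmInterfaceName }}.route_localnet=1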

The only problem here is that this routing rule sends all traffic whose destination IP is in the 10.96.0.0/16 subnet to the nsm interface, even when the address does not actually belong to a service on the second cluster.

Possible solutions

  1. We can leave this working example as is.
  2. We can somehow get the table of services stored in Linkerd's Discovery service and send it to Workload1. Workload1 can use this table to determine which services are on the second cluster.
  3. We can investigate further how Linkerd's Discovery service works (for example, experiment with K8s Service IPs). Maybe we will find something in the Discovery service that helps us come up with a new solution.

@NikitaSkrynnik (Contributor) commented Jan 27, 2023

Solution with ipset, iproute2 and dnsmasq

We can create an ipset and store the IPs of the second cluster's services there. We mark all packets that have a destination IP in the set with mark 3, for example. We also add a routing rule that sends packets to the nsm interface if they have mark 3.

Finally, we use dnsmasq, which writes all DNS responses from the second cluster into the ipset described above. We have to give dnsmasq a list of the DNS domains that live on the second cluster, so it can filter DNS responses using this list (a quick end-to-end check is sketched after the steps below).

All of this work should be done on the NSC in the first cluster.

  1. Create an ipset:
     ipset create NSM_IPSET iphash
  2. Create a custom routing table:
     echo 201 nsm_table >> /etc/iproute2/rt_tables
  3. Add a route to the custom routing table that routes all traffic to the nsm interface:
     ip ro add default via 172.16.1.2 proto static table nsm_table
  4. Use this custom routing table only for packets with mark 3:
     ip ru add fwmark 3 lookup nsm_table pref 3333
  5. Create an iptables rule to mark all packets whose destination IP is in the ipset with mark 3:
     iptables -A OUTPUT -t mangle -m set --match-set NSM_IPSET dst -j MARK --set-mark 3
  6. Add a filter to dnsmasq (for the DNS domain *.default):
     ipset=/default/NSM_IPSET
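A quick end-to-end check of the chain (names are illustrative: a service 'workload' in the 'default' domain, dnsmasq listening locally):

# Resolve a second-cluster name through the local dnsmasq...
dig +short workload.default @127.0.0.1
# ...then confirm dnsmasq added the returned IP to the set:
ipset list NSM_IPSET
# From here, mark 3 and nsm_table take over routing for that IP.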

@NikitaSkrynnik (Contributor) commented Feb 16, 2023

Solution with IP mutations

Main idea

cmd-nse-l7-proxy can send DNS responses with the first byte of each IP changed. The first byte can be 199, for example, so 10.96.5.1 becomes 199.96.5.1. cmd-nsc can then distinguish the packets it needs to send to the second cluster by the changed first byte: it marks these packets with a special mark, changes the first byte back to its real value, and a special route (also on cmd-nsc) sends all marked packets to the nsm interface (a quick routing check is sketched after the steps below).

This solves the problem of overlapping subnets: cmd-nsc now knows which packets should be sent to the second cluster, because those packets sit in a dedicated subnet, 199.0.0.0/8.

Steps

  1. We rework the cmd-nse-l7-proxy DNS server. The server should change the first byte of every IP it sends in DNS responses to cmd-nsc. For example, the first byte can be 199.

  2. We mark all outbound TCP traffic that has 199 in the first byte of the destination IP with mark 3 on cmd-nsc:
     iptables -t mangle -A OUTPUT -p tcp -d 199.0.0.0/8 -j MARK --set-mark 3
  3. We change the first byte of all marked packets back on cmd-nsc:
     iptables -t nat -A OUTPUT -m mark --mark 3 -j NETMAP --to 10.0.0.0/8
  4. We change the source IP of all marked TCP packets:
     iptables -t nat -A POSTROUTING -m mark --mark 3 -j SNAT --to 172.16.1.3
  5. We create a special route for marked packets that routes them to the nsm interface:
     echo 201 nsm_table >> /etc/iproute2/rt_tables
     ip ru add fwmark 3 lookup nsm_table pref 3333
     ip ro add default via 172.16.1.3 table nsm_table
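With these rules in place, the routing decision for a rewritten-and-marked packet can be checked directly (a sketch; assumes an iproute2 build that accepts a mark in route get):

# Which route would a mark-3 packet to the restored address take?
# It should resolve via 172.16.1.3 through nsm_table:
ip route get 10.96.5.1 mark 3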

@edwarnicke (Member) commented:

> 2. We mark all outbound TCP traffic that has 199 in the first byte of the destination IP with mark 3 on cmd-nsc:
>    iptables -t mangle -A OUTPUT -p tcp -d 199.0.0.0/8 -j MARK --set-mark 3

Why use the mark? Why not just change the first byte without the mark?

> 3. We change the first byte of all marked packets back on cmd-nsc:
>    iptables -t nat -A OUTPUT -m mark --mark 3 -j NETMAP --to 10.0.0.0/8

Is this mark just marking packets that should go out the NSM interface?

> 4. We change the source IP of all marked TCP packets:
>    iptables -t nat -A POSTROUTING -m mark --mark 3 -j SNAT --to 172.16.1.3

Couldn't all of this be done in the proxy rather than the NSC?

@edwarnicke (Member) commented:

How do we know that 199.0.0.0/8 is safe to use for this purpose? Is there a corresponding safe address to do this for IPv6?

@edwarnicke (Member) commented:

Have you looked at link-local addresses?

@NikitaSkrynnik (Contributor) commented:

> 1. Why use the mark? Why not just change the first byte without the mark?

We use the mark to send these packets to the NSM interface after changing the first byte. If we didn't, the packets would be lost, because there is no service with such an IP on the first cluster.

> 2. Is this mark just marking packets that should go out the NSM interface?

Yes.

> 3. Couldn't all of this be done in the proxy rather than the NSC?

Unfortunately, we can't do all of this on nse-l7-proxy, because on nse-l7-proxy we have a DNAT rule which redirects packets to Linkerd's proxy. For some reason the NETMAP rule and the DNAT rule don't work together: if we put the NETMAP rule before the DNAT rule, the TCP packet never reaches the DNAT rule.
We could use a composition of two NSEs: the first one would have the NETMAP rule and the second one the DNAT rule.

> 4. How do we know that 199.0.0.0/8 is safe to use for this purpose? Is there a corresponding safe address to do this for IPv6?

We can use any subnet which doesn't overlap with the first cluster's subnets. We can calculate it automatically, based on the clusters' service subnets, and we can do the same for IPv6 addresses. (One way to read a cluster's service CIDR is sketched after this list.)

> 5. Have you looked at link-local addresses?

Could you tell us more about how we could use them?
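For reference, one common way to read the service CIDR a cluster was configured with, so that a non-overlapping mutation prefix can be picked automatically (a sketch; assumes the kube-apiserver pod spec is visible in the dump):

# Print the --service-cluster-ip-range the API server was started with:
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range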
