
Linkerd #64

Open · ThetaDR opened this issue Dec 7, 2022 · 9 comments
Labels: enhancement (New feature or request)

@ThetaDR commented Dec 7, 2022

We have found that Linkerd uses IP addresses for endpoint identity. For example, the destination IP from the TCP packet (e.g. 10.96.5.1) is used here: https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/service-profiles/src/client.rs#L133
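For context, this is the usual sidecar capture pattern: linkerd-init installs iptables rules that redirect the pod's outbound TCP traffic to the proxy, and the proxy recovers the original destination from the redirected socket. A simplified sketch (not the exact linkerd-init rules; 4140 is linkerd's default outbound proxy port, as used later in this thread):

# Simplified sketch of sidecar-style outbound capture (the real linkerd-init
# rules add exclusions for the proxy's own traffic, ports, UIDs, etc.):
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-port 4140
# The proxy then reads the original destination (e.g. 10.96.5.1) via the
# SO_ORIGINAL_DST socket option and uses that IP for discovery,
# which is exactly the IP-based endpoint identity described above.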

In the extension use-case we have this datapath:

NSC —(1)—> NSM1 —(2)—> NSM2 —(3)—> NSE L7 Proxy —(4)—> linkerd-proxy —(5)—> Workload

At step 4 the outgoing packet carries the destination IP of the NSM interface, and we need to replace it with the Workload IP to make Linkerd work.

Steps:

  1. NSC makes an HTTP GET request to the service ‘Workload’.
  2. NSC does an NSM lookup for ‘Workload’.
  3. ‘Workload’ resolves to the NSE L7 Proxy IP address (via fanout and a caching proxy on the NSE side).
  4. NSC sends a TCP SYN to the NSE L7 Proxy.
  5. The iptables rules in the NSE L7 Proxy redirect all TCP traffic coming from the vWire to the linkerd-proxy.
  6. linkerd-proxy looks at the destination IP address (which is the NSE L7 Proxy IP address) and can't find a suitable endpoint.
@denis-tingaikin (Member) commented:

/cc @edwarnicke

@ThetaDR Could you please double-check our investigation and provide results?

@NikitaSkrynnik (Contributor) commented:

@edwarnicke

Double-checked the problem described above.

linkerd-proxy really does look at the destination IP address of the TCP packet. Here in linkerd-proxy we can see that it makes an API call to the Discovery service with the destination address.

Linkerd's Discovery service tries to find who owns this IP address among Kubernetes Services and Pods HERE.

If Discovery doesn't find anything, linkerd-proxy just forwards the TCP packet to its initial destination.

In our case, when we send a TCP packet from Workload1, the packet has dst ip == 172.16.1.3, and Linkerd simply can't find this IP among the Services and Pods on the cluster.
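To poke Linkerd's Destination service from the CLI, something like the following should show the difference (a sketch; the authority name is illustrative and assumes a reasonably recent linkerd CLI):

# A real Service authority resolves to endpoints:
linkerd diagnostics endpoints workload.default.svc.cluster.local:80
# whereas nothing in the cluster owns 172.16.1.3, so discovery has no
# answer for a packet addressed to the NSM interface IP.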

We decided to make cmd-nse-l7-proxy leave the IP address of a service unchanged. So now, when we make a DNS request for the service on Workload1, we get the service's real IP address.

Then we add a routing rule to Workload1 to route all traffic with a destination IP in 10.96.0.0/16 to the nsm interface:

ip ro add 10.96.0.0/16 via 172.16.1.3
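To verify that the route took effect, ip route get can be used (a sketch; the interface name depends on the setup):

# Ask the kernel how a second-cluster service IP would be routed:
ip route get 10.96.5.1
# expected: 10.96.5.1 via 172.16.1.3 dev <nsm interface> ...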

We also add an iptables rule to cmd-nse-l7-proxy:

iptables -t nat -I PREROUTING 1 -p tcp -i {{ .NsmInterfaceName }} -j DNAT --to-destination 127.0.0.1:4140

This rule redirects all traffic from the nsm interface to the outbound proxy, which looks at the destination IP of each packet and discovers the needed service.
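One caveat worth noting: DNAT-ing packets that arrive on a real interface to 127.0.0.1 is normally rejected by the kernel as a martian destination, so a rule like the one above typically also needs route_localnet enabled on that interface (a sketch, reusing the same templated interface name):

# Allow packets arriving on the NSM interface to be DNAT-ed to loopback;
# without this the kernel drops them as martians.
sysctl -w net.ipv4.conf.{{ .NsmInterfaceName }}.route_localnet=1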

The only problem here is that this routing rule sends all traffic whose destination IP is in the 10.96.0.0/16 subnet to the nsm interface, even when the address does not actually belong to a service on the second cluster.

Possible solutions

  1. We can leave this working example as is.
  2. We can somehow get the table of services stored in Linkerd's Discovery service and send it to Workload1. Workload1 can use this table to determine which services are on the second cluster.
  3. We can investigate further how Linkerd's Discovery service works (for example, experiment with K8s Service IPs). Maybe we will find something in the Discovery service that helps us come up with a new solution.

@NikitaSkrynnik (Contributor) commented Jan 27, 2023

Solution with ipset, iproute2 and dnsmasq

We can create an ipset and store the IPs of the second cluster's services there. We mark all packets that have a destination IP in the set with mark 3, for example. We also add a routing rule that sends packets to the nsm interface if they have mark 3.

Finally, we use dnsmasq, which writes all DNS responses from the second cluster into the ipset described above. We have to give dnsmasq a list of the DNS domains that live on the second cluster, so it can filter DNS responses using this list (a quick end-to-end check is sketched after the steps below).

All of this work should be done on the NSC in the first cluster.

  1. Create an ipset:
     ipset create NSM_IPSET iphash
  2. Create a custom routing table:
     echo 201 nsm_table >> /etc/iproute2/rt_tables
  3. Add a route to the custom routing table that routes all traffic to the nsm interface:
     ip ro add default via 172.16.1.2 proto static table nsm_table
  4. Use this custom routing table only for packets with mark 3:
     ip ru add fwmark 3 lookup nsm_table pref 3333
  5. Create an iptables rule to mark all packets whose destination IP is in the ipset with mark 3:
     iptables -A OUTPUT -t mangle -m set --match-set NSM_IPSET dst -j MARK --set-mark 3
  6. Add a filter to dnsmasq (for the DNS domain *.default):
     ipset=/default/NSM_IPSET
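A quick end-to-end check of the chain (names are illustrative: a service 'workload' in the 'default' domain, dnsmasq listening locally):

# Resolve a second-cluster name through the local dnsmasq...
dig +short workload.default @127.0.0.1
# ...then confirm dnsmasq added the returned IP to the set:
ipset list NSM_IPSET
# From here, mark 3 and nsm_table take over routing for that IP.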

@NikitaSkrynnik (Contributor) commented Feb 16, 2023

Solution with IP mutations

Main idea

cmd-nse-l7-proxy can send DNS responses with the first byte of each IP changed. The first byte can be 199, for example, so 10.96.5.1 becomes 199.96.5.1. cmd-nsc can then distinguish the packets it needs to send to the second cluster by the changed first byte: it marks these packets with a special mark, changes the first byte back to its real value, and a special route (also on cmd-nsc) sends all marked packets to the nsm interface (a quick routing check is sketched after the steps below).

This solves the problem of overlapping subnets: cmd-nsc now knows which packets should be sent to the second cluster, because those packets sit in a dedicated subnet, 199.0.0.0/8.

Steps

  1. We rework the cmd-nse-l7-proxy DNS server. The server should change the first byte of every IP it sends in DNS responses to cmd-nsc. For example, the first byte can be 199.

  2. We mark all outbound TCP traffic that has 199 in the first byte of the destination IP with mark 3 on cmd-nsc:
     iptables -t mangle -A OUTPUT -p tcp -d 199.0.0.0/8 -j MARK --set-mark 3
  3. We change the first byte of all marked packets back on cmd-nsc:
     iptables -t nat -A OUTPUT -m mark --mark 3 -j NETMAP --to 10.0.0.0/8
  4. We change the source IP of all marked TCP packets:
     iptables -t nat -A POSTROUTING -m mark --mark 3 -j SNAT --to 172.16.1.3
  5. We create a special route for marked packets that routes them to the nsm interface:
     echo 201 nsm_table >> /etc/iproute2/rt_tables
     ip ru add fwmark 3 lookup nsm_table pref 3333
     ip ro add default via 172.16.1.3 table nsm_table
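With these rules in place, the routing decision for a rewritten-and-marked packet can be checked directly (a sketch; assumes an iproute2 build that accepts a mark in route get):

# Which route would a mark-3 packet to the restored address take?
# It should resolve via 172.16.1.3 through nsm_table:
ip route get 10.96.5.1 mark 3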

@edwarnicke (Member) commented:

> 2. We mark all outbound TCP traffic that has 199 in the first byte of the destination IP with mark 3 on cmd-nsc:
>    iptables -t mangle -A OUTPUT -p tcp -d 199.0.0.0/8 -j MARK --set-mark 3

Why use the mark? Why not just change the first byte without the mark?

> 3. We change the first byte of all marked packets back on cmd-nsc:
>    iptables -t nat -A OUTPUT -m mark --mark 3 -j NETMAP --to 10.0.0.0/8

Is this mark just marking packets that should go out the NSM interface?

> 4. We change the source IP of all marked TCP packets:
>    iptables -t nat -A POSTROUTING -m mark --mark 3 -j SNAT --to 172.16.1.3

Couldn't all of this be done in the proxy rather than the NSC?

@edwarnicke (Member) commented:

How do we know that 199.0.0.0/8 is safe to use for this purpose? Is there a corresponding safe address to do this for IPv6?

@edwarnicke (Member) commented:

Have you looked at link-local addresses?

@NikitaSkrynnik (Contributor) commented:

> 1. Why use the mark? Why not just change the first byte without the mark?

We use the mark to send these packets to the NSM interface after changing the first byte. If we didn't, the packets would be lost, because there is no service with such an IP on the first cluster.

> 2. Is this mark just marking packets that should go out the NSM interface?

Yes.

> 3. Couldn't all of this be done in the proxy rather than the NSC?

Unfortunately, we can't do all of this on nse-l7-proxy, because on nse-l7-proxy we have a DNAT rule which redirects packets to Linkerd's proxy. For some reason the NETMAP rule and the DNAT rule don't work together: if we put the NETMAP rule before the DNAT rule, the TCP packet never reaches the DNAT rule.
We could use a composition of two NSEs: the first one would have the NETMAP rule and the second one the DNAT rule.

> 4. How do we know that 199.0.0.0/8 is safe to use for this purpose? Is there a corresponding safe address to do this for IPv6?

We can use any subnet which doesn't overlap with the first cluster's subnets. We can calculate it automatically, based on the clusters' service subnets, and we can do the same for IPv6 addresses. (One way to read a cluster's service CIDR is sketched after this list.)

> 5. Have you looked at link-local addresses?

Could you tell us more about how we could use them?
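For reference, one common way to read the service CIDR a cluster was configured with, so that a non-overlapping mutation prefix can be picked automatically (a sketch; assumes the kube-apiserver pod spec is visible in the dump):

# Print the --service-cluster-ip-range the API server was started with:
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range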
