
New network topology for firecracker VMs #798

Conversation

@CuriousGeorgiy (Member) commented Sep 4, 2023

Summary

Closes #797
Part of #794

Implementation Notes ⚒️

See #797 for details.

This PR introduces a new networking manager which implements the new network topology described in #797, but does not replace the existing tap manager, as it requires a patch to firecracker-containerd that adds a network namespace parameter to the CreateVM request.

Unit tests for the new networking manager are provided.

External Dependencies 🍀

  • N/A.

Breaking API Changes ⚠️

  • N/A.

@CuriousGeorgiy force-pushed the network-manager-refactoring branch 3 times, most recently from e3464df to a19d27a, on September 4, 2023 13:23
@ustiugov (Member) left a comment

LGTM, please check the tests

@CuriousGeorgiy (Member, Author) commented
@ustiugov I made an isolated change (i.e., the new networking module is not used yet), so, AFAICT, the test failures are not related to my changes (and the unit tests for the networking module pass).

@lrq619 (Collaborator) commented Sep 5, 2023

I'm restarting the runners, might be related to this

@CuriousGeorgiy (Member, Author) commented Sep 5, 2023

@lrq619 looks like the workflows are all failing due to the same error:

[09:32:45] [Info] Installing Knative Serving component (gvisor mode)
[09:33:34] [Error] [exit 1] -> error: unable to read URL "https://raw.githubusercontent.com/vhive-serverless/vHive/main/configs/knative_yamls/serving-core.yaml", server reported 503 Service Unavailable, status code=503
[09:33:34] [Error] Failed to install Knative Serving component!
[09:33:34] [Error] Failed subcommand: start_onenode_vhive_cluster!

@lrq619 (Collaborator) commented Sep 5, 2023

All tests passed

@leokondrashov (Contributor) commented
Merge is blocked because the commit is deemed unsigned. I think that the novel you put in the commit message is too much for this automated check. Can you please remove it and update the branch?

@CuriousGeorgiy force-pushed the network-manager-refactoring branch 3 times, most recently from 22c3440 to 87b3e72, on September 5, 2023 10:43
Currently, each firecracker VM needs to use a TAP network device to route
its packets into the network stack of the physical host. When saving and
restoring a function instance, the TAP device name and the IP address of
the function's server, running inside the container, are preserved (see
also the current requirements for vanilla firecracker snapshot
loading [1]). This leads to networking conflicts on the host and limits
snapshot restoration to a single instance per physical machine.

To bypass this obstacle, the following network topology is proposed:

1. A new network namespace (e.g., VMns4) is created for each VM, in which
the TAP device from the snapshotted VM is rebuilt and receives the original
IP address of the function. The TAP device forwards all incoming and
outgoing packets between the serverless function and the VM's network
interface. Each VM runs in its own network namespace, so there are no
conflicts on the host over networking resources.
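A minimal sketch of this step with ip(8), assuming the namespace is named VMns4, the TAP device tap0, and a host-side TAP address of 172.16.0.1/24 (all names and addresses here are illustrative; vHive generates them per VM):

```shell
# Create a per-VM network namespace and rebuild the TAP device inside it.
ip netns add VMns4
ip netns exec VMns4 ip tuntap add tap0 mode tap
# Assign the TAP its host-side address and bring it up
# (the guest keeps its original address, e.g. 172.16.0.2).
ip netns exec VMns4 ip addr add 172.16.0.1/24 dev tap0
ip netns exec VMns4 ip link set tap0 up
```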

2. A local virtual tunnel is established between the VM inside its network
namespace and the host node via a virtual ethernet (veth) pair. A link is
then established between the two ends of the veth pair, one in the network
namespace (veth4-0) and one in the host namespace (veth4-1). In contrast,
the default vHive configuration sets up a similar forwarding system through
network bridges.
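A sketch of the veth setup for one VM, using the interface names from the text (veth4-0 in the namespace, veth4-1 on the host) and the gateway addresses mentioned in steps 3 and 5; the exact addresses and prefix lengths are assumptions:

```shell
# Create the veth pair and move one end into the VM's namespace.
ip link add veth4-1 type veth peer name veth4-0
ip link set veth4-0 netns VMns4
# Address both ends: 172.17.0.18 inside the namespace, 172.17.0.17 on the host.
ip netns exec VMns4 ip addr add 172.17.0.18/24 dev veth4-0
ip addr add 172.17.0.17/24 dev veth4-1
ip netns exec VMns4 ip link set veth4-0 up
ip link set veth4-1 up
```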

3. Inside the network namespace we add a routing rule that redirects all
packets via the VM end of the veth pair towards a default gateway
(172.17.0.17). Thus, all packets sent by the function show up at the
host's end of the tunnel.
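This routing rule can be sketched as follows (namespace and device names as above):

```shell
# Send all traffic leaving the namespace through the host end of the tunnel.
ip netns exec VMns4 ip route add default via 172.17.0.17 dev veth4-0
```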

4. To avoid IP conflicts when routing packets to and from functions, each
VM is assigned a unique clone address (e.g., 172.18.0.5). All packets
leaving the VM end of the veth pair get their source address rewritten to
the clone address of the corresponding VM. Packets entering the host end
of the veth pair get their destination address rewritten to the original
address of the VM. As a result, each VM still thinks it is using the
original address, while in reality its address is translated to a clone
address that is different for every VM. This is accomplished using two
rules in the NAT table corresponding to the virtual namespace of the VM:
one in the POSTROUTING chain and one in the PREROUTING chain. The
POSTROUTING rule alters packets before they are sent out through the
virtual tunnel, from the VM namespace to the host, rewriting the source IP
address. Similarly, the PREROUTING rule rewrites the destination address
of incoming packets before routing. Together they ensure that packets
going into the virtual namespace have as destination address the original
IP address of the VM (172.16.0.2), while packets coming out of the
namespace have as source address the clone IP address (172.18.0.5). The
original IP address remains the same (172.16.0.2) for all VMs in the
enhanced snapshotting mode.
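In iptables syntax, the two NAT rules could look like this (VM original address 172.16.0.2, clone address 172.18.0.5; a sketch of the idea, not necessarily the exact rules the manager installs):

```shell
# POSTROUTING: rewrite the source of outgoing packets to the clone address.
ip netns exec VMns4 iptables -t nat -A POSTROUTING -o veth4-0 \
    -s 172.16.0.2 -j SNAT --to-source 172.18.0.5
# PREROUTING: rewrite the destination of incoming packets back to the
# VM's original address.
ip netns exec VMns4 iptables -t nat -A PREROUTING -i veth4-0 \
    -d 172.18.0.5 -j DNAT --to-destination 172.16.0.2
```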

5. In the routing table of the host, we add a rule dictating that any
packet whose destination IP is the clone IP of a VM is routed towards the
end of the tunnel situated in the corresponding network namespace, through
a set gateway (172.17.0.18). This ensures that whenever packets arrive on
the host for a VM, they are immediately sent down the right virtual
tunnel.
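A sketch of the host-side route for one VM's clone address, using the addresses above:

```shell
# Route packets destined for the clone address into the VM's namespace,
# via the namespace end of the veth pair.
ip route add 172.18.0.5 via 172.17.0.18 dev veth4-1
```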

6. In the host's NFT filter table we add two rules to the FORWARD chain
that allow traffic from the host end of the veth pair (veth4-1) to the
default host interface (eno49) and vice versa.
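With nft(8), the two forwarding rules could be sketched as follows (assuming an ip-family table named filter with a FORWARD chain already exists, and eno49 as the default host interface):

```shell
# Allow forwarding between the host end of the veth pair and the
# default host interface, in both directions.
nft add rule ip filter FORWARD iifname "veth4-1" oifname "eno49" accept
nft add rule ip filter FORWARD iifname "eno49" oifname "veth4-1" accept
```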

Introduce a new networking management component for the topology described
above.

1. https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#loading-snapshots

Closes vhive-serverless#797
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <[email protected]>
@CuriousGeorgiy force-pushed the network-manager-refactoring branch from 87b3e72 to 02edc1e, on September 5, 2023 10:48
@CuriousGeorgiy
Copy link
Member Author

@leokondrashov No, the problem was that commit signing is required for this repository. Fixed it by setting up commit signing.

@leokondrashov leokondrashov merged commit e940948 into vhive-serverless:main Sep 6, 2023