Acquire neighboring router MAC dynamically #619

Merged
merged 10 commits into from
Nov 5, 2024
1 change: 1 addition & 0 deletions .reuse/dep5
@@ -14,6 +14,7 @@ Files:
Dockerfile
config/*
hack/*
test/benchmark_test/config_templates/*
*.json
*meson.build
proto/dpdk.proto
1 change: 1 addition & 0 deletions docs/deployment/help_dpservice-bin.md
@@ -7,6 +7,7 @@
| --pf0 | IFNAME | first physical interface (e.g. eth0) | |
| --pf1 | IFNAME | second physical interface (e.g. eth1) | |
| --pf1-proxy | IFNAME | VF representor to use as a proxy for pf1 packets | |
| --pf1-proxy-vf | IFNAME | VF interface of the pf1-proxy VF representor | |
Collaborator

Do we have combinations that do not make sense with the newly introduced flags?
For example, if I set pf1-proxy, do I need to set pf1-proxy-vf as well? And if I set multiport-eswitch, do I need the pf1-proxy-* switches, so that without them I cannot operate meaningfully?

If so, can we document these dependencies and enforce them during command-line argument parsing, returning meaningful hints?

Also, could the documentation include an example dpservice-bin command with all the command-line parameters needed for a simple mpesw operation?
Or maybe I overlooked it in the documentation? Otherwise it is not clear, at least to me, how to make mpesw work and which parameters are needed for it.

These command-line arguments are supposed to be generated by prepare.sh, right? Still, an example showing which parameters dpservice-bin needs to operate successfully in mpesw mode would be helpful, along with how (with which parameters) I am supposed to call prepare.sh if I want to operate in mpesw mode.
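
A sketch of the kind of dependency check asked about here, written as a standalone shell function rather than as dpservice's actual argument parser. The rules it enforces (a proxy representor needs its VF, and the proxy options only apply in multiport-eswitch mode) are assumptions drawn from the documentation added in this PR; `check_proxy_flags` and its messages are hypothetical.

```bash
# Hypothetical validation of the option combinations discussed above.
# Arguments: multiport ("true"/"false"), pf1-proxy value, pf1-proxy-vf value.
# Prints a hint on stderr and returns 1 when the combination looks wrong.
check_proxy_flags() {
    local multiport="$1" pf1_proxy="$2" pf1_proxy_vf="$3"

    if [ -n "$pf1_proxy" ] && [ -z "$pf1_proxy_vf" ]; then
        echo "--pf1-proxy requires --pf1-proxy-vf" >&2
        return 1
    fi
    if [ -z "$pf1_proxy" ] && [ -n "$pf1_proxy_vf" ]; then
        echo "--pf1-proxy-vf requires --pf1-proxy" >&2
        return 1
    fi
    if [ -n "$pf1_proxy" ] && [ "$multiport" != "true" ]; then
        echo "--pf1-proxy is only used with --multiport-eswitch" >&2
        return 1
    fi
    return 0
}

# Example (interface names taken from docs/development/running.md):
#   check_proxy_flags true enp59s0f1npf1vf0 enp59s0f1v0 && echo "options ok"
```

Whether multiport-eswitch without a proxy should be rejected outright is left open here, since the documentation describes that requirement as a workaround for a suspected driver bug.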

| --ipv6 | ADDR6 | IPv6 underlay address | |
| --vf-pattern | PATTERN | virtual interface name pattern (e.g. 'eth1vf') | |
| --dhcp-mtu | SIZE | set the mtu field in DHCP responses (68 - 1500) | |
3 changes: 3 additions & 0 deletions docs/deployment/mellanox.md
@@ -46,3 +46,6 @@ In some cases (looks like a nic/switch combination) performance is severly affec

## Dp-service setup
Either `prepare.sh` script or `preparedp.service` systemd unit needs to be run before dp-service can work properly. This should already be done automatically if using the Docker image provided. Make sure this does not produce any errors.

### Multiport-eswitch
The `prepare.sh` script supports `--multiport-eswitch` argument to set the card up in multiport-eswitch mode. There is an additional `--pf1-proxy` argument to also create a VF on PF1 for proxying PF1 traffic. Currently both arguments are needed to properly run dpservice in multiport-eswitch mode due to a (suspected) driver bug.
13 changes: 13 additions & 0 deletions docs/development/running.md
@@ -74,3 +74,16 @@ Without the help of scripts or config files, you can run the service directly (a
`--vf_pattern` defines the prefix used by the virtual functions created by the smartnic and which need to be controlled by dp-service. These interfaces are then to be used by running VMs.

`--ipv6` sets the underlay IPv6 address which should be used by dp-service for ingress/egress packets coming to/leaving the smartnic.

#### Multiport-eswitch
In this mode, only the PF0 (which is bonded with PF1) needs to be specified:
```bash
./dpservice-bin -a 0000:03:00.0,class=rxq_cqe_comp_en=0,rx_vec_en=1,dv_flow_en=2,dv_esw_en=1,fdb_def_rule_en=1,representor=pf[0-1]vf[0-5] -l 0,1 -- --pf0=enp59s0f1 --pf1=enp59s0f1 --vf-pattern=enp59s0f0_ --ipv6=2a10:afc0:e01f:209:: --no-stats --no-offload --multiport-eswitch
```

#### PF1-proxy
In multiport-eswitch mode, currently PF1 is not usable (suspected driver problem), so dpservice provides a way to proxy the communication over a separate VF on PF1.
```bash
./dpservice-bin -a 0000:03:00.0,class=rxq_cqe_comp_en=0,rx_vec_en=1,dv_flow_en=2,dv_esw_en=1,fdb_def_rule_en=1,representor=pf[0-1]vf[0-5] -l 0,1 -- --pf0=enp59s0f1 --pf1=enp59s0f1 --vf-pattern=enp59s0f0_ --ipv6=2a10:afc0:e01f:209:: --no-stats --no-offload --multiport-eswitch --pf1-proxy enp59s0f1npf1vf0 --pf1-proxy-vf enp59s0f1v0
```
The `--pf1-proxy` is the representor used by dpservice for proxying packets. The `--pf1-proxy-vf` is the VF used by the Linux kernel to receive packets, i.e. the replacement for PF1. Without `--pf1-proxy-vf` dpservice is unable to determine the MAC address to use for host-host overlay traffic.
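
For orientation, the MAC address that `--pf1-proxy-vf` allows dpservice to determine is the one the kernel exposes for that interface. Below is a minimal sketch of reading it from sysfs. `iface_mac` is a hypothetical helper, its optional second argument exists only so the function can be pointed at a fake directory for testing, and nothing here is a claim about how dpservice itself performs the lookup.

```bash
# Print the MAC address of a network interface as exposed by the kernel.
# $1: interface name, $2: sysfs root (defaults to /sys/class/net).
iface_mac() {
    local ifname="$1"
    local sysroot="${2:-/sys/class/net}"
    local path="$sysroot/$ifname/address"

    if [ ! -r "$path" ]; then
        echo "cannot read MAC address of $ifname" >&2
        return 1
    fi
    cat "$path"
}

# Example (interface name from the command above):
#   iface_mac enp59s0f1v0
```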
9 changes: 9 additions & 0 deletions hack/dp_conf.json
@@ -28,6 +28,15 @@
"array_size": "IF_NAMESIZE",
"ifdef": "ENABLE_PF1_PROXY"
},
{
"lgopt": "pf1-proxy-vf",
"arg": "IFNAME",
"help": "VF interface of the pf1-proxy VF representor",
"var": "pf1_proxy_vf",
"type": "char",
"array_size": "IF_NAMESIZE",
"ifdef": "ENABLE_PF1_PROXY"
},
{
"lgopt": "ipv6",
"arg": "ADDR6",
37 changes: 27 additions & 10 deletions hack/prepare.sh
@@ -13,7 +13,8 @@ OPT_MULTIPORT=false
OPT_PF1_PROXY=false

BLUEFIELD_IDENTIFIERS=("MT_0000000543", "MT_0000000541")
-NUMVFS=126
+MAX_NUMVFS_POSSIBLE=126
+NUMVFS_DESIRED=126
CONFIG="/tmp/dp_service.conf"
IS_X86_WITH_BLUEFIELD=false
IS_ARM_WITH_BLUEFIELD=false
@@ -143,8 +144,12 @@ function create_vf() {
local pf0="${devs[0]}"
local pf1="${devs[1]}"

+if [[ "$OPT_MULTIPORT" == "true" && "$NUMVFS_DESIRED" -eq "$MAX_NUMVFS_POSSIBLE" ]]; then
+NUMVFS_DESIRED=$((NUMVFS_DESIRED - 1))
+fi
+
if [[ "$IS_ARM_WITH_BLUEFIELD" == "true" ]]; then
-actualvfs=$NUMVFS
+actualvfs=$NUMVFS_DESIRED
log "Skipping VF creation for BlueField card on ARM"
# enable switchdev mode, this operation takes most time
process_switchdev_mode "$pf0"
@@ -185,7 +190,7 @@ function create_vf() {

# calculating amount of VFs to create, 126 if more are available, or maximum available
totalvfs=$(cat /sys/bus/pci/devices/$pf0/sriov_totalvfs)
-actualvfs=$((NUMVFS<totalvfs ? NUMVFS : totalvfs))
+actualvfs=$((NUMVFS_DESIRED<totalvfs ? NUMVFS_DESIRED : totalvfs))
log "creating $actualvfs virtual functions"
echo $actualvfs > /sys/bus/pci/devices/$pf0/sriov_numvfs
if [[ "$OPT_PF1_PROXY" == "true" ]]; then
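
The VF-count logic changed in the two `create_vf` hunks above can be read as one small calculation: in multiport-eswitch mode the desired count is reduced by one when it equals the maximum, and the result is then capped by what the device reports in `sriov_totalvfs`. A sketch with the inputs passed as arguments; `calc_actualvfs` is a hypothetical name, and the ARM BlueField branch (which skips the cap) is not modelled.

```bash
# $1: desired VF count, $2: maximum possible VF count,
# $3: "true" when --multiport-eswitch is set, $4: sriov_totalvfs of PF0.
calc_actualvfs() {
    local desired="$1" max_possible="$2" multiport="$3" totalvfs="$4"

    # Mirrors the new check at the top of create_vf
    if [ "$multiport" = "true" ] && [ "$desired" -eq "$max_possible" ]; then
        desired=$((desired - 1))
    fi

    # Never request more VFs than the device supports
    if [ "$desired" -lt "$totalvfs" ]; then
        echo "$desired"
    else
        echo "$totalvfs"
    fi
}

calc_actualvfs 126 126 true 127    # prints 125
calc_actualvfs 126 126 false 127   # prints 126
calc_actualvfs 126 126 true 64     # prints 64
```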
@@ -246,9 +251,10 @@ function get_ipv6() {
while read -r l1; do
if [ "$l1" != "::1/128" ]; then
echo ${l1%/*}
-break
+return
fi
done < <(ip -6 -o addr show lo | awk '{print $4}')
+err "no ipv6 found"
}

function make_config() {
@@ -257,16 +263,27 @@ function make_config() {
return
fi

+# To make error propagation work, need to assign separately
+conf_pf0="$(get_ifname 0)"
+conf_pf1="$(get_ifname 1)"
+conf_vf_pattern="$(get_pattern ${devs[0]})"
+conf_ipv6="$(get_ipv6)"
+if [[ "$OPT_MULTIPORT" == "true" ]]; then
+conf_pf1_proxy="$(get_pf1_proxy ${devs[1]})"
+conf_pf1_proxy_vf="$(get_pf1_proxy_vf)"
+fi
+
{ echo "# This has been generated by prepare.sh"
echo "no-stats"
-echo "pf0 $(get_ifname 0)"
-echo "pf1 $(get_ifname 1)"
-echo "vf-pattern $(get_pattern ${devs[0]})"
-echo "ipv6 $(get_ipv6)"
+echo "pf0 $conf_pf0"
+echo "pf1 $conf_pf1"
+echo "vf-pattern $conf_vf_pattern"
+echo "ipv6 $conf_ipv6"
if [[ "$OPT_MULTIPORT" == "true" ]]; then
echo "a-pf0 ${devs[0]},class=rxq_cqe_comp_en=0,rx_vec_en=1,dv_flow_en=2,dv_esw_en=1,fdb_def_rule_en=1,representor=pf[0-1]vf[0-$[$actualvfs-1]]"
if [[ "$OPT_PF1_PROXY" == "true" ]]; then
-echo "pf1-proxy $(get_pf1_proxy ${devs[1]})"
+echo "pf1-proxy $conf_pf1_proxy"
+echo "pf1-proxy-vf $conf_pf1_proxy_vf"
fi
echo "multiport-eswitch"
else
@@ -277,7 +294,7 @@ function make_config() {
if [[ "$OPT_MULTIPORT" == "true" ]]; then
log "dpservice configured in multiport-eswitch mode"
if [[ "$OPT_PF1_PROXY" == "true" ]]; then
-log "dpservice will create a PF1-proxy"
+log "dpservice will create a pf1-proxy"
fi
else
log "dpservice configured in normal mode"
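
The comment "To make error propagation work, need to assign separately" in `make_config` refers to a general shell property: a command substitution embedded in the arguments of `echo` has its exit status discarded, whereas a plain variable assignment returns the status of the substitution. A small self-contained demonstration; `fail_lookup` is a hypothetical stand-in for `get_ipv6` when no address is found, on the assumption that `err` results in a non-zero status.

```bash
# Stand-in for a failing lookup: message on stderr, non-zero status.
fail_lookup() {
    echo "no ipv6 found" >&2
    return 1
}

# Inline form: the shell only sees the status of echo, so the failure is lost.
inline_form() {
    if echo "ipv6 $(fail_lookup 2>/dev/null)" >/dev/null; then
        echo "ok"
    else
        echo "failed"
    fi
}

# Separate assignment: the assignment's status is the substitution's status.
assigned_form() {
    if conf_ipv6="$(fail_lookup 2>/dev/null)"; then
        echo "ok"
    else
        echo "failed"
    fi
}

inline_form     # prints "ok" even though the lookup failed
assigned_form   # prints "failed"
```

Note that `local var="$(cmd)"` would hide the failure again, because the status reported is then that of `local`.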
3 changes: 3 additions & 0 deletions include/dp_conf_opts.h
@@ -30,6 +30,9 @@ const char *dp_conf_get_pf1_name(void);
#ifdef ENABLE_PF1_PROXY
const char *dp_conf_get_pf1_proxy(void);
#endif
#ifdef ENABLE_PF1_PROXY
const char *dp_conf_get_pf1_proxy_vf(void);
#endif
const char *dp_conf_get_vf_pattern(void);
int dp_conf_get_dhcp_mtu(void);
int dp_conf_get_wcmp_perc(void);
1 change: 1 addition & 0 deletions include/dp_log.h
@@ -53,6 +53,7 @@ extern "C" {
#define DP_LOG_IFNAME(VALUE) _DP_LOG_STR("interface_name", VALUE)
#define DP_LOG_LCORE(VALUE) _DP_LOG_UINT("lcore_id", VALUE)
#define DP_LOG_RTE_GROUP(VALUE) _DP_LOG_UINT("rte_group", VALUE)
#define DP_LOG_LINKSTATE(VALUE) _DP_LOG_STR("link_state", (VALUE) ? "up" : "down")
// networking stack
#define DP_LOG_IPV4(VALUE) _DP_LOG_IPV4("ipv4", VALUE)
#define DP_LOG_IPV6(VALUE) _DP_LOG_IPV6("ipv6", VALUE)
3 changes: 2 additions & 1 deletion include/dp_netlink.h
@@ -4,6 +4,7 @@
#ifndef __INCLUDE_DP_NETLINK_H__
#define __INCLUDE_DP_NETLINK_H__

#include <stdint.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

@@ -25,7 +26,7 @@ struct dp_nlnk_req {
struct dp_nl_tlv if_tlv;
};

-int dp_get_pf_neigh_mac(int if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac);
+int dp_get_pf_neigh_mac(uint32_t if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac);

#ifdef __cplusplus
}
11 changes: 10 additions & 1 deletion include/dp_port.h
@@ -7,8 +7,9 @@
#include <stdint.h>
#include <stdbool.h>
#include <net/if.h>
-#include <rte_pci.h>
#include <rte_meter.h>
+#include <rte_pci.h>
+#include <rte_timer.h>
#include "dp_conf.h"
#include "dp_firewall.h"
#include "dp_internal_stats.h"
@@ -89,6 +90,7 @@ struct dp_port {
char dev_name[RTE_ETH_NAME_MAX_LEN];
uint8_t peer_pf_hairpin_tx_rx_queue_offset;
uint16_t peer_pf_port_id;
uint32_t if_index;
struct rte_ether_addr own_mac;
struct rte_ether_addr neigh_mac;
struct dp_port_iface iface;
@@ -106,6 +108,8 @@ struct dp_port {
struct rte_flow *default_flows[DP_PORT_ASYNC_FLOW_COUNT];
} default_async_rules;
};
struct rte_timer neighmac_timer;
uint8_t neighmac_period;
};

struct dp_ports {
@@ -129,11 +133,16 @@ void dp_ports_stop(void);
void dp_ports_free(void);

int dp_start_port(struct dp_port *port);
int dp_start_pf_port(uint16_t index);
#ifdef ENABLE_PF1_PROXY
int dp_start_pf1_proxy_port(void);
#endif
int dp_stop_port(struct dp_port *port);

void dp_start_acquiring_neigh_mac(struct dp_port *port);
void dp_stop_acquiring_neigh_mac(struct dp_port *port);
int dp_set_neigh_mac(uint16_t port_id, const struct rte_ether_addr *mac);

int dp_port_meter_config(struct dp_port *port, uint64_t total_flow_rate_cap, uint64_t public_flow_rate_cap);

static __rte_always_inline
5 changes: 4 additions & 1 deletion include/monitoring/dp_event.h
@@ -11,6 +11,7 @@ extern "C" {
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

int dp_link_status_change_event_callback(uint16_t port_id,
enum rte_eth_event_type type,
@@ -19,9 +20,11 @@ int dp_link_status_change_event_callback(uint16_t port_id,
void dp_process_event_link_msg(struct rte_mbuf *m);

int dp_send_event_flow_aging_msg(void);

void dp_process_event_flow_aging_msg(struct rte_mbuf *m);

int dp_send_event_neighmac_msg(uint16_t port_id, struct rte_ether_addr *neighmac);
void dp_process_event_neighmac_msg(struct rte_mbuf *m);

#ifdef __cplusplus
}
#endif
8 changes: 8 additions & 0 deletions include/monitoring/dp_monitoring.h
@@ -5,6 +5,7 @@
#define __INCLUDE_DP_MONITORING_H__

#include <stdint.h>
#include <rte_ether.h>
#include <rte_mbuf.h>
#include "dp_ipaddr.h"

@@ -17,6 +18,7 @@ extern "C" {
enum dp_event_type {
DP_EVENT_TYPE_LINK_STATUS,
DP_EVENT_TYPE_FLOW_AGING,
DP_EVENT_TYPE_NEIGHMAC,
};

struct dp_event_msg_head {
@@ -28,10 +30,16 @@ struct dp_link_status {
uint8_t status;
};

struct dp_neighmac {
uint16_t port_id;
struct rte_ether_addr mac;
};

struct dp_event_msg {
struct dp_event_msg_head msg_head;
union {
struct dp_link_status link_status;
struct dp_neighmac neighmac;
} event_entry;
};

23 changes: 23 additions & 0 deletions src/dp_conf_opts.c
@@ -22,6 +22,9 @@ _OPT_SHOPT_MAX = 255,
OPT_PF1,
#ifdef ENABLE_PF1_PROXY
OPT_PF1_PROXY,
#endif
#ifdef ENABLE_PF1_PROXY
OPT_PF1_PROXY_VF,
#endif
OPT_IPV6,
OPT_VF_PATTERN,
@@ -61,6 +64,9 @@ static const struct option dp_conf_longopts[] = {
{ "pf1", 1, 0, OPT_PF1 },
#ifdef ENABLE_PF1_PROXY
{ "pf1-proxy", 1, 0, OPT_PF1_PROXY },
#endif
#ifdef ENABLE_PF1_PROXY
{ "pf1-proxy-vf", 1, 0, OPT_PF1_PROXY_VF },
#endif
{ "ipv6", 1, 0, OPT_IPV6 },
{ "vf-pattern", 1, 0, OPT_VF_PATTERN },
@@ -114,6 +120,9 @@ static char pf1_name[IF_NAMESIZE];
#ifdef ENABLE_PF1_PROXY
static char pf1_proxy[IF_NAMESIZE];
#endif
#ifdef ENABLE_PF1_PROXY
static char pf1_proxy_vf[IF_NAMESIZE];
#endif
static char vf_pattern[IF_NAMESIZE];
static int dhcp_mtu = 1500;
static int wcmp_perc = 100;
@@ -149,6 +158,13 @@ const char *dp_conf_get_pf1_proxy(void)
return pf1_proxy;
}

#endif
#ifdef ENABLE_PF1_PROXY
const char *dp_conf_get_pf1_proxy_vf(void)
{
return pf1_proxy_vf;
}

#endif
const char *dp_conf_get_vf_pattern(void)
{
@@ -248,6 +264,9 @@ static inline void dp_argparse_help(const char *progname, FILE *outfile)
" --pf1=IFNAME second physical interface (e.g. eth1)\n"
#ifdef ENABLE_PF1_PROXY
" --pf1-proxy=IFNAME VF representor to use as a proxy for pf1 packets\n"
#endif
#ifdef ENABLE_PF1_PROXY
" --pf1-proxy-vf=IFNAME VF interface of the pf1-proxy VF representor\n"
#endif
" --ipv6=ADDR6 IPv6 underlay address\n"
" --vf-pattern=PATTERN virtual interface name pattern (e.g. 'eth1vf')\n"
@@ -290,6 +309,10 @@ static int dp_conf_parse_arg(int opt, const char *arg)
#ifdef ENABLE_PF1_PROXY
case OPT_PF1_PROXY:
return dp_argparse_string(arg, pf1_proxy, ARRAY_SIZE(pf1_proxy));
#endif
#ifdef ENABLE_PF1_PROXY
case OPT_PF1_PROXY_VF:
return dp_argparse_string(arg, pf1_proxy_vf, ARRAY_SIZE(pf1_proxy_vf));
#endif
case OPT_IPV6:
return dp_argparse_opt_ipv6(arg);
8 changes: 2 additions & 6 deletions src/dp_netlink.c
@@ -66,7 +66,7 @@ static int dp_recv_msg(struct sockaddr_nl sock_addr, int sock, char *buf, int bu
return (int)msg_len;
}

-int dp_get_pf_neigh_mac(int if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac)
+int dp_get_pf_neigh_mac(uint32_t if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac)
{
struct sockaddr_nl sa = {
.nl_family = AF_NETLINK,
@@ -119,11 +119,7 @@ int dp_get_pf_neigh_mac(int if_idx, struct rte_ether_addr *neigh, const struct r
goto cleanup;
}

-// TODO this should be an error in production
-if (DP_FAILED(dp_read_neigh((struct nlmsghdr *)reply, reply_len, neigh, own_mac)))
-DPS_LOG_WARNING("No neighboring router found");
-
-ret = DP_OK;
+ret = dp_read_neigh((struct nlmsghdr *)reply, reply_len, neigh, own_mac);

cleanup:
close(sock);
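
With this change a missing neighbor is reported to the caller as a failure instead of being logged and ignored. The same contract can be approximated from the shell for debugging. This is a sketch only: it assumes the neighbor of interest is the first entry flagged `router` in `ip -6 neigh show dev IFACE` output whose link-layer address differs from the port's own MAC, which may not match what `dp_read_neigh` actually selects. `pick_router_mac` is a hypothetical name.

```bash
# Reads `ip -6 neigh show dev IFACE` output on stdin.
# $1: own MAC address, which is skipped.
# Prints the first router's link-layer address; returns 1 if none is found.
pick_router_mac() {
    local own_mac="$1"
    awk -v own="$own_mac" '
        {
            mac = ""; is_router = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "lladdr" && i < NF) mac = $(i + 1)
                if ($i == "router") is_router = 1
            }
            if (is_router && mac != "" && tolower(mac) != tolower(own)) {
                print mac
                found = 1
                exit
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (interface name from docs/development/running.md, MAC made up):
#   ip -6 neigh show dev enp59s0f1 | pick_router_mac 02:00:00:00:00:09
```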