Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acquire neighboring router MAC dynamically #619

Merged
merged 10 commits into from
Nov 5, 2024
1 change: 1 addition & 0 deletions docs/deployment/help_dpservice-bin.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
| --pf0 | IFNAME | first physical interface (e.g. eth0) | |
| --pf1 | IFNAME | second physical interface (e.g. eth1) | |
| --pf1-proxy | IFNAME | VF representor to use as a proxy for pf1 packets | |
| --pf1-proxy-vf | IFNAME | VF interface of the pf1-proxy VF representor | |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have some combinations which do not make sense with the newly introduced flags ?
Like, if I set pf1-proxy then I need to set pf1-proxy-vf as well ? and If I set multiport-eswitch, I would needpf1-proxy-* switches and if they are not set then I can not operate meaningfully ?

If so, can we document these dependencies ? and enforce during command line argument parsing with meaningful hints returned ?

Also in the documentation, maybe an example dpservice-bin command including all the command line parameters for a simple mpesw operation ?
Or maybe I overlooked it in the documentation ? Otherwise it is not so clear now, how to make mpesw work and which parameters are needed to make it work. At least to me.

These command line arguments are supposed to be generated by prepare.sh right ? but still showcasing in an example which parameters needed for dpservice-bin to operate successfully in mpesw mode, would be helpful and how (with which parameters) am I supposed to call prepare.sh if I want to operate in mpesw mode ?

| --ipv6 | ADDR6 | IPv6 underlay address | |
| --vf-pattern | PATTERN | virtual interface name pattern (e.g. 'eth1vf') | |
| --dhcp-mtu | SIZE | set the mtu field in DHCP responses (68 - 1500) | |
Expand Down
9 changes: 9 additions & 0 deletions hack/dp_conf.json
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,15 @@
"array_size": "IF_NAMESIZE",
"ifdef": "ENABLE_PF1_PROXY"
},
{
"lgopt": "pf1-proxy-vf",
"arg": "IFNAME",
"help": "VF interface of the pf1-proxy VF representor",
"var": "pf1_proxy_vf",
"type": "char",
"array_size": "IF_NAMESIZE",
"ifdef": "ENABLE_PF1_PROXY"
},
{
"lgopt": "ipv6",
"arg": "ADDR6",
Expand Down
1 change: 1 addition & 0 deletions hack/prepare.sh
Original file line number Diff line number Diff line change
Expand Up @@ -267,6 +267,7 @@ function make_config() {
echo "a-pf0 ${devs[0]},class=rxq_cqe_comp_en=0,rx_vec_en=1,dv_flow_en=2,dv_esw_en=1,fdb_def_rule_en=1,representor=pf[0-1]vf[0-$[$actualvfs-1]]"
if [[ "$OPT_PF1_PROXY" == "true" ]]; then
echo "pf1-proxy $(get_pf1_proxy ${devs[1]})"
echo "pf1-proxy-vf $(get_pf1_proxy_vf)"
fi
echo "multiport-eswitch"
else
Expand Down
3 changes: 3 additions & 0 deletions include/dp_conf_opts.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ const char *dp_conf_get_pf1_name(void);
#ifdef ENABLE_PF1_PROXY
const char *dp_conf_get_pf1_proxy(void);
#endif
#ifdef ENABLE_PF1_PROXY
const char *dp_conf_get_pf1_proxy_vf(void);
#endif
const char *dp_conf_get_vf_pattern(void);
int dp_conf_get_dhcp_mtu(void);
int dp_conf_get_wcmp_perc(void);
Expand Down
3 changes: 2 additions & 1 deletion include/dp_netlink.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#ifndef __INCLUDE_DP_NETLINK_H__
#define __INCLUDE_DP_NETLINK_H__

#include <stdint.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

Expand All @@ -25,7 +26,7 @@ struct dp_nlnk_req {
struct dp_nl_tlv if_tlv;
};

int dp_get_pf_neigh_mac(int if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac);
int dp_get_pf_neigh_mac(uint32_t if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac);

#ifdef __cplusplus
}
Expand Down
3 changes: 3 additions & 0 deletions include/dp_port.h
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,7 @@ struct dp_port {
char dev_name[RTE_ETH_NAME_MAX_LEN];
uint8_t peer_pf_hairpin_tx_rx_queue_offset;
uint16_t peer_pf_port_id;
uint32_t if_index;
struct rte_ether_addr own_mac;
struct rte_ether_addr neigh_mac;
struct dp_port_iface iface;
Expand Down Expand Up @@ -135,6 +136,8 @@ int dp_start_pf1_proxy_port(void);
#endif
int dp_stop_port(struct dp_port *port);

int dp_acquire_neigh_mac(struct dp_port *port);

int dp_port_meter_config(struct dp_port *port, uint64_t total_flow_rate_cap, uint64_t public_flow_rate_cap);

static __rte_always_inline
Expand Down
23 changes: 23 additions & 0 deletions src/dp_conf_opts.c
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ _OPT_SHOPT_MAX = 255,
OPT_PF1,
#ifdef ENABLE_PF1_PROXY
OPT_PF1_PROXY,
#endif
#ifdef ENABLE_PF1_PROXY
OPT_PF1_PROXY_VF,
#endif
OPT_IPV6,
OPT_VF_PATTERN,
Expand Down Expand Up @@ -61,6 +64,9 @@ static const struct option dp_conf_longopts[] = {
{ "pf1", 1, 0, OPT_PF1 },
#ifdef ENABLE_PF1_PROXY
{ "pf1-proxy", 1, 0, OPT_PF1_PROXY },
#endif
#ifdef ENABLE_PF1_PROXY
{ "pf1-proxy-vf", 1, 0, OPT_PF1_PROXY_VF },
#endif
{ "ipv6", 1, 0, OPT_IPV6 },
{ "vf-pattern", 1, 0, OPT_VF_PATTERN },
Expand Down Expand Up @@ -114,6 +120,9 @@ static char pf1_name[IF_NAMESIZE];
#ifdef ENABLE_PF1_PROXY
static char pf1_proxy[IF_NAMESIZE];
#endif
#ifdef ENABLE_PF1_PROXY
static char pf1_proxy_vf[IF_NAMESIZE];
#endif
static char vf_pattern[IF_NAMESIZE];
static int dhcp_mtu = 1500;
static int wcmp_perc = 100;
Expand Down Expand Up @@ -149,6 +158,13 @@ const char *dp_conf_get_pf1_proxy(void)
return pf1_proxy;
}

#endif
#ifdef ENABLE_PF1_PROXY
const char *dp_conf_get_pf1_proxy_vf(void)
{
return pf1_proxy_vf;
}

#endif
const char *dp_conf_get_vf_pattern(void)
{
Expand Down Expand Up @@ -248,6 +264,9 @@ static inline void dp_argparse_help(const char *progname, FILE *outfile)
" --pf1=IFNAME second physical interface (e.g. eth1)\n"
#ifdef ENABLE_PF1_PROXY
" --pf1-proxy=IFNAME VF representor to use as a proxy for pf1 packets\n"
#endif
#ifdef ENABLE_PF1_PROXY
" --pf1-proxy-vf=IFNAME VF interface of the pf1-proxy VF representor\n"
#endif
" --ipv6=ADDR6 IPv6 underlay address\n"
" --vf-pattern=PATTERN virtual interface name pattern (e.g. 'eth1vf')\n"
Expand Down Expand Up @@ -290,6 +309,10 @@ static int dp_conf_parse_arg(int opt, const char *arg)
#ifdef ENABLE_PF1_PROXY
case OPT_PF1_PROXY:
return dp_argparse_string(arg, pf1_proxy, ARRAY_SIZE(pf1_proxy));
#endif
#ifdef ENABLE_PF1_PROXY
case OPT_PF1_PROXY_VF:
return dp_argparse_string(arg, pf1_proxy_vf, ARRAY_SIZE(pf1_proxy_vf));
#endif
case OPT_IPV6:
return dp_argparse_opt_ipv6(arg);
Expand Down
8 changes: 2 additions & 6 deletions src/dp_netlink.c
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ static int dp_recv_msg(struct sockaddr_nl sock_addr, int sock, char *buf, int bu
return (int)msg_len;
}

int dp_get_pf_neigh_mac(int if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac)
int dp_get_pf_neigh_mac(uint32_t if_idx, struct rte_ether_addr *neigh, const struct rte_ether_addr *own_mac)
{
struct sockaddr_nl sa = {
.nl_family = AF_NETLINK,
Expand Down Expand Up @@ -119,11 +119,7 @@ int dp_get_pf_neigh_mac(int if_idx, struct rte_ether_addr *neigh, const struct r
goto cleanup;
}

// TODO this should be an error in production
if (DP_FAILED(dp_read_neigh((struct nlmsghdr *)reply, reply_len, neigh, own_mac)))
DPS_LOG_WARNING("No neighboring router found");

ret = DP_OK;
ret = dp_read_neigh((struct nlmsghdr *)reply, reply_len, neigh, own_mac);

cleanup:
close(sock);
Expand Down
76 changes: 53 additions & 23 deletions src/dp_port.c
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
// SPDX-FileCopyrightText: 2023 SAP SE or an SAP affiliate company and IronCore contributors
// SPDX-License-Identifier: Apache-2.0

#include "dp_error.h"
#include "dp_port.h"
#include <rte_bus_pci.h>
#include "dp_conf.h"
#include "dp_error.h"
#include "dp_hairpin.h"
#include "dp_log.h"
#include "dp_lpm.h"
#include "dp_netlink.h"
#include "dp_port.h"
#ifdef ENABLE_VIRTSVC
# include "dp_virtsvc.h"
#endif
Expand Down Expand Up @@ -91,20 +91,9 @@ struct dp_port *dp_get_port_by_name(const char *pci_name)
return _dp_port_table[port_id];
}

static void dp_set_neighmac(struct dp_port *port, const struct rte_ether_addr *mac)
{
char strmac[18];

rte_ether_addr_copy(mac, &port->neigh_mac);

snprintf(strmac, sizeof(strmac), RTE_ETHER_ADDR_PRT_FMT, RTE_ETHER_ADDR_BYTES(&port->neigh_mac));
DPS_LOG_INFO("Setting neighboring MAC", _DP_LOG_STR("mac", strmac), DP_LOG_PORT(port));
}

static int dp_port_init_ethdev(struct dp_port *port, struct rte_eth_dev_info *dev_info)
{
struct dp_dpdk_layer *dp_layer = get_dpdk_layer();
struct rte_ether_addr pf_neigh_mac = {0};
struct rte_eth_txconf txq_conf;
struct rte_eth_rxconf rxq_conf;
struct rte_eth_conf port_conf = port_conf_default;
Expand Down Expand Up @@ -182,16 +171,6 @@ static int dp_port_init_ethdev(struct dp_port *port, struct rte_eth_dev_info *de
static_assert(sizeof(port->dev_name) == RTE_ETH_NAME_MAX_LEN, "Incompatible port dev_name size");
rte_eth_dev_get_name_by_port(port->port_id, port->dev_name);

if (port->is_pf) {
if (DP_FAILED(dp_get_pf_neigh_mac(dev_info->if_index, &pf_neigh_mac, &port->own_mac)))
return DP_ERROR;
dp_set_neighmac(port, &pf_neigh_mac);
}
#ifdef ENABLE_PF1_PROXY
else if (dp_conf_is_pf1_proxy_enabled() && port == dp_get_pf1_proxy())
dp_set_neighmac(port, &dp_get_pf1()->neigh_mac);
#endif

if (dp_conf_is_multiport_eswitch() && DP_FAILED(dp_configure_async_flows(port->port_id)))
return DP_ERROR;

Expand Down Expand Up @@ -260,6 +239,7 @@ static struct dp_port *dp_port_init_interface(uint16_t port_id, struct rte_eth_d
port->is_pf = is_pf;
port->port_id = port_id;
port->socket_id = socket_id;
port->if_index = dev_info->if_index;
_dp_port_table[port_id] = port;

if (is_pf && DP_FAILED(dp_port_register_pf(port)))
Expand Down Expand Up @@ -294,16 +274,24 @@ static struct dp_port *dp_port_init_interface(uint16_t port_id, struct rte_eth_d
static struct dp_port *dp_port_init_pf1_proxy_interface(uint16_t port_id, struct rte_eth_dev_info *dev_info)
{
struct dp_port *port;
uint32_t if_index;
int socket_id;

socket_id = dp_get_port_socket_id(port_id);
if (DP_FAILED(socket_id) && socket_id != SOCKET_ID_ANY)
return NULL;

if_index = if_nametoindex(dp_conf_get_pf1_proxy_vf());
if (if_index == 0) {
DPS_LOG_ERR("Cannot get pf1-proxy vf interface index", DP_LOG_IFACE(dp_conf_get_pf1_proxy_vf()));
return NULL;
}

port = &_dp_pf1_proxy_port;
port->is_pf = false;
port->port_id = port_id;
port->socket_id = socket_id;
port->if_index = if_index;
_dp_port_table[port_id] = port;

if (DP_FAILED(dp_port_init_ethdev(port, dev_info)))
Expand Down Expand Up @@ -661,6 +649,14 @@ int dp_start_pf_port(uint16_t index)
return DP_ERROR;

DPS_LOG_INFO("Received initial PF link state", DP_LOG_LINKSTATE(port->link_status), DP_LOG_PORT(port));

if (port->link_status == RTE_ETH_LINK_UP)
#ifdef ENABLE_PF1_PROXY
// Do not use PF1 in pf1-proxy mode as Linux does not use it then (thus the mac will never be there)
if (!dp_conf_is_pf1_proxy_enabled() || port != dp_get_pf1())
#endif
dp_acquire_neigh_mac(port);

return DP_OK;
}

Expand All @@ -675,6 +671,9 @@ int dp_start_pf1_proxy_port(void)
return ret;
}

if (dp_get_pf1()->link_status == RTE_ETH_LINK_UP)
dp_acquire_neigh_mac(&_dp_pf1_proxy_port);

_dp_pf1_proxy_port.allocated = true;
return DP_OK;
}
Expand All @@ -692,6 +691,37 @@ int dp_stop_port(struct dp_port *port)
return DP_OK;
}


static void dp_set_neighmac(struct dp_port *port, const struct rte_ether_addr *mac)
{
char strmac[18];

rte_ether_addr_copy(mac, &port->neigh_mac);

snprintf(strmac, sizeof(strmac), RTE_ETHER_ADDR_PRT_FMT, RTE_ETHER_ADDR_BYTES(&port->neigh_mac));
DPS_LOG_INFO("Setting neighboring MAC", _DP_LOG_STR("mac", strmac), DP_LOG_PORT(port));
}

int dp_acquire_neigh_mac(struct dp_port *port)
{
struct rte_ether_addr pf_neigh_mac = {0};

#ifdef ENABLE_PF1_PROXY
// The pf1-proxy VF is actually the interface that Linux uses (thus has the right value)
if (dp_conf_is_pf1_proxy_enabled() && port == dp_get_pf1())
port = &_dp_pf1_proxy_port;
#endif

if (DP_FAILED(dp_get_pf_neigh_mac(port->if_index, &pf_neigh_mac, &port->own_mac))) {
DPS_LOG_WARNING("No PF neighboring router", DP_LOG_PORT(port));
return DP_ERROR;
}

dp_set_neighmac(port, &pf_neigh_mac);
return DP_OK;
}


static int dp_port_total_flow_meter_config(struct dp_port *port, uint64_t total_flow_rate_cap)
{
return dp_set_vf_rate_limit(port->port_id, total_flow_rate_cap);
Expand Down
3 changes: 3 additions & 0 deletions src/monitoring/dp_event.c
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,9 @@ void dp_process_event_link_msg(struct rte_mbuf *m)

port->link_status = status;
DPS_LOG_INFO("PF link state changed", DP_LOG_LINKSTATE(port->link_status), DP_LOG_PORT(port));

if (status == RTE_ETH_LINK_UP)
Copy link
Collaborator

@guvenc guvenc Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this part to dp_link_status_change_event_callback. For me this looks like an unnecessary round trip from worker thread back to main thread by calling these functions here and the worker thread directly calls the netlink call which is not so nice as the worker thread should minimize the work which doesnt include packet processing.
Another problem with this approach is that worker thread now will enqueue to the monitoring queue which is actually supposed to be enqueued only by the main thread.
This approach will worsen the problem we already have in the code. monitoring_queue is being enqueued by main thread and by the interrupt thread (dp_link_status_change_event_callback) at the moment which is already a problem. Now we bring a third thread (worker thread) which will enqueue to the monitoring queue and monitoring queue is created as single producer, single consumer actually.

So in a nutshell:

  • During init of the PF ports we can arm the single call timer run on main thread which will call netlink, re-arm if netlink fails, send mac_neigh event with mac address if netlink doesnt fail. This is already done this way.
  • And in dp_link_status_change_event_callback we can arm the single call timer run on main thread if the link state is reported as up and let the timer callback call netlink, re-arm if netlink fails, send mac_neigh event with mac address if netlink doesnt fail. So that we ensure the enqueue of mac_neigh event happens in main thread context.
  • And also in dp_link_status_change_event_callback stop the the timer, if the link state gets reported as "down".

with this approach, we wouldnt at least worsen the situation in current code. Interrupt thread enqueuing together with main thread to monitoring queue can be addressed separately as it was not caused by this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the indicated code away from worker thread to the "interrupt" thread (the one where timer fires).

I think this should be enough, but if we only ever want to call the MAC retrieval code from one thread (the interrupt thread) and not the main thread, it would require removing the ability to "get the MAC immediately" in the init phase, thus always waiting at least a second for it. Which I do not believe is needed as the real problem here was the worker doing unnecessary stuff, which is fixed now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that the change is sufficient for this PR.

dp_acquire_neigh_mac(port);
}

// Flow-aging message - sent periodically to age-out conntracked flows
Expand Down