Skip to content

Commit

Permalink
Merge branch 'bridge-neigh-suppression'
Browse files Browse the repository at this point in the history
Ido Schimmel says:

====================
bridge: Add per-{Port, VLAN} neighbor suppression

Background
==========

In order to minimize the flooding of ARP and ND messages in the VXLAN
network, EVPN includes provisions [1] that allow participating VTEPs to
suppress such messages in case they know the MAC-IP binding and can
reply on behalf of the remote host. In Linux, the above is implemented
in the bridge driver using a per-port option called "neigh_suppress"
that was added in kernel version 4.15 [2].

Motivation
==========

Some applications use ARP messages as keepalives between the application
nodes in the network. This works perfectly well when two nodes are
connected to the same VTEP. When a node goes down it will stop
responding to ARP requests and the other node will notice it
immediately.

However, when the two nodes are connected to different VTEPs and
neighbor suppression is enabled, the local VTEP will reply to ARP
requests even after the remote node went down, until certain timers
expire and the EVPN control plane decides to withdraw the MAC/IP
Advertisement route for the address. Therefore, some users would like to
be able to disable neighbor suppression on VLANs where such applications
reside and keep it enabled on the rest.

Implementation
==============

The proposed solution is to allow user space to control neighbor
suppression on a per-{Port, VLAN} basis, in a similar fashion to other
per-port options that gained per-{Port, VLAN} counterparts such as
"mcast_router". This allows users to benefit from the operational
simplicity and scalability associated with shared VXLAN devices (i.e.,
external / collect-metadata mode), while still allowing for per-VLAN/VNI
neighbor suppression control.

The user interface is extended with a new "neigh_vlan_suppress" bridge
port option that allows user space to enable per-{Port, VLAN} neighbor
suppression on the bridge port. When enabled, the existing
"neigh_suppress" option has no effect and neighbor suppression is
controlled using a new "neigh_suppress" VLAN option. Example usage:

 # bridge link set dev vxlan0 neigh_vlan_suppress on
 # bridge vlan add vid 10 dev vxlan0
 # bridge vlan set vid 10 dev vxlan0 neigh_suppress on

Testing
=======

Tested using existing bridge selftests. Added a dedicated selftest in
the last patch.

Patchset overview
=================

Patches #1-#5 are preparations.

Patch #6 adds per-{Port, VLAN} neighbor suppression support to the
bridge's data path.

Patches #7-#8 add the required netlink attributes to enable the feature.

Patch #9 adds a selftest.

iproute2 patches can be found here [3].

Changelog
=========

Since RFC [4]:

No changes.

[1] https://www.rfc-editor.org/rfc/rfc7432#section-10
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a42317785c898c0ed46db45a33b0cc71b671bf29
[3] https://github.com/idosch/iproute2/tree/submit/neigh_suppress_v1
[4] https://lore.kernel.org/netdev/[email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>
  • Loading branch information
davem330 committed Apr 21, 2023
2 parents 1cf3fe1 + 7648ac7 commit 25c800b
Show file tree
Hide file tree
Showing 15 changed files with 936 additions and 19 deletions.
1 change: 1 addition & 0 deletions include/linux/if_bridge.h
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ struct br_ip_list {
#define BR_TX_FWD_OFFLOAD BIT(20)
#define BR_PORT_LOCKED BIT(21)
#define BR_PORT_MAB BIT(22)
#define BR_NEIGH_VLAN_SUPPRESS BIT(23)

#define BR_DEFAULT_AGEING_TIME (300 * HZ)

Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/if_bridge.h
Original file line number Diff line number Diff line change
Expand Up @@ -525,6 +525,7 @@ enum {
BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS,
__BRIDGE_VLANDB_ENTRY_MAX,
};
#define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/if_link.h
Original file line number Diff line number Diff line change
Expand Up @@ -569,6 +569,7 @@ enum {
IFLA_BRPORT_MAB,
IFLA_BRPORT_MCAST_N_GROUPS,
IFLA_BRPORT_MCAST_MAX_GROUPS,
IFLA_BRPORT_NEIGH_VLAN_SUPPRESS,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
Expand Down
33 changes: 27 additions & 6 deletions net/bridge/br_arp_nd_proxy.c
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ void br_recalculate_neigh_suppress_enabled(struct net_bridge *br)
bool neigh_suppress = false;

list_for_each_entry(p, &br->port_list, list) {
if (p->flags & BR_NEIGH_SUPPRESS) {
if (p->flags & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS)) {
neigh_suppress = true;
break;
}
Expand Down Expand Up @@ -158,7 +158,7 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
return;

if (br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) {
if (p && (p->flags & BR_NEIGH_SUPPRESS))
if (br_is_neigh_suppress_enabled(p, vid))
return;
if (parp->ar_op != htons(ARPOP_RREQUEST) &&
parp->ar_op != htons(ARPOP_RREPLY) &&
Expand Down Expand Up @@ -202,8 +202,8 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
bool replied = false;

if ((p && (p->flags & BR_PROXYARP)) ||
(f->dst && (f->dst->flags & (BR_PROXYARP_WIFI |
BR_NEIGH_SUPPRESS)))) {
(f->dst && (f->dst->flags & BR_PROXYARP_WIFI)) ||
br_is_neigh_suppress_enabled(f->dst, vid)) {
if (!vid)
br_arp_send(br, p, skb->dev, sip, tip,
sha, n->ha, sha, 0, 0);
Expand Down Expand Up @@ -407,7 +407,7 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,

BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0;

if (p && (p->flags & BR_NEIGH_SUPPRESS))
if (br_is_neigh_suppress_enabled(p, vid))
return;

if (msg->icmph.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT &&
Expand Down Expand Up @@ -461,7 +461,7 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
if (f) {
bool replied = false;

if (f->dst && (f->dst->flags & BR_NEIGH_SUPPRESS)) {
if (br_is_neigh_suppress_enabled(f->dst, vid)) {
if (vid != 0)
br_nd_send(br, p, skb, n,
skb->vlan_proto,
Expand All @@ -483,3 +483,24 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
}
}
#endif

bool br_is_neigh_suppress_enabled(const struct net_bridge_port *p, u16 vid)
{
if (!p)
return false;

if (!vid)
return !!(p->flags & BR_NEIGH_SUPPRESS);

if (p->flags & BR_NEIGH_VLAN_SUPPRESS) {
struct net_bridge_vlan_group *vg = nbp_vlan_group_rcu(p);
struct net_bridge_vlan *v;

v = br_vlan_find(vg, vid);
if (!v)
return false;
return !!(v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED);
} else {
return !!(p->flags & BR_NEIGH_SUPPRESS);
}
}
8 changes: 4 additions & 4 deletions net/bridge/br_device.c
Original file line number Diff line number Diff line change
Expand Up @@ -80,10 +80,10 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)

dest = eth_hdr(skb)->h_dest;
if (is_broadcast_ether_addr(dest)) {
br_flood(br, skb, BR_PKT_BROADCAST, false, true);
br_flood(br, skb, BR_PKT_BROADCAST, false, true, vid);
} else if (is_multicast_ether_addr(dest)) {
if (unlikely(netpoll_tx_running(dev))) {
br_flood(br, skb, BR_PKT_MULTICAST, false, true);
br_flood(br, skb, BR_PKT_MULTICAST, false, true, vid);
goto out;
}
if (br_multicast_rcv(&brmctx, &pmctx_null, vlan, skb, vid)) {
Expand All @@ -96,11 +96,11 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
br_multicast_querier_exists(brmctx, eth_hdr(skb), mdst))
br_multicast_flood(mdst, skb, brmctx, false, true);
else
br_flood(br, skb, BR_PKT_MULTICAST, false, true);
br_flood(br, skb, BR_PKT_MULTICAST, false, true, vid);
} else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) {
br_forward(dst->dst, skb, false, true);
} else {
br_flood(br, skb, BR_PKT_UNICAST, false, true);
br_flood(br, skb, BR_PKT_UNICAST, false, true, vid);
}
out:
rcu_read_unlock();
Expand Down
8 changes: 5 additions & 3 deletions net/bridge/br_forward.c
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,8 @@ static struct net_bridge_port *maybe_deliver(

/* called under rcu_read_lock */
void br_flood(struct net_bridge *br, struct sk_buff *skb,
enum br_pkt_type pkt_type, bool local_rcv, bool local_orig)
enum br_pkt_type pkt_type, bool local_rcv, bool local_orig,
u16 vid)
{
struct net_bridge_port *prev = NULL;
struct net_bridge_port *p;
Expand All @@ -224,8 +225,9 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
/* Do not flood to ports that enable proxy ARP */
if (p->flags & BR_PROXYARP)
continue;
if ((p->flags & (BR_PROXYARP_WIFI | BR_NEIGH_SUPPRESS)) &&
BR_INPUT_SKB_CB(skb)->proxyarp_replied)
if (BR_INPUT_SKB_CB(skb)->proxyarp_replied &&
((p->flags & BR_PROXYARP_WIFI) ||
br_is_neigh_suppress_enabled(p, vid)))
continue;

prev = maybe_deliver(prev, p, skb, local_orig);
Expand Down
2 changes: 1 addition & 1 deletion net/bridge/br_if.c
Original file line number Diff line number Diff line change
Expand Up @@ -759,7 +759,7 @@ void br_port_flags_change(struct net_bridge_port *p, unsigned long mask)
if (mask & BR_AUTO_MASK)
nbp_update_port_count(br);

if (mask & BR_NEIGH_SUPPRESS)
if (mask & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS))
br_recalculate_neigh_suppress_enabled(br);
}

Expand Down
2 changes: 1 addition & 1 deletion net/bridge/br_input.c
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
br_forward(dst->dst, skb, local_rcv, false);
} else {
if (!mcast_hit)
br_flood(br, skb, pkt_type, local_rcv, false);
br_flood(br, skb, pkt_type, local_rcv, false, vid);
else
br_multicast_flood(mdst, skb, brmctx, local_rcv, false);
}
Expand Down
8 changes: 7 additions & 1 deletion net/bridge/br_netlink.c
Original file line number Diff line number Diff line change
Expand Up @@ -189,6 +189,7 @@ static inline size_t br_port_info_size(void)
+ nla_total_size(1) /* IFLA_BRPORT_ISOLATED */
+ nla_total_size(1) /* IFLA_BRPORT_LOCKED */
+ nla_total_size(1) /* IFLA_BRPORT_MAB */
+ nla_total_size(1) /* IFLA_BRPORT_NEIGH_VLAN_SUPPRESS */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_ROOT_ID */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_BRIDGE_ID */
+ nla_total_size(sizeof(u16)) /* IFLA_BRPORT_DESIGNATED_PORT */
Expand Down Expand Up @@ -278,7 +279,9 @@ static int br_port_fill_attrs(struct sk_buff *skb,
!!(p->flags & BR_MRP_LOST_IN_CONT)) ||
nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)) ||
nla_put_u8(skb, IFLA_BRPORT_LOCKED, !!(p->flags & BR_PORT_LOCKED)) ||
nla_put_u8(skb, IFLA_BRPORT_MAB, !!(p->flags & BR_PORT_MAB)))
nla_put_u8(skb, IFLA_BRPORT_MAB, !!(p->flags & BR_PORT_MAB)) ||
nla_put_u8(skb, IFLA_BRPORT_NEIGH_VLAN_SUPPRESS,
!!(p->flags & BR_NEIGH_VLAN_SUPPRESS)))
return -EMSGSIZE;

timerval = br_timer_value(&p->message_age_timer);
Expand Down Expand Up @@ -891,6 +894,7 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 },
[IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT },
[IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 },
[IFLA_BRPORT_NEIGH_VLAN_SUPPRESS] = NLA_POLICY_MAX(NLA_U8, 1),
};

/* Change the state of the port and notify spanning tree */
Expand Down Expand Up @@ -957,6 +961,8 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
br_set_port_flag(p, tb, IFLA_BRPORT_LOCKED, BR_PORT_LOCKED);
br_set_port_flag(p, tb, IFLA_BRPORT_MAB, BR_PORT_MAB);
br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_VLAN_SUPPRESS,
BR_NEIGH_VLAN_SUPPRESS);

if ((p->flags & BR_PORT_MAB) &&
(!(p->flags & BR_PORT_LOCKED) || !(p->flags & BR_LEARNING))) {
Expand Down
5 changes: 4 additions & 1 deletion net/bridge/br_private.h
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,7 @@ enum {
BR_VLFLAG_ADDED_BY_SWITCHDEV = BIT(1),
BR_VLFLAG_MCAST_ENABLED = BIT(2),
BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
};

/**
Expand Down Expand Up @@ -849,7 +850,8 @@ void br_forward(const struct net_bridge_port *to, struct sk_buff *skb,
bool local_rcv, bool local_orig);
int br_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
void br_flood(struct net_bridge *br, struct sk_buff *skb,
enum br_pkt_type pkt_type, bool local_rcv, bool local_orig);
enum br_pkt_type pkt_type, bool local_rcv, bool local_orig,
u16 vid);

/* return true if both source port and dest port are isolated */
static inline bool br_skb_isolated(const struct net_bridge_port *to,
Expand Down Expand Up @@ -2218,4 +2220,5 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
u16 vid, struct net_bridge_port *p, struct nd_msg *msg);
struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
bool br_is_neigh_suppress_enabled(const struct net_bridge_port *p, u16 vid);
#endif
1 change: 1 addition & 0 deletions net/bridge/br_vlan.c
Original file line number Diff line number Diff line change
Expand Up @@ -2134,6 +2134,7 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] =
[BRIDGE_VLANDB_ENTRY_MCAST_ROUTER] = { .type = NLA_U8 },
[BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS] = { .type = NLA_REJECT },
[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS] = { .type = NLA_U32 },
[BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS] = NLA_POLICY_MAX(NLA_U8, 1),
};

static int br_vlan_rtm_process_one(struct net_device *dev,
Expand Down
20 changes: 19 additions & 1 deletion net/bridge/br_vlan_options.c
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,9 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v,
const struct net_bridge_port *p)
{
if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) ||
!__vlan_tun_put(skb, v))
!__vlan_tun_put(skb, v) ||
nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS,
!!(v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED)))
return false;

#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
Expand Down Expand Up @@ -80,6 +82,7 @@ size_t br_vlan_opts_nl_size(void)
+ nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */
+ nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */
#endif
+ nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS */
+ 0;
}

Expand Down Expand Up @@ -239,6 +242,21 @@ static int br_vlan_process_one_opts(const struct net_bridge *br,
}
#endif

if (tb[BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS]) {
bool enabled = v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED;
bool val = nla_get_u8(tb[BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS]);

if (!p) {
NL_SET_ERR_MSG_MOD(extack, "Can't set neigh_suppress for non-port vlans");
return -EINVAL;
}

if (val != enabled) {
v->priv_flags ^= BR_VLFLAG_NEIGH_SUPPRESS_ENABLED;
*changed = true;
}
}

return 0;
}

Expand Down
2 changes: 1 addition & 1 deletion net/core/rtnetlink.c
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@
#include "dev.h"

#define RTNL_MAX_TYPE 50
#define RTNL_SLAVE_MAX_TYPE 42
#define RTNL_SLAVE_MAX_TYPE 43

struct rtnl_link {
rtnl_doit_func doit;
Expand Down
1 change: 1 addition & 0 deletions tools/testing/selftests/net/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ TEST_GEN_FILES += nat6to4.o
TEST_GEN_FILES += ip_local_port_range
TEST_GEN_FILES += bind_wildcard
TEST_PROGS += test_vxlan_mdb.sh
TEST_PROGS += test_bridge_neigh_suppress.sh

TEST_FILES := settings

Expand Down
Loading

0 comments on commit 25c800b

Please sign in to comment.