Feature to perform pkt capturing on RX sides of interfaces #415

Merged
merged 14 commits into from
Nov 9, 2023
62 changes: 62 additions & 0 deletions docs/deployment/capture_offloaded_rx_pkts.md
@@ -0,0 +1,62 @@
# Feature: capture offloaded rx packets on interfaces
In offloaded mode, packets that are processed by hardware offload rules are no longer visible, not even on the software path. To increase the visibility of such traffic flows, we use special rte flow rules that instrument packet processing in hardware so that these packets are duplicated and captured on interfaces.

## What can be achieved and what cannot
Through extensive experimentation, the following capabilities have been identified and are currently supported:

1. Capture offloaded packets on the RX side of a VF (packets that are sent from VMs).
2. Capture offloaded packets on the RX side of PF0 (IPinIP packets that are transmitted to PF0 from the wire).

Due to constraints of the Mellanox hardware or driver, the following are currently not supported:

1. Capture offloaded packets on the TX side of interfaces.
2. Capture offloaded packets on the RX side of PF1. PF1 is currently not in switchdev mode, so the special rte flow rule used for capturing does not work on it.
3. Setting the UDP source port: the configured UDP src port is not respected by the hardware; only the configured UDP dst port is respected.


## Capture and understand offloaded rx packets
Capturing must be started via dpservice-cli before the first packets of new flows arrive on an interface. The target interfaces, especially VFs, must be started first, and up to 16 interfaces in total can be specified on the command line. As noted above, capturing on PF1 is not supported by the hardware, so only specify `--pf=0`.


```
./bin/dpservice-cli capture start --sink-node-ip=<underlay IP of the hypervisor or a remote host> --udp-src-port=<selected port ID> --udp-dst-port=<selected port ID> --vf=<list of started interfaces> --pf=0
```

For example:
```
./bin/dpservice-cli capture start --sink-node-ip=abcd:ef01:1234:4321::1 --udp-src-port=3000 --udp-dst-port=3010 --vf=vm-1,vm-2 --pf=0
```
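The constraints above (a valid IPv6 sink address, at most 16 interfaces in total, PF0 only) can be checked before a request is sent. The following C sketch is a hypothetical client-side validator, not dpservice-cli's actual code: `struct capture_request`, `build_capture_request()`, `IFACE_ID_MAX_LEN` (assumed to be 64) and the use of `-1` for "no `--pf` given" are illustrative assumptions. The limit of 16 mirrors `DP_CAPTURE_MAX_PORT_NUM`, and the layout loosely follows `struct dpgrpc_capture` from this PR.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r() */
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define CAPTURE_MAX_IFACES 16 /* mirrors DP_CAPTURE_MAX_PORT_NUM */
#define IFACE_ID_MAX_LEN 64   /* assumption; the real VM_IFACE_ID_MAX_LEN may differ */

enum capture_iface_type { CAPTURE_IFACE_PF, CAPTURE_IFACE_VF };

struct capture_iface {
	enum capture_iface_type type;
	union {
		char iface_id[IFACE_ID_MAX_LEN]; /* VF, e.g. "vm-1" */
		uint8_t pf_index;                /* PF, only 0 is usable */
	} spec;
};

/* Hypothetical request, loosely modelled on struct dpgrpc_capture. */
struct capture_request {
	uint8_t sink_addr6[16];
	uint16_t udp_src_port;
	uint16_t udp_dst_port;
	uint8_t iface_count;
	struct capture_iface ifaces[CAPTURE_MAX_IFACES];
};

/* vfs: comma-separated VF names ("vm-1,vm-2"); pf_index: -1 if --pf is absent.
 * Returns 0 on success, -1 if any constraint is violated. */
static int build_capture_request(struct capture_request *req, const char *sink_ip,
				 uint16_t src_port, uint16_t dst_port,
				 const char *vfs, int pf_index)
{
	char list[CAPTURE_MAX_IFACES * IFACE_ID_MAX_LEN];
	char *tok, *save = NULL;

	assert(req && sink_ip && vfs);
	memset(req, 0, sizeof(*req));

	/* Every group must be hexadecimal, so e.g. "efgh" is rejected. */
	if (inet_pton(AF_INET6, sink_ip, req->sink_addr6) != 1)
		return -1;
	req->udp_src_port = src_port;
	req->udp_dst_port = dst_port;

	if (strlen(vfs) >= sizeof(list))
		return -1;
	strcpy(list, vfs);
	for (tok = strtok_r(list, ",", &save); tok; tok = strtok_r(NULL, ",", &save)) {
		if (req->iface_count == CAPTURE_MAX_IFACES || strlen(tok) >= IFACE_ID_MAX_LEN)
			return -1;
		req->ifaces[req->iface_count].type = CAPTURE_IFACE_VF;
		strcpy(req->ifaces[req->iface_count].spec.iface_id, tok);
		req->iface_count++;
	}

	if (pf_index >= 0) {
		/* Only PF0 is in switchdev mode, so --pf=1 cannot be captured. */
		if (pf_index != 0 || req->iface_count == CAPTURE_MAX_IFACES)
			return -1;
		req->ifaces[req->iface_count].type = CAPTURE_IFACE_PF;
		req->ifaces[req->iface_count].spec.pf_index = 0;
		req->iface_count++;
	}
	return 0;
}
```

For instance, `build_capture_request(&req, "abcd:ef01:1234:4321::1", 3000, 3010, "vm-1,vm-2", 0)` succeeds, whereas an address containing a non-hexadecimal group such as `efgh` is rejected by `inet_pton()`.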

The captured packets are encapsulated and transmitted (via the router) to the interface of the selected sink node, which is either the hypervisor where dp-service is running or a remote host. These packets are visible on the physical interfaces with a regular tcpdump. For example, they can be dumped to a pcap file with:

```
sudo tcpdump -ni any udp dst port 3010 -w test.pcap
```

The generated test.pcap file can be opened in Wireshark. Because the captured frames are carried as UDP payload, the file should first be modified by removing the first 62 bytes of every packet, i.e. the outer Ethernet (14 bytes), IPv6 (40 bytes) and UDP (8 bytes) headers:

```
editcap -C 62 -F pcap test.pcap test_no_udp.pcap
```

The resulting test_no_udp.pcap file can then be decoded by Wireshark.
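If editcap is not at hand, the same stripping can be done in a few lines of C. This is a minimal sketch under stated assumptions, not part of dp-service: it handles only classic pcap files in host byte order (not pcapng), assumes no VLAN tag and no IPv6 extension headers, and the function names are hypothetical. The 62-byte figure assumes the capture file uses an Ethernet link-layer header; on Linux, `tcpdump -i any` typically records a Linux cooked header instead (16 bytes for `LINUX_SLL`, 20 bytes for `LINUX_SLL2`, depending on the libpcap version), so the sketch derives the offset from the file's link type and relabels the output as Ethernet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PCAP_MAGIC_USEC 0xa1b2c3d4u /* classic pcap, microsecond timestamps */
#define PCAP_MAGIC_NSEC 0xa1b23c4du /* classic pcap, nanosecond timestamps */
#define LINKTYPE_ETHERNET   1u
#define LINKTYPE_LINUX_SLL  113u    /* Linux cooked capture v1 */
#define LINKTYPE_LINUX_SLL2 276u    /* Linux cooked capture v2 */
#define OUTER_L3_L4_LEN (40 + 8)    /* outer IPv6 (no extension headers) + UDP */

/* Bytes preceding the captured Ether frame for a given link type, or -1. */
static int outer_header_len(uint32_t linktype)
{
	switch (linktype) {
	case LINKTYPE_ETHERNET:   return 14 + OUTER_L3_L4_LEN; /* 62, as in editcap -C 62 */
	case LINKTYPE_LINUX_SLL:  return 16 + OUTER_L3_L4_LEN;
	case LINKTYPE_LINUX_SLL2: return 20 + OUTER_L3_L4_LEN;
	default:                  return -1;
	}
}

/* Copy a host-byte-order pcap file, stripping the encapsulation from every
 * record and relabeling the result as Ethernet. Returns 0 on success. */
static int strip_capture_encap(FILE *in, FILE *out)
{
	uint32_t ghdr[6]; /* magic, version, thiszone, sigfigs, snaplen, linktype */
	uint32_t rhdr[4]; /* ts_sec, ts_usec, incl_len, orig_len */
	static uint8_t buf[262144];
	int chop;

	assert(in != NULL && out != NULL);
	if (fread(ghdr, sizeof(ghdr), 1, in) != 1 ||
	    (ghdr[0] != PCAP_MAGIC_USEC && ghdr[0] != PCAP_MAGIC_NSEC))
		return -1; /* byte-swapped and pcapng files are not handled */
	chop = outer_header_len(ghdr[5]);
	if (chop < 0)
		return -1;
	ghdr[5] = LINKTYPE_ETHERNET; /* what remains is the captured Ether frame */
	if (fwrite(ghdr, sizeof(ghdr), 1, out) != 1)
		return -1;

	while (fread(rhdr, sizeof(rhdr), 1, in) == 1) {
		uint32_t len = rhdr[2];

		if (len > sizeof(buf) || fread(buf, 1, len, in) != len)
			return -1;
		if (len <= (uint32_t)chop)
			continue; /* no encapsulated frame in this record */
		rhdr[2] = len - (uint32_t)chop;
		rhdr[3] = rhdr[3] > (uint32_t)chop ? rhdr[3] - (uint32_t)chop : rhdr[2];
		if (fwrite(rhdr, sizeof(rhdr), 1, out) != 1 ||
		    fwrite(buf + chop, 1, rhdr[2], out) != rhdr[2])
			return -1;
	}
	return ferror(in) ? -1 : 0;
}
```

Call it with files opened via `fopen(..., "rb")` and `fopen(..., "wb")`, and close the output afterwards so that it is flushed.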

The following command stops capturing on all configured interfaces. Note that to start capturing on a new set of interfaces, this stop command has to be called first.
```
./bin/dpservice-cli capture stop
```

Before starting a capture, it is also recommended to check the operational status of the capturing feature with:
```
./bin/dpservice-cli capture status
```
The output includes the operational status of the feature as well as the configuration that was passed to the `capture start` subcommand.

## How offloaded packets are captured
Offloaded packets are captured using special rte flow rules, in particular one that enables packet sampling on the RX side of an interface. The captured packets are encapsulated by prepending extra headers. Although the captured Ethernet frames are currently carried as UDP payload, other customized headers could be used as well. The encapsulation format is as follows:

```
| Outer Ether header | Outer IPv6 header | UDP header | Captured Ether frame |
```
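The layout above fixes where the captured frame starts. As an illustrative sketch (not dp-service code), the C function below checks that a packet received on the sink matches this layout and returns the offset of the captured Ethernet frame; it assumes no VLAN tag and no IPv6 extension headers. Since the hardware does not respect the configured UDP source port, only the destination port is matched. The names `capture_inner_offset()` and `CAPTURE_ENCAP_LEN` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encapsulation header sizes; assumes no VLAN tag and no IPv6 extension headers. */
enum {
	OUTER_ETH_LEN = 14,
	OUTER_IPV6_LEN = 40,
	UDP_HDR_LEN = 8,
	CAPTURE_ENCAP_LEN = OUTER_ETH_LEN + OUTER_IPV6_LEN + UDP_HDR_LEN,
};
_Static_assert(CAPTURE_ENCAP_LEN == 62, "same value as editcap -C 62");

/* Return the offset of the captured Ether frame inside pkt, or -1 if pkt is not
 * an IPv6/UDP packet addressed to udp_dst_port that carries a payload. */
static int capture_inner_offset(const uint8_t *pkt, size_t len, uint16_t udp_dst_port)
{
	const uint8_t *ip6, *udp;

	assert(pkt != NULL);
	if (len <= CAPTURE_ENCAP_LEN)
		return -1;
	ip6 = pkt + OUTER_ETH_LEN;
	udp = ip6 + OUTER_IPV6_LEN;

	if (pkt[12] != 0x86 || pkt[13] != 0xDD)       /* outer EtherType: IPv6 */
		return -1;
	if ((ip6[0] >> 4) != 6 || ip6[6] != 17)       /* version 6, next header: UDP */
		return -1;
	if (((udp[2] << 8) | udp[3]) != udp_dst_port) /* UDP dst port, big-endian */
		return -1;
	return CAPTURE_ENCAP_LEN;
}
```

With a destination port of 3010, as in the example configuration, a matching packet yields offset 62, the same number of bytes removed by `editcap -C 62`.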

[Figure1](docs/sys_design/pkt_capture_flow_rules-VF.drawio.png) and [Figure2](docs/sys_design/pkt_capture_flow_rules-PF.drawio.png) illustrate the organization of the flow rules for VFs and PFs, respectively. The differences between the VF and PF handling were determined empirically.
2 changes: 0 additions & 2 deletions include/dp_cntrack.h
@@ -9,8 +9,6 @@
extern "C" {
#endif

#define DP_IS_CAPTURED_HW_PKT 5

void dp_cntrack_init(void);

int dp_cntrack_handle(struct rte_mbuf *m, struct dp_flow *df);
6 changes: 6 additions & 0 deletions include/dp_error.h
@@ -38,6 +38,11 @@ const char *dp_strerror_verbose(int error);
ERR(ITERATOR, 207) \
ERR(OUT_OF_MEMORY, 208) \
ERR(LIMIT_REACHED, 209) \
ERR(ALREADY_ACTIVE, 210) \
ERR(NOT_ACTIVE, 211) \
ERR(ROLLBACK, 212) \
ERR(RTE_RULE_ADD, 213) \
ERR(RTE_RULE_DEL, 214) \
/* Specific errors */ \
ERR(ROUTE_EXISTS, 301) \
ERR(ROUTE_NOT_FOUND, 302) \
@@ -62,6 +67,7 @@ const char *dp_strerror_verbose(int error);
ERR(NO_LB, 422) \
ERR(NO_DROP_SUPPORT, 441) \


#define _DP_GRPC_ERROR_ENUM(NAME, NUMBER) \
DP_GRPC_ERR_##NAME = _DP_GRPC_ERRCODES - NUMBER,
enum dp_grpc_error {
2 changes: 1 addition & 1 deletion include/dp_flow.h
@@ -16,7 +16,7 @@ extern "C" {
// arbitrary big number
#define FLOW_MAX 850000

#define DP_FLOW_VAL_AGE_CTX_CAPACITY 5
#define DP_FLOW_VAL_AGE_CTX_CAPACITY 6

#define DP_FLOW_DEFAULT_TIMEOUT 30 /* 30 seconds */
#define DP_FLOW_TCP_EXTENDED_TIMEOUT (60 * 60 * 24) /* 1 day */
2 changes: 2 additions & 0 deletions include/dp_log.h
@@ -49,6 +49,7 @@ extern "C" {
#define DP_LOG_IFNAME(VALUE) _DP_LOG_STR("interface_name", VALUE)
#define DP_LOG_LCORE(VALUE) _DP_LOG_UINT("lcore_id", VALUE)
#define DP_LOG_RTE_GROUP(VALUE) _DP_LOG_UINT("rte_group", VALUE)
#define DP_LOG_PORT_TYPE(VALUE) _DP_LOG_UINT("port_type", VALUE)
// networking stack
#define DP_LOG_IPV4(VALUE) _DP_LOG_IPV4("ipv4", VALUE)
#define DP_LOG_IPV6(VALUE) _DP_LOG_IPV6("ipv6", VALUE)
@@ -69,6 +70,7 @@ extern "C" {
#define DP_LOG_GRPCRET(VALUE) _DP_LOG_INT("grpc_error", VALUE), _DP_LOG_STR("grpc_message", dp_grpc_strerror(VALUE))
#define DP_LOG_GRPCREQUEST(VALUE) _DP_LOG_INT("grpc_request", VALUE)
#define DP_LOG_IFACE(VALUE) _DP_LOG_STR("interface_id", VALUE)
#define DP_LOG_IFACE_INDEX(VALUE) _DP_LOG_INT("interface_index", VALUE)
#define DP_LOG_TVNI(VALUE) _DP_LOG_UINT("t_vni", VALUE)
#define DP_LOG_PCI(VALUE) _DP_LOG_STR("pci", VALUE)
#define DP_LOG_PXE_SRV(VALUE) _DP_LOG_STR("pxe_server", VALUE)
1 change: 0 additions & 1 deletion include/dp_mbuf_dyn.h
@@ -26,7 +26,6 @@ struct dp_flow {
uint16_t offload_ipv6 : 1; // tmp solution to set if we should offload ipv6 pkts
uint16_t dir : 2; // store the direction of each packet
uint16_t offload_decision: 2; // store the offload status of each packet
uint16_t offload_mark: 1; // store the offload mark of each packet
} flags;
uint16_t l3_type; //layer-3 for inner packets. it can be crafted or extracted from raw frames
union {
4 changes: 3 additions & 1 deletion include/dp_port.h
@@ -37,7 +37,9 @@ struct dp_port {
uint8_t peer_pf_hairpin_tx_rx_queue_offset;
uint16_t peer_pf_port_id;
enum dp_vf_port_attach_status attach_status;
struct rte_flow *default_flow;
struct rte_flow *default_jump_flow;
struct rte_flow *default_capture_flow;
bool captured;
};

struct dp_ports {
4 changes: 4 additions & 0 deletions include/grpc/dp_async_grpc.h
@@ -183,4 +183,8 @@ CREATE_CALLCLASS(ListFirewallRules, MultiReplyCall);
CREATE_CALLCLASS(CheckVniInUse, SingleReplyCall);
CREATE_CALLCLASS(ResetVni, SingleReplyCall);

CREATE_CALLCLASS(CaptureStart, SingleReplyCall);
CREATE_CALLCLASS(CaptureStop, SingleReplyCall);
CREATE_CALLCLASS(CaptureStatus, SingleReplyCall);

#endif
33 changes: 33 additions & 0 deletions include/grpc/dp_grpc_api.h
@@ -4,6 +4,7 @@
#include <stdint.h>
#include "dp_util.h"
#include "dp_firewall.h"
#include "monitoring/dp_monitoring.h"

#ifdef __cplusplus
extern "C" {
@@ -54,6 +55,9 @@ enum dpgrpc_request_type {
DP_REQ_TYPE_ListFirewallRules,
DP_REQ_TYPE_CheckVniInUse,
DP_REQ_TYPE_ResetVni,
DP_REQ_TYPE_CaptureStart,
DP_REQ_TYPE_CaptureStop,
DP_REQ_TYPE_CaptureStatus,
};

// in sync with dpdk proto!
@@ -63,6 +67,11 @@ enum dpgrpc_vni_type {
DP_VNI_BOTH,
};

enum dpgrpc_capture_iface_type {
DP_CAPTURE_IFACE_TYPE_SINGLE_PF,
DP_CAPTURE_IFACE_TYPE_SINGLE_VF,
};

struct dpgrpc_iface {
char iface_id[VM_IFACE_ID_MAX_LEN];
uint32_t ip4_addr;
@@ -160,6 +169,27 @@ struct dpgrpc_versions {
char app[DP_GRPC_VERSION_MAX_LEN];
};

struct dpgrpc_capture_interface {
enum dpgrpc_capture_iface_type type;
union {
char iface_id[VM_IFACE_ID_MAX_LEN];
uint8_t pf_index;
} spec;
};

struct dpgrpc_capture {
uint8_t dst_addr6[DP_VNF_IPV6_ADDR_SIZE];
uint8_t interface_count;
uint32_t udp_src_port;
uint32_t udp_dst_port;
struct dpgrpc_capture_interface interfaces[DP_CAPTURE_MAX_PORT_NUM];
bool is_active;
};

struct dpgrpc_capture_stop {
uint16_t port_cnt;
};

struct dpgrpc_request {
uint16_t type; // enum dpgrpc_request_type
union {
@@ -198,6 +228,7 @@ struct dpgrpc_request {
struct dpgrpc_vni vni_in_use;
struct dpgrpc_vni vni_reset;
struct dpgrpc_versions get_version;
struct dpgrpc_capture capture_start;
};
};

@@ -235,6 +266,8 @@ struct dpgrpc_reply {
struct dpgrpc_fwrule_info fwrule;
struct dpgrpc_vni_in_use vni_in_use;
struct dpgrpc_versions versions;
struct dpgrpc_capture_stop capture_stop;
struct dpgrpc_capture capture_get;
};
};

3 changes: 3 additions & 0 deletions include/grpc/dp_grpc_conv.h
@@ -24,6 +24,9 @@ namespace GrpcConv
bool GrpcToDpFwallDirection(const TrafficDirection& grpc_dir, enum dp_fwall_direction *dp_dir);
bool GrpcToDpFwallPort(int32_t grpc_port, uint32_t *dp_port);

bool GrpcToDpCaptureInterfaceType(const CaptureInterfaceType & grpc_type, enum dpgrpc_capture_iface_type *dp_capture_iface_type);
CaptureInterfaceType CaptureInterfaceTypeToGrpc(enum dpgrpc_capture_iface_type dp_capture_iface_type);

const char *Ipv4ToStr(uint32_t ipv4);

uint32_t Ipv4PrefixLenToMask(uint32_t prefix_length);
4 changes: 0 additions & 4 deletions include/monitoring/dp_event.h
@@ -16,12 +16,8 @@ int dp_link_status_change_event_callback(uint16_t port_id,
void dp_process_event_link_msg(struct rte_mbuf *m);

int dp_send_event_flow_aging_msg(void);
int dp_send_event_hardware_capture_start_msg(void);
int dp_send_event_hardware_capture_stop_msg(void);

void dp_process_event_flow_aging_msg(struct rte_mbuf *m);
void dp_process_event_hardware_capture_start_msg(struct rte_mbuf *m);
void dp_process_event_hardware_capture_stop_msg(struct rte_mbuf *m);

#ifdef __cplusplus
}
24 changes: 6 additions & 18 deletions include/monitoring/dp_graphtrace.h
@@ -19,8 +19,7 @@ enum dp_graphtrace_loglevel {
int dp_graphtrace_init(void);
void dp_graphtrace_free(void);

void _dp_graphtrace_send(enum dp_graphtrace_pkt_type type,
const struct rte_node *node,
void _dp_graphtrace_send(const struct rte_node *node,
const struct rte_node *next_node,
void **objs, uint16_t nb_objs,
uint16_t dst_port_id);
@@ -53,53 +52,42 @@ void _dp_graphtrace_send(enum dp_graphtrace_pkt_type type,

extern int _dp_graphtrace_flags;
extern bool _dp_graphtrace_enabled;
extern bool _dp_graphtrace_hw_enabled;

static __rte_always_inline void dp_graphtrace_next(const struct rte_node *node, void *obj, rte_edge_t next_index)
{
if (_dp_graphtrace_enabled && (_dp_graphtrace_flags & DP_GRAPHTRACE_FLAG_NODES))
_dp_graphtrace_send(DP_GRAPHTRACE_PKT_TYPE_SOFTWARE, node, node->nodes[next_index], &obj, 1, -1);
_dp_graphtrace_send(node, node->nodes[next_index], &obj, 1, -1);
_dp_graphtrace_log_next(node, obj, next_index);
}

static __rte_always_inline void dp_graphtrace_next_burst(const struct rte_node *node, void **objs, uint16_t nb_objs, rte_edge_t next_index)
{
if (_dp_graphtrace_enabled && (_dp_graphtrace_flags & DP_GRAPHTRACE_FLAG_NODES))
_dp_graphtrace_send(DP_GRAPHTRACE_PKT_TYPE_SOFTWARE, node, node->nodes[next_index], objs, nb_objs, -1);
_dp_graphtrace_send(node, node->nodes[next_index], objs, nb_objs, -1);
_dp_graphtrace_log_next_burst(node, objs, nb_objs, next_index);
}

static __rte_always_inline void dp_graphtrace_rx_burst(const struct rte_node *node, void **objs, uint16_t nb_objs)
{
if (_dp_graphtrace_enabled)
_dp_graphtrace_send(DP_GRAPHTRACE_PKT_TYPE_SOFTWARE, NULL, node, objs, nb_objs, -1);
_dp_graphtrace_send(NULL, node, objs, nb_objs, -1);
_dp_graphtrace_log_rx_burst(node, objs, nb_objs);
}

static __rte_always_inline void dp_graphtrace_tx_burst(const struct rte_node *node, void **objs, uint16_t nb_objs, uint16_t port_id)
{
if (_dp_graphtrace_enabled)
_dp_graphtrace_send(DP_GRAPHTRACE_PKT_TYPE_SOFTWARE, node, NULL, objs, nb_objs, port_id);
_dp_graphtrace_send(node, NULL, objs, nb_objs, port_id);
_dp_graphtrace_log_tx_burst(node, objs, nb_objs, port_id);
}

static __rte_always_inline void dp_graphtrace_drop_burst(const struct rte_node *node, void **objs, uint16_t nb_objs)
{
if (_dp_graphtrace_enabled && (_dp_graphtrace_flags & DP_GRAPHTRACE_FLAG_DROPS))
_dp_graphtrace_send(DP_GRAPHTRACE_PKT_TYPE_SOFTWARE, node, NULL, objs, nb_objs, -1);
_dp_graphtrace_send(node, NULL, objs, nb_objs, -1);
_dp_graphtrace_log_drop_burst(node, objs, nb_objs);
}

static __rte_always_inline void dp_graphtrace_capture_offload_pkt(void *obj)
{
if (_dp_graphtrace_hw_enabled)
_dp_graphtrace_send(DP_GRAPHTRACE_PKT_TYPE_OFFLOAD, NULL, NULL, &obj, 1, -1);
}

static __rte_always_inline bool dp_is_graphtrace_hw_enabled(void)
{
return _dp_graphtrace_enabled;
}

#ifdef __cplusplus
}
7 changes: 0 additions & 7 deletions include/monitoring/dp_graphtrace_shared.h
@@ -30,11 +30,6 @@ enum dp_graphtrace_action {
DP_GRAPHTRACE_ACTION_STOP,
};

enum dp_graphtrace_pkt_type {
DP_GRAPHTRACE_PKT_TYPE_SOFTWARE,
DP_GRAPHTRACE_PKT_TYPE_OFFLOAD,
};

struct dp_graphtrace {
struct rte_mempool *mempool;
struct rte_ring *ringbuf;
@@ -47,7 +42,6 @@ struct dp_graphtrace_params {
};

struct dp_graphtrace_pktinfo {
enum dp_graphtrace_pkt_type pkt_type;
uint32_t pktid;
const struct rte_node *node;
const struct rte_node *next_node;
@@ -57,7 +51,6 @@ struct dp_graphtrace_pktinfo {
struct dp_graphtrace_params_start {
bool drops;
bool nodes;
bool hw;
};

struct dp_graphtrace_mp_request {
19 changes: 16 additions & 3 deletions include/monitoring/dp_monitoring.h
@@ -8,12 +8,11 @@
extern "C" {
#endif

#define DP_CAPTURE_MAX_PORT_NUM 16

enum dp_event_type {
DP_EVENT_TYPE_UNKNOWN,
DP_EVENT_TYPE_LINK_STATUS,
DP_EVENT_TYPE_FLOW_AGING,
DP_EVENT_TYPE_HARDWARE_CAPTURE_START,
DP_EVENT_TYPE_HARDWARE_CAPTURE_STOP,
};

struct dp_event_msg_head {
@@ -32,8 +31,22 @@ struct dp_event_msg {
} event_entry;
};

struct dp_capture_hdr_config {
uint8_t capture_node_ipv6_addr[16];
uint32_t capture_udp_src_port;
uint32_t capture_udp_dst_port;
};

void dp_process_event_msg(struct rte_mbuf *m);


void dp_set_capture_hdr_config(uint8_t *addr, uint32_t udp_src_port, uint32_t udp_dst_port);
const struct dp_capture_hdr_config *dp_get_capture_hdr_config(void);

void dp_set_capture_enabled(bool enabled);

bool dp_is_capture_enabled(void);

#ifdef __cplusplus
}
#endif
2 changes: 1 addition & 1 deletion include/rte_flow/dp_rte_flow.h
@@ -33,7 +33,7 @@ extern "C"

enum {
DP_RTE_FLOW_DEFAULT_GROUP,
DP_RTE_FLOW_MONITORING_GROUP,
DP_RTE_FLOW_CAPTURE_GROUP,
DP_RTE_FLOW_VNET_GROUP,
};
