Enable disabling TX checksum offload for Antrea host gateway #6843

Open
wants to merge 2 commits into
base: main
10 changes: 7 additions & 3 deletions build/charts/antrea/conf/antrea-agent.conf
@@ -179,9 +179,13 @@ trafficEncryptionMode: {{ .Values.trafficEncryptionMode | quote }}
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: {{ .Values.enableBridgingMode }}

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
Comment on lines +186 to +187
Contributor

To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.

For Pods, yes. For the gateway interface, I don't know if this is the best advice, as it is a bit disruptive and impractical for users. We could consider an antctl command (maybe in the future) to change the offload configuration on all Nodes easily?

Contributor Author

Noted. I will add the subcommand in another PR.
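
As a rough illustration of what the follow-up discussed in this thread might do on a single Node, the sketch below re-enables TX checksum offload by invoking the standard `ethtool -K <iface> tx on` command. The helper names `ethtoolTXArgs` and `restoreTXChecksumOffload` are hypothetical and are not part of Antrea or antctl; the sketch also assumes that the `ethtool` binary is installed and that the process has sufficient privileges. Whether `tx on` returns an interface to its exact original state depends on the device.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ethtoolTXArgs builds the arguments for `ethtool -K <iface> tx on|off`.
// Hypothetical helper for illustration; not part of Antrea or antctl.
func ethtoolTXArgs(iface string, enable bool) []string {
	state := "off"
	if enable {
		state = "on"
	}
	return []string{"-K", iface, "tx", state}
}

// restoreTXChecksumOffload re-enables TX checksum offload on iface by running ethtool.
// Assumption: ethtool is available on the Node and the caller has the required privileges.
func restoreTXChecksumOffload(iface string) error {
	out, err := exec.Command("ethtool", ethtoolTXArgs(iface, true)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ethtool failed on %s: %w (output: %s)", iface, err, out)
	}
	return nil
}

func main() {
	// antrea-gw0 is the default host gateway name given in the option's comment.
	// Only the command line is printed here; nothing is changed on the host.
	fmt.Println("ethtool", ethtoolTXArgs("antrea-gw0", true))
}
```

Keeping the argument construction separate from the execution makes the command easy to inspect or log before it is run on every Node.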

# This option affects Linux Nodes only.
disableTXChecksumOffload: {{ .Values.disableTXChecksumOffload }}

# Default MTU to use for the host gateway interface and the network interface of each Pod.
14 changes: 9 additions & 5 deletions build/yamls/antrea-aks.yml
@@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false

# Default MTU to use for the host gateway interface and the network interface of each Pod.
@@ -5400,7 +5404,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: f7ac1903ae9edfd45361cb67b991cb23f708f15cb5cb862bffd70e95dcd776fb
checksum/config: 54e7d7a7bed8d3013386af09cce086580ba7ffc36fb5e92d654aa237c8e9fd94
labels:
app: antrea
component: antrea-agent
@@ -5638,7 +5642,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: f7ac1903ae9edfd45361cb67b991cb23f708f15cb5cb862bffd70e95dcd776fb
checksum/config: 54e7d7a7bed8d3013386af09cce086580ba7ffc36fb5e92d654aa237c8e9fd94
labels:
app: antrea
component: antrea-controller
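
The `checksum/config` annotation changes in this diff even though only comments in the ConfigMap were edited. A minimal sketch of why, assuming the annotation is a SHA-256 digest of the rendered configuration as in the Helm tip linked above: any byte-level change to the hashed text, including a comment, yields a different digest and therefore triggers a rolling restart. The `configChecksum` helper and the two config fragments are illustrative only; the exact input that the Antrea chart hashes is not shown in this diff.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// configChecksum returns the hex-encoded SHA-256 digest of the rendered config text.
// Hypothetical helper; it only illustrates the idea behind the annotation.
func configChecksum(rendered string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(rendered)))
}

func main() {
	// Two fragments that differ only in a comment line.
	before := "# It affects Pods running on Linux Nodes only.\ndisableTXChecksumOffload: false\n"
	after := "# This option affects Linux Nodes only.\ndisableTXChecksumOffload: false\n"

	fmt.Println("before:", configChecksum(before))
	fmt.Println("after: ", configChecksum(after))
	fmt.Println("changed:", configChecksum(before) != configChecksum(after))
}
```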
14 changes: 9 additions & 5 deletions build/yamls/antrea-eks.yml
@@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false

# Default MTU to use for the host gateway interface and the network interface of each Pod.
@@ -5400,7 +5404,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: f7ac1903ae9edfd45361cb67b991cb23f708f15cb5cb862bffd70e95dcd776fb
checksum/config: 54e7d7a7bed8d3013386af09cce086580ba7ffc36fb5e92d654aa237c8e9fd94
labels:
app: antrea
component: antrea-agent
@@ -5639,7 +5643,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: f7ac1903ae9edfd45361cb67b991cb23f708f15cb5cb862bffd70e95dcd776fb
checksum/config: 54e7d7a7bed8d3013386af09cce086580ba7ffc36fb5e92d654aa237c8e9fd94
labels:
app: antrea
component: antrea-controller
14 changes: 9 additions & 5 deletions build/yamls/antrea-gke.yml
@@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false

# Default MTU to use for the host gateway interface and the network interface of each Pod.
@@ -5400,7 +5404,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 00ba3a60f132691721ba2e84c5c8f0a9eddc32593b38798de8f59d52fff54169
checksum/config: e5d1c205f1bb6ede60ed7ec99e4e32663eac39b86e345c40d942552cc73605e1
labels:
app: antrea
component: antrea-agent
@@ -5636,7 +5640,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 00ba3a60f132691721ba2e84c5c8f0a9eddc32593b38798de8f59d52fff54169
checksum/config: e5d1c205f1bb6ede60ed7ec99e4e32663eac39b86e345c40d942552cc73605e1
labels:
app: antrea
component: antrea-controller
14 changes: 9 additions & 5 deletions build/yamls/antrea-ipsec.yml
@@ -4135,9 +4135,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false

# Default MTU to use for the host gateway interface and the network interface of each Pod.
@@ -5413,7 +5417,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 4b9bbfbbda1ab405ade14e797ea88fbd6f3795bb6aae9df0496409d542799145
checksum/config: 6b5ad7dcc2e54f41aab252cc9c3e81bf2c059ea717ca250c4fac21affde5fd3d
checksum/ipsec-secret: d0eb9c52d0cd4311b6d252a951126bf9bea27ec05590bed8a394f0f792dcb2a4
labels:
app: antrea
@@ -5695,7 +5699,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 4b9bbfbbda1ab405ade14e797ea88fbd6f3795bb6aae9df0496409d542799145
checksum/config: 6b5ad7dcc2e54f41aab252cc9c3e81bf2c059ea717ca250c4fac21affde5fd3d
labels:
app: antrea
component: antrea-controller
14 changes: 9 additions & 5 deletions build/yamls/antrea.yml
@@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false

# Default MTU to use for the host gateway interface and the network interface of each Pod.
@@ -5400,7 +5404,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: e4e94ba89524d8fdc7eb3ad6e0f6948767f3d92ef767f17c47da348f08b5c2e0
checksum/config: 066c4105578a251da2b0ee3928df4ec469a7ba113764f9206a5b97b789713d0c
labels:
app: antrea
component: antrea-agent
@@ -5636,7 +5640,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: e4e94ba89524d8fdc7eb3ad6e0f6948767f3d92ef767f17c47da348f08b5c2e0
checksum/config: 066c4105578a251da2b0ee3928df4ec469a7ba113764f9206a5b97b789713d0c
labels:
app: antrea
component: antrea-controller
3 changes: 2 additions & 1 deletion cmd/antrea-agent/agent.go
@@ -318,7 +318,8 @@ func run(o *Options) error {
connectUplinkToBridge,
o.enableAntreaProxy,
l7NetworkPolicyEnabled,
l7FlowExporterEnabled)
l7FlowExporterEnabled,
o.config.DisableTXChecksumOffload)
err = agentInitializer.Initialize()
if err != nil {
return fmt.Errorf("error initializing agent: %v", err)
8 changes: 6 additions & 2 deletions docs/antrea-l7-network-policy.md
@@ -34,8 +34,12 @@ This guide demonstrates how to configure layer 7 NetworkPolicy.

Layer 7 NetworkPolicy was introduced in v1.10 as an alpha feature and is disabled by default. A feature gate,
`L7NetworkPolicy`, must be enabled in antrea-controller.conf and antrea-agent.conf in the `antrea-config` ConfigMap.
Additionally, due to the constraint of the application detection engine, TX checksum offloading must be disabled via the
`disableTXChecksumOffload` option in antrea-agent.conf for the feature to work. An example configuration is as below:
Additionally, due to a constraint of the application detection engine, TX checksum offloading must be disabled for
container network interfaces and the host gateway interface (default: antrea-gw0) for the feature to work. This can
be configured using the `disableTXChecksumOffload` option in antrea-agent.conf. Disabling TX checksum offloading ensures
that TCP connections traverse these interfaces correctly, preventing connection failures and packet loss.

An example configuration is as below:

```yaml
apiVersion: v1
96 changes: 51 additions & 45 deletions pkg/agent/agent.go
@@ -109,27 +109,28 @@ var (

// Initializer knows how to setup host networking, OpenVSwitch, and Openflow.
type Initializer struct {
client clientset.Interface
crdClient versioned.Interface
ovsBridgeClient ovsconfig.OVSBridgeClient
ovsCtlClient ovsctl.OVSCtlClient
ofClient openflow.Client
routeClient route.Interface
wireGuardClient wireguard.Interface
ifaceStore interfacestore.InterfaceStore
ovsBridge string
hostGateway string // name of gateway port on the OVS bridge
mtu int
networkConfig *config.NetworkConfig
nodeConfig *config.NodeConfig
wireGuardConfig *config.WireGuardConfig
egressConfig *config.EgressConfig
serviceConfig *config.ServiceConfig
l7NetworkPolicyConfig *config.L7NetworkPolicyConfig
enableL7NetworkPolicy bool
enableL7FlowExporter bool
connectUplinkToBridge bool
enableAntreaProxy bool
client clientset.Interface
crdClient versioned.Interface
ovsBridgeClient ovsconfig.OVSBridgeClient
ovsCtlClient ovsctl.OVSCtlClient
ofClient openflow.Client
routeClient route.Interface
wireGuardClient wireguard.Interface
ifaceStore interfacestore.InterfaceStore
ovsBridge string
hostGateway string // name of gateway port on the OVS bridge
mtu int
networkConfig *config.NetworkConfig
nodeConfig *config.NodeConfig
wireGuardConfig *config.WireGuardConfig
egressConfig *config.EgressConfig
serviceConfig *config.ServiceConfig
l7NetworkPolicyConfig *config.L7NetworkPolicyConfig
enableL7NetworkPolicy bool
enableL7FlowExporter bool
connectUplinkToBridge bool
enableAntreaProxy bool
disableTXChecksumOffload bool
// podNetworkWait should be decremented once the Node's network is ready.
// The CNI server will wait for it before handling any CNI Add requests.
podNetworkWait *utilwait.Group
@@ -166,32 +167,34 @@ func NewInitializer(
enableAntreaProxy bool,
enableL7NetworkPolicy bool,
enableL7FlowExporter bool,
disableTXChecksumOffload bool,
) *Initializer {
return &Initializer{
ovsBridgeClient: ovsBridgeClient,
ovsCtlClient: ovsCtlClient,
client: k8sClient,
crdClient: crdClient,
ifaceStore: ifaceStore,
ofClient: ofClient,
routeClient: routeClient,
ovsBridge: ovsBridge,
hostGateway: hostGateway,
mtu: mtu,
networkConfig: networkConfig,
wireGuardConfig: wireGuardConfig,
egressConfig: egressConfig,
serviceConfig: serviceConfig,
l7NetworkPolicyConfig: &config.L7NetworkPolicyConfig{},
podNetworkWait: podNetworkWait,
flowRestoreCompleteWait: flowRestoreCompleteWait,
stopCh: stopCh,
nodeType: nodeType,
externalNodeNamespace: externalNodeNamespace,
connectUplinkToBridge: connectUplinkToBridge,
enableAntreaProxy: enableAntreaProxy,
enableL7NetworkPolicy: enableL7NetworkPolicy,
enableL7FlowExporter: enableL7FlowExporter,
ovsBridgeClient: ovsBridgeClient,
ovsCtlClient: ovsCtlClient,
client: k8sClient,
crdClient: crdClient,
ifaceStore: ifaceStore,
ofClient: ofClient,
routeClient: routeClient,
ovsBridge: ovsBridge,
hostGateway: hostGateway,
mtu: mtu,
networkConfig: networkConfig,
wireGuardConfig: wireGuardConfig,
egressConfig: egressConfig,
serviceConfig: serviceConfig,
l7NetworkPolicyConfig: &config.L7NetworkPolicyConfig{},
podNetworkWait: podNetworkWait,
flowRestoreCompleteWait: flowRestoreCompleteWait,
stopCh: stopCh,
nodeType: nodeType,
externalNodeNamespace: externalNodeNamespace,
connectUplinkToBridge: connectUplinkToBridge,
enableAntreaProxy: enableAntreaProxy,
enableL7NetworkPolicy: enableL7NetworkPolicy,
enableL7FlowExporter: enableL7FlowExporter,
disableTXChecksumOffload: disableTXChecksumOffload,
}
}

@@ -706,6 +709,9 @@ func (i *Initializer) setupGatewayInterface() error {
return err
}
}
if err := i.setTXChecksumOffload(); err != nil {
return err
}

return nil
}
11 changes: 11 additions & 0 deletions pkg/agent/agent_linux.go
@@ -29,6 +29,7 @@ import (
"antrea.io/antrea/pkg/agent/config"
"antrea.io/antrea/pkg/agent/interfacestore"
"antrea.io/antrea/pkg/agent/util"
"antrea.io/antrea/pkg/agent/util/ethtool"
"antrea.io/antrea/pkg/apis/crd/v1alpha1"
"antrea.io/antrea/pkg/ovs/ovsconfig"
utilip "antrea.io/antrea/pkg/util/ip"
@@ -262,3 +263,13 @@ func (i *Initializer) prepareL7EngineInterfaces() error {
}
return nil
}

func (i *Initializer) setTXChecksumOffload() error {
if i.disableTXChecksumOffload {
if err := ethtool.EthtoolTXHWCsumOff(i.hostGateway); err != nil {
return fmt.Errorf("error when disabling TX checksum offload on %s: %v", i.hostGateway, err)
}
klog.InfoS("Disabled TX checksum offload on host gateway interface", "hostGateway", i.hostGateway)
}
return nil
}
4 changes: 4 additions & 0 deletions pkg/agent/agent_windows.go
@@ -512,3 +512,7 @@ func (i *Initializer) installVMInitialFlows() error {
func (i *Initializer) prepareL7EngineInterfaces() error {
return nil
}

func (i *Initializer) setTXChecksumOffload() error {
return nil
}
10 changes: 7 additions & 3 deletions pkg/config/agent/config.go
@@ -120,9 +120,13 @@ type AgentConfig struct {
// IPv4 and Linux Nodes, and can be enabled only when `ovsDatapathType` is `system`,
// `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
EnableBridgingMode bool `yaml:"enableBridgingMode,omitempty"`
// Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
// datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
// It affects Pods running on Linux Nodes only.
// Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
// antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
// which causes packets to be dropped due to bad checksum.
// If this option is later set to false, Antrea does nothing to the affected container network interfaces
// and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
// it is recommended to delete them and recreate.
// This option affects Linux Nodes only.
DisableTXChecksumOffload bool `yaml:"disableTXChecksumOffload,omitempty"`
// APIPort is the port for the antrea-agent APIServer to serve on.
// Defaults to 10350.