Skip to content

Commit

Permalink
Enable disabling TX checksum offload for Antrea host gateway
Browse files Browse the repository at this point in the history
This commit introduces the ability to disable TX checksum offload
for the host gateway interface (default: `antrea-gw0`) by setting the
`disableTXChecksumOffload` option to `true`.

If this option is later set to false, Antrea does nothing to the affected
container network interfaces and the host gateway interface. To restore
the default TX checksum state of the affected interfaces, it is recommended
to delete them and recreate.

To delete the affected interfaces, for container network interfaces, delete
the corresponding Pods; for the host gateway interface, uninstall Antrea,
reboot all K8s Nodes and install Antrea.

Signed-off-by: Hongliang Liu <[email protected]>
  • Loading branch information
hongliangl committed Dec 11, 2024
1 parent b646bbd commit 3268120
Show file tree
Hide file tree
Showing 12 changed files with 133 additions and 79 deletions.
10 changes: 7 additions & 3 deletions build/charts/antrea/conf/antrea-agent.conf
Original file line number Diff line number Diff line change
Expand Up @@ -179,9 +179,13 @@ trafficEncryptionMode: {{ .Values.trafficEncryptionMode | quote }}
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: {{ .Values.enableBridgingMode }}

# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: {{ .Values.disableTXChecksumOffload }}

# Default MTU to use for the host gateway interface and the network interface of each Pod.
Expand Down
14 changes: 9 additions & 5 deletions build/yamls/antrea-aks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false
# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false
# Default MTU to use for the host gateway interface and the network interface of each Pod.
Expand Down Expand Up @@ -5394,7 +5398,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 5dd823245aab41ce7ca74d05693aa96e1537615f6966b6b78879cde5d3a0b215
checksum/config: 23ae11f3cd06f1152ba1c5a16d206745aec469055e6f0a43518671bbe3285462
labels:
app: antrea
component: antrea-agent
Expand Down Expand Up @@ -5632,7 +5636,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 5dd823245aab41ce7ca74d05693aa96e1537615f6966b6b78879cde5d3a0b215
checksum/config: 23ae11f3cd06f1152ba1c5a16d206745aec469055e6f0a43518671bbe3285462
labels:
app: antrea
component: antrea-controller
Expand Down
14 changes: 9 additions & 5 deletions build/yamls/antrea-eks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false
# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false
# Default MTU to use for the host gateway interface and the network interface of each Pod.
Expand Down Expand Up @@ -5394,7 +5398,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 5dd823245aab41ce7ca74d05693aa96e1537615f6966b6b78879cde5d3a0b215
checksum/config: 23ae11f3cd06f1152ba1c5a16d206745aec469055e6f0a43518671bbe3285462
labels:
app: antrea
component: antrea-agent
Expand Down Expand Up @@ -5633,7 +5637,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 5dd823245aab41ce7ca74d05693aa96e1537615f6966b6b78879cde5d3a0b215
checksum/config: 23ae11f3cd06f1152ba1c5a16d206745aec469055e6f0a43518671bbe3285462
labels:
app: antrea
component: antrea-controller
Expand Down
14 changes: 9 additions & 5 deletions build/yamls/antrea-gke.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false
# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false
# Default MTU to use for the host gateway interface and the network interface of each Pod.
Expand Down Expand Up @@ -5394,7 +5398,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: e9ea48fa57cb11513f69a1fb2b44dd3c6cb96aa739598c1db5091ea91f097f4b
checksum/config: c53b2ce4d1af6f5dbb05f3a6a4b42df7bfb32ce6beea950903c2bfa3b030d682
labels:
app: antrea
component: antrea-agent
Expand Down Expand Up @@ -5630,7 +5634,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: e9ea48fa57cb11513f69a1fb2b44dd3c6cb96aa739598c1db5091ea91f097f4b
checksum/config: c53b2ce4d1af6f5dbb05f3a6a4b42df7bfb32ce6beea950903c2bfa3b030d682
labels:
app: antrea
component: antrea-controller
Expand Down
14 changes: 9 additions & 5 deletions build/yamls/antrea-ipsec.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4135,9 +4135,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false
# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false
# Default MTU to use for the host gateway interface and the network interface of each Pod.
Expand Down Expand Up @@ -5407,7 +5411,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 38e19ea8db3838e3f5cff4aaa2684db1586fb457d095ac3ea49e8bf405a04e41
checksum/config: 86dca0ab732878d94c7927b999fd348f19c386dd9dc10f8bb6642a258c3f74d1
checksum/ipsec-secret: d0eb9c52d0cd4311b6d252a951126bf9bea27ec05590bed8a394f0f792dcb2a4
labels:
app: antrea
Expand Down Expand Up @@ -5689,7 +5693,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 38e19ea8db3838e3f5cff4aaa2684db1586fb457d095ac3ea49e8bf405a04e41
checksum/config: 86dca0ab732878d94c7927b999fd348f19c386dd9dc10f8bb6642a258c3f74d1
labels:
app: antrea
component: antrea-controller
Expand Down
14 changes: 9 additions & 5 deletions build/yamls/antrea.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4122,9 +4122,13 @@ data:
# `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
enableBridgingMode: false
# Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
# datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
# It affects Pods running on Linux Nodes only.
# Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
# antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
# which causes packets to be dropped due to bad checksum.
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
# This option affects Linux Nodes only.
disableTXChecksumOffload: false
# Default MTU to use for the host gateway interface and the network interface of each Pod.
Expand Down Expand Up @@ -5394,7 +5398,7 @@ spec:
kubectl.kubernetes.io/default-container: antrea-agent
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 59fb1ea496577015058d4e99e4f64136aa68d5340db13c00ced565da750a22fc
checksum/config: b5c43688f60e69b5c166344b70a6d57bb5c4740ab9f13fd17b1e0b1899234448
labels:
app: antrea
component: antrea-agent
Expand Down Expand Up @@ -5630,7 +5634,7 @@ spec:
annotations:
# Automatically restart Pod if the ConfigMap changes
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
checksum/config: 59fb1ea496577015058d4e99e4f64136aa68d5340db13c00ced565da750a22fc
checksum/config: b5c43688f60e69b5c166344b70a6d57bb5c4740ab9f13fd17b1e0b1899234448
labels:
app: antrea
component: antrea-controller
Expand Down
3 changes: 2 additions & 1 deletion cmd/antrea-agent/agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -318,7 +318,8 @@ func run(o *Options) error {
connectUplinkToBridge,
o.enableAntreaProxy,
l7NetworkPolicyEnabled,
l7FlowExporterEnabled)
l7FlowExporterEnabled,
o.config.DisableTXChecksumOffload)
err = agentInitializer.Initialize()
if err != nil {
return fmt.Errorf("error initializing agent: %v", err)
Expand Down
8 changes: 6 additions & 2 deletions docs/antrea-l7-network-policy.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,12 @@ This guide demonstrates how to configure layer 7 NetworkPolicy.

Layer 7 NetworkPolicy was introduced in v1.10 as an alpha feature and is disabled by default. A feature gate,
`L7NetworkPolicy`, must be enabled in antrea-controller.conf and antrea-agent.conf in the `antrea-config` ConfigMap.
Additionally, due to the constraint of the application detection engine, TX checksum offloading must be disabled via the
`disableTXChecksumOffload` option in antrea-agent.conf for the feature to work. An example configuration is as below:
Additionally, to ensure proper functionality, TX checksum offloading must be disabled for container network interfaces
and the host gateway interface (default: antrea-gw0) due to the constraint of the application detection engine. Ths can
be configured using the `disableTXChecksumOffload` option in antrea-agent.conf. Disabling TX checksum offloading ensures
that TCP connections traverse these interfaces correctly, preventing connection failures and packet loss.

An example configuration is as below:

```yaml
apiVersion: v1
Expand Down
96 changes: 51 additions & 45 deletions pkg/agent/agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -109,27 +109,28 @@ var (

// Initializer knows how to setup host networking, OpenVSwitch, and Openflow.
type Initializer struct {
client clientset.Interface
crdClient versioned.Interface
ovsBridgeClient ovsconfig.OVSBridgeClient
ovsCtlClient ovsctl.OVSCtlClient
ofClient openflow.Client
routeClient route.Interface
wireGuardClient wireguard.Interface
ifaceStore interfacestore.InterfaceStore
ovsBridge string
hostGateway string // name of gateway port on the OVS bridge
mtu int
networkConfig *config.NetworkConfig
nodeConfig *config.NodeConfig
wireGuardConfig *config.WireGuardConfig
egressConfig *config.EgressConfig
serviceConfig *config.ServiceConfig
l7NetworkPolicyConfig *config.L7NetworkPolicyConfig
enableL7NetworkPolicy bool
enableL7FlowExporter bool
connectUplinkToBridge bool
enableAntreaProxy bool
client clientset.Interface
crdClient versioned.Interface
ovsBridgeClient ovsconfig.OVSBridgeClient
ovsCtlClient ovsctl.OVSCtlClient
ofClient openflow.Client
routeClient route.Interface
wireGuardClient wireguard.Interface
ifaceStore interfacestore.InterfaceStore
ovsBridge string
hostGateway string // name of gateway port on the OVS bridge
mtu int
networkConfig *config.NetworkConfig
nodeConfig *config.NodeConfig
wireGuardConfig *config.WireGuardConfig
egressConfig *config.EgressConfig
serviceConfig *config.ServiceConfig
l7NetworkPolicyConfig *config.L7NetworkPolicyConfig
enableL7NetworkPolicy bool
enableL7FlowExporter bool
connectUplinkToBridge bool
enableAntreaProxy bool
disableTXChecksumOffload bool
// podNetworkWait should be decremented once the Node's network is ready.
// The CNI server will wait for it before handling any CNI Add requests.
podNetworkWait *utilwait.Group
Expand Down Expand Up @@ -166,32 +167,34 @@ func NewInitializer(
enableAntreaProxy bool,
enableL7NetworkPolicy bool,
enableL7FlowExporter bool,
disableTXChecksumOffload bool,
) *Initializer {
return &Initializer{
ovsBridgeClient: ovsBridgeClient,
ovsCtlClient: ovsCtlClient,
client: k8sClient,
crdClient: crdClient,
ifaceStore: ifaceStore,
ofClient: ofClient,
routeClient: routeClient,
ovsBridge: ovsBridge,
hostGateway: hostGateway,
mtu: mtu,
networkConfig: networkConfig,
wireGuardConfig: wireGuardConfig,
egressConfig: egressConfig,
serviceConfig: serviceConfig,
l7NetworkPolicyConfig: &config.L7NetworkPolicyConfig{},
podNetworkWait: podNetworkWait,
flowRestoreCompleteWait: flowRestoreCompleteWait,
stopCh: stopCh,
nodeType: nodeType,
externalNodeNamespace: externalNodeNamespace,
connectUplinkToBridge: connectUplinkToBridge,
enableAntreaProxy: enableAntreaProxy,
enableL7NetworkPolicy: enableL7NetworkPolicy,
enableL7FlowExporter: enableL7FlowExporter,
ovsBridgeClient: ovsBridgeClient,
ovsCtlClient: ovsCtlClient,
client: k8sClient,
crdClient: crdClient,
ifaceStore: ifaceStore,
ofClient: ofClient,
routeClient: routeClient,
ovsBridge: ovsBridge,
hostGateway: hostGateway,
mtu: mtu,
networkConfig: networkConfig,
wireGuardConfig: wireGuardConfig,
egressConfig: egressConfig,
serviceConfig: serviceConfig,
l7NetworkPolicyConfig: &config.L7NetworkPolicyConfig{},
podNetworkWait: podNetworkWait,
flowRestoreCompleteWait: flowRestoreCompleteWait,
stopCh: stopCh,
nodeType: nodeType,
externalNodeNamespace: externalNodeNamespace,
connectUplinkToBridge: connectUplinkToBridge,
enableAntreaProxy: enableAntreaProxy,
enableL7NetworkPolicy: enableL7NetworkPolicy,
enableL7FlowExporter: enableL7FlowExporter,
disableTXChecksumOffload: disableTXChecksumOffload,
}
}

Expand Down Expand Up @@ -706,6 +709,9 @@ func (i *Initializer) setupGatewayInterface() error {
return err
}
}
if err := i.setTXChecksumOffload(); err != nil {
return err
}

return nil
}
Expand Down
11 changes: 11 additions & 0 deletions pkg/agent/agent_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ import (
"antrea.io/antrea/pkg/agent/config"
"antrea.io/antrea/pkg/agent/interfacestore"
"antrea.io/antrea/pkg/agent/util"
"antrea.io/antrea/pkg/agent/util/ethtool"
"antrea.io/antrea/pkg/apis/crd/v1alpha1"
"antrea.io/antrea/pkg/ovs/ovsconfig"
utilip "antrea.io/antrea/pkg/util/ip"
Expand Down Expand Up @@ -262,3 +263,13 @@ func (i *Initializer) prepareL7EngineInterfaces() error {
}
return nil
}

func (i *Initializer) setTXChecksumOffload() error {
if i.disableTXChecksumOffload {
if err := ethtool.EthtoolTXHWCsumOff(i.hostGateway); err != nil {
return fmt.Errorf("error when disabling TX checksum offload on %s: %v", i.hostGateway, err)
}
klog.InfoS("Disabled TX checksum offload on host gateway interface", "hostGateway", i.hostGateway)
}
return nil
}
4 changes: 4 additions & 0 deletions pkg/agent/agent_windows.go
Original file line number Diff line number Diff line change
Expand Up @@ -512,3 +512,7 @@ func (i *Initializer) installVMInitialFlows() error {
func (i *Initializer) prepareL7EngineInterfaces() error {
return nil
}

func (i *Initializer) setTXChecksumOffload() error {
return nil
}
10 changes: 7 additions & 3 deletions pkg/config/agent/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -120,9 +120,13 @@ type AgentConfig struct {
// IPv4 and Linux Nodes, and can be enabled only when `ovsDatapathType` is `system`,
// `trafficEncapMode` is `noEncap`, and `noSNAT` is true.
EnableBridgingMode bool `yaml:"enableBridgingMode,omitempty"`
// Disable TX checksum offloading for container network interfaces. It's supposed to be set to true when the
// datapath doesn't support TX checksum offloading, which causes packets to be dropped due to bad checksum.
// It affects Pods running on Linux Nodes only.
// Disable TX checksum offloading for container network interfaces and the host gateway interface (default:
// antrea-gw0). It's supposed to be set to true when the datapath doesn't support TX checksum offloading,
// which causes packets to be dropped due to bad checksum.
// If this option is later set to false, Antrea does nothing to the affected container network interfaces
// and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
// it is recommended to delete them and recreate.
// This option affects Linux Nodes only.
DisableTXChecksumOffload bool `yaml:"disableTXChecksumOffload,omitempty"`
// APIPort is the port for the antrea-agent APIServer to serve on.
// Defaults to 10350.
Expand Down

0 comments on commit 3268120

Please sign in to comment.