
[DRAFT] EKS Hybrid Nodes Networking Docs (pod network routing, webhooks, mixed mode clusters) #906

Open

wants to merge 4 commits into base: mainline
Conversation

csplinter

@csplinter csplinter commented Mar 4, 2025

DO NOT MERGE
We are still iterating on some of this content, put up PR to make it easy for others to view and review.

Description of changes:
This doc update adds to the info we have for EKS Hybrid Nodes on pod network routing, webhooks, and running mixed mode clusters.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@csplinter csplinter requested a review from fincd-aws as a code owner March 4, 2025 22:54
@csplinter csplinter changed the title EKS Hybrid Nodes Networking Docs (pod network routing, webhooks, mixed mode clusters) [DRAFT] EKS Hybrid Nodes Networking Docs (pod network routing, webhooks, mixed mode clusters) Mar 4, 2025

This pull request is automatically being deployed by Amplify Hosting.

Access this pull request here: https://pr-906.d3rijirjvbh87e.amplifyapp.com

A common way to advertise pod addresses with your on-premises network is by using BGP. To use BGP with Cilium, you must set `bgpControlPlane.enabled: true`. For more information on Cilium's BGP support, see https://docs.cilium.io/en/stable/network/bgp-control-plane/bgp-control-plane/[Cilium BGP Control Plane] in the Cilium documentation.
. Create a YAML file called `cilium-values.yaml`. The following example configures Cilium to run on hybrid nodes only by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.

- If you configured your Amazon EKS cluster with _remote pod networks_, configure the same pod CIDRs for your `clusterPoolIPv4PodCIDRList`. For example, `10.100.0.0/24`.
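Putting these values together, a minimal `cilium-values.yaml` sketch (the CIDR and mask size are illustrative placeholders; verify key names against the Cilium Helm chart for your version):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: In
              values:
                - hybrid
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - 10.100.0.0/24          # should match your remote pod network CIDRs
    clusterPoolIPv4MaskSize: 25
bgpControlPlane:
  enabled: true                # only needed if advertising pod CIDRs over BGP
```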

Worth asking users to validate that `POD_CIDR` doesn't overlap with the node CIDR?

Even if a pod CIDR is not configured at the EKS level, the pod CIDR Cilium uses (whether manually configured or the default) must not overlap with the node CIDR.
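The overlap check itself is mechanical; a small sketch with Python's standard `ipaddress` module:

```python
import ipaddress


def cidrs_overlap(pod_cidr: str, node_cidr: str) -> bool:
    """Return True if the two CIDR ranges share any addresses."""
    pod = ipaddress.ip_network(pod_cidr)
    node = ipaddress.ip_network(node_cidr)
    return pod.overlaps(node)


# A pod CIDR of 10.100.0.0/24 is safe next to a 10.200.0.0/16 node network,
# but overlaps a 10.100.0.0/16 node network.
print(cidrs_overlap("10.100.0.0/24", "10.200.0.0/16"))  # False
print(cidrs_overlap("10.100.0.0/24", "10.100.0.0/16"))  # True
```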

Author

ah yea good callout. will add that

- It is recommended to run Calico in overlay / tunnel mode with VXLAN as the link:https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip[encapsulation method]. This mode has the fewest requirements on the underlying physical network. For more information on the different Calico networking modes, see https://docs.tigera.io/calico/latest/networking/determine-best-networking[Determining the best networking option] in the Calico documentation.
- It is recommended to run Calico with `natOutgoing` set to `true`. With `natOutgoing` set to `true`, the source IP address of all pod traffic leaving the cluster is translated to the IP address of the node. This makes it possible to run Calico with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable `natOutgoing`, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
- If you are running webhooks on your hybrid nodes, your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks. If your pod CIDRs are not routable on your on-premises network, then it is recommended to run webhooks on cloud nodes in the same cluster. See <<hybrid-nodes-networking#mixed-mode-clusters, Mixed mode clusters>> for more information.
- - A common way to make your pod CIDR routable on your on-premises network is to advertise pod addresses with BGP. To use BGP with Calico, you must set `installation.calicoNetwork.bgp: Enabled` in your Helm configuration. For more information on Calico's BGP support, see link:https://docs.tigera.io/calico/latest/networking/configuring/bgp[Configure BGP peering] in the Calico documentation.

Suggested change
- - A common way to make your pod CIDR routable on your on-premises network is to advertise pod addresses with BGP. To use BGP with Calico, you must set `installation.calicoNetwork.bgp: Enabled` in your Helm configuration. For more information on Calico's BGP support, see link:https://docs.tigera.io/calico/latest/networking/configuring/bgp[Configure BGP peering] in the Calico documentation.
- A common way to make your pod CIDR routable on your on-premises network is to advertise pod addresses with BGP. To use BGP with Calico, you must set `installation.calicoNetwork.bgp: Enabled` in your Helm configuration. For more information on Calico's BGP support, see link:https://docs.tigera.io/calico/latest/networking/configuring/bgp[Configure BGP peering] in the Calico documentation.


If you are running a mixed mode cluster with both hybrid nodes and nodes in {aws} Cloud, we recommend that you have at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on your nodes in {aws} Cloud. CoreDNS can be configured such that your workloads will use the closest CoreDNS replica, meaning your cloud workloads will use the CoreDNS running in the cloud and your hybrid workloads will use the CoreDNS running on hybrid nodes. See the steps below for how to configure CoreDNS for a mixed mode cluster.
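As one hedged illustration of the "closest replica" idea, Kubernetes Topology Aware Routing can be enabled on the `kube-dns` Service with an annotation (assuming a cluster version that supports `service.kubernetes.io/topology-mode`; the full procedure is in the steps below):

```yaml
# Illustrative patch for the kube-dns Service in kube-system.
# Topology Aware Routing prefers endpoints in the client's own zone,
# so pods in zone "onprem" resolve DNS against CoreDNS replicas there.
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    service.kubernetes.io/topology-mode: Auto
```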

. Add a topology zone label to each of your hybrid nodes, for example `topology.kubernetes.io/zone: onprem`. This can alternatively be done during the `nodeadm init` phase. Note that nodes running in {aws} Cloud automatically get a topology zone label applied to them.
Contributor

Is the topology label added at the nodeadm init by including a label to the NodeConfig spec? If yes, it could help to call that out.
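If labels can be set through `nodeadm`, a hypothetical `NodeConfig` sketch might pass them as kubelet flags (field names assumed from the nodeadm schema; verify against the current spec before use):

```yaml
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster            # hypothetical cluster name
    region: us-west-2           # hypothetical region
  kubelet:
    flags:
      - --node-labels=topology.kubernetes.io/zone=onprem
```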

peerASN: PEER_ASN
peerAddress: ONPREM_ROUTER_IP
peerASN: [.replaceable]`PEER_ASN`
peerAddress: [.replaceable]`ONPREM_ROUTER_IP`
Contributor

Can we provide guidance on how to find `localASN`, `peerASN`, and `peerAddress`, or at least what those values represent? I've always been the most uncertain at this step.
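For context, a hedged reading of those values: `localASN` is the BGP autonomous system number the cluster nodes present (often taken from the private range 64512-65534), `peerASN` is the ASN of the on-premises router you peer with, and `peerAddress` is that router's IP address. A sketch using Cilium's `CiliumBGPPeeringPolicy` resource (all values are placeholders):

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: onprem-bgp
spec:
  nodeSelector:
    matchLabels:
      eks.amazonaws.com/compute-type: hybrid
  virtualRouters:
    - localASN: 64512               # ASN your nodes advertise as (placeholder)
      exportPodCIDR: true           # advertise each node's pod CIDR
      neighbors:
        - peerAddress: 10.0.0.1/32  # on-prem router IP (placeholder)
          peerASN: 65000            # on-prem router's ASN (placeholder)
```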

== Calico considerations

- It is recommended to run Calico in overlay / tunnel mode with VXLAN as the link:https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip[encapsulation method]. This mode has the fewest requirements on the underlying physical network. For more information on the different Calico networking modes, see https://docs.tigera.io/calico/latest/networking/determine-best-networking[Determining the best networking option] in the Calico documentation.
- It is recommended to run Calico with `natOutgoing` set to `true`. With `natOutgoing` set to `true` the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Calico with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable `natOutgoing`, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
Contributor

Suggested change
- It is recommended to run Calico with `natOutgoing` set to `true`. With `natOutgoing` set to `true` the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Calico with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable `natOutgoing`, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
- It is recommended to run Calico with `natOutgoing` set to `true`. With `natOutgoing` set to `true` the source IP address of all pod traffic leaving the cluster is translated to the IP address of the node. This makes it possible to run Calico with Amazon EKS Hybrid Nodes, whether or not remote pod networks are configured on the cluster. If you disable `natOutgoing`, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.

- It is recommended to run Calico with `natOutgoing` set to `true`. With `natOutgoing` set to `true` the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Calico with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable `natOutgoing`, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
- If you are running webhooks on your hybrid nodes, your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks. If your pod CIDRs are not routable on your on-premises network, then it is recommended to run webhooks on cloud nodes in the same cluster. See <<hybrid-nodes-networking#mixed-mode-clusters, Mixed mode clusters>> for more information.
- - A common way to make your pod CIDR routable on your on-premises network is to advertise pod addresses with BGP. To use BGP with Calico, you must set `installation.calicoNetwork.bgp: Enabled` in your Helm configuration. For more information on Calico's BGP support, see link:https://docs.tigera.io/calico/latest/networking/configuring/bgp[Configure BGP peering] in the Calico documentation.
- The default IP Address Management (IPAM) in Calico is called link:https://docs.tigera.io/calico/latest/networking/ipam/get-started-ip-addresses#calico-ipam[Calico IPAM], where the `calico-ipam` plugin allocates IP addresses for each node based on user-configured pod CIDRs. The pod CIDRs are configured with the `installation.calicoNetwork.ipPools.cidr` Helm value. Calico allocates segments from the `ipPools.cidr` to each node. The size of the per node segments is configured with the `ipPools.blockSize` Helm value. The `ipPools.cidr` should match the remote pod network CIDRs you configured for your Amazon EKS cluster. For more information on IPAM with Calico, see link:https://docs.tigera.io/calico/latest/networking/ipam/get-started-ip-addresses[Get started with IP address management] in the Calico documentation.
Contributor

Suggested change
- The default IP Address Management (IPAM) in Calico is called link:https://docs.tigera.io/calico/latest/networking/ipam/get-started-ip-addresses#calico-ipam[Calico IPAM], where the `calico-ipam` plugin allocates IP addresses for each node based on user-configured pod CIDRs. The pod CIDRs are configured with the `installation.calicoNetwork.ipPools.cidr` Helm value. Calico allocates segments from the `ipPools.cidr` to each node. The size of the per node segments is configured with the `ipPools.blockSize` Helm value. The `ipPools.cidr` should match the remote pod network CIDRs you configured for your Amazon EKS cluster. For more information on IPAM with Calico, see link:https://docs.tigera.io/calico/latest/networking/ipam/get-started-ip-addresses[Get started with IP address management] in the Calico documentation.
- The default IP Address Management (IPAM) in Calico is called link:https://docs.tigera.io/calico/latest/networking/ipam/get-started-ip-addresses#calico-ipam[Calico IPAM], where the `calico-ipam` plugin allocates IP addresses for each node based on user-configured pod CIDRs. The pod CIDRs are configured with the `installation.calicoNetwork.ipPools.cidr` Helm value, which should match the remote pod network CIDRs you configured for your Amazon EKS cluster. Calico allocates segments from the `ipPools.cidr` to each node. The size of the per node segments is configured with the `ipPools.blockSize` Helm value. For more information on IPAM with Calico, see link:https://docs.tigera.io/calico/latest/networking/ipam/get-started-ip-addresses[Get started with IP address management] in the Calico documentation.

.. Replace `POD_CIDR` with the CIDR ranges for your pods. If you configured your Amazon EKS cluster with remote pod networks, the `POD_CIDR` that you specify for Calico should be the same as the remote pod networks. For example, `10.100.0.0/24`.
.. Replace `CIDR_SIZE` with the size of the CIDR segment you want to allocate to each node. For example, `25` for a /25 segment size. For more information on CIDR `blockSize` and changing the `blockSize`, see https://docs.tigera.io/calico/latest/networking/ipam/change-block-size[Change IP pool block size] in the Calico documentation.
.. In the example below, `natOutgoing` is enabled and `bgp` is disabled. In this configuration, Calico can run on Amazon EKS clusters whether or not they have remote pod networks configured. If you disable `natOutgoing`, you must configure your cluster with your remote pod networks, and your on-premises network must be able to route traffic destined for your pod CIDRs. A common way to advertise pod addresses with your on-premises network is by using BGP; to use BGP with Calico, you must enable `bgp`. The example below configures all of the Calico components to run only on the hybrid nodes, since they have the `eks.amazonaws.com/compute-type: hybrid` label. If you are running webhooks on your hybrid nodes, you must configure your cluster with your remote pod networks and advertise your pod addresses with your on-premises network. The example below configures `controlPlaneReplicas: 1`; increase the value if you have multiple hybrid nodes and want to run the Calico control plane components in a highly available fashion.
. Create a YAML file called `calico-values.yaml`. The following example configures all Calico components to run on hybrid nodes only by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.
Contributor

Suggested change
. Create a YAML file called `calico-values.yaml`. The following example configures all Calico components to run on hybrid nodes only by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.
. Create a YAML file called `calico-values.yaml`. The following example configures all Calico components to run only on hybrid nodes by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.
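For reference, a minimal `calico-values.yaml` sketch matching the guidance above (CIDR and block size are illustrative; key names assumed from the tigera-operator Helm chart, with node affinity omitted for brevity):

```yaml
installation:
  calicoNetwork:
    bgp: Disabled                 # enable if advertising pod CIDRs over BGP
    ipPools:
      - cidr: 10.100.0.0/24       # should match your remote pod network CIDRs
        blockSize: 25             # per-node segment size
        encapsulation: VXLAN
        natOutgoing: Enabled
  controlPlaneReplicas: 1         # increase for multiple hybrid nodes
```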

@@ -63,23 +63,32 @@ Cilium version `1.16.x` is supported and recommended for EKS Hybrid Nodes for ev
|Yes
|===

== Cilium considerations

- By default, Cilium is configured to run in overlay / tunnel mode with VXLAN as the link:https://docs.cilium.io/en/stable/network/concepts/routing/#encapsulation[encapsulation method]. This mode has the fewest requirements on the underlying physical network.
Contributor

Suggested change
- By default, Cilium is configured to run in overlay / tunnel mode with VXLAN as the link:https://docs.cilium.io/en/stable/network/concepts/routing/#encapsulation[encapsulation method]. This mode has the fewest requirements on the underlying physical network.
- By default, Cilium is configured to run in overlay / tunnel mode with VXLAN as the link:https://docs.cilium.io/en/stable/network/concepts/routing/#encapsulation[encapsulation method]. This mode is the most flexible to the configuration of the underlying physical network.

== Cilium considerations

- By default, Cilium is configured to run in overlay / tunnel mode with VXLAN as the link:https://docs.cilium.io/en/stable/network/concepts/routing/#encapsulation[encapsulation method]. This mode has the fewest requirements on the underlying physical network.
- By default, Cilium link:https://docs.cilium.io/en/stable/network/concepts/masquerading/[masquerades] the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Cilium with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable masquerading, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
Contributor

Suggested change
- By default, Cilium link:https://docs.cilium.io/en/stable/network/concepts/masquerading/[masquerades] the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Cilium with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable masquerading, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
- By default, Cilium link:https://docs.cilium.io/en/stable/network/concepts/masquerading/[masquerades] the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Cilium with Amazon EKS Hybrid Nodes whether or not remote pod networks are configured on the cluster. If you disable masquerading, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.

- By default, Cilium link:https://docs.cilium.io/en/stable/network/concepts/masquerading/[masquerades] the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible to run Cilium with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable masquerading, then your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks.
- If you are running webhooks on your hybrid nodes, your pod CIDRs must be routable on your on-premises network and you must configure your Amazon EKS cluster with your remote pod networks. If your pod CIDRs are not routable on your on-premises network, then it is recommended to run webhooks on cloud nodes in the same cluster. See <<hybrid-nodes-networking#mixed-mode-clusters, Mixed mode clusters>> for more information.
- A common way to make your pod CIDR routable on your on-premises network is to advertise pod addresses with BGP. To use BGP with Cilium, you must set `bgpControlPlane.enabled: true` in your Helm configuration. For more information on Cilium's BGP support, see https://docs.cilium.io/en/stable/network/bgp-control-plane/bgp-control-plane/[Cilium BGP Control Plane] in the Cilium documentation.
- The default IP Address Management (IPAM) in Cilium is called link:https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/[Cluster Scope], where the Cilium operator allocates IP addresses for each node based on user-configured pod CIDRs. The pod CIDRs are configured with the `clusterPoolIPv4PodCIDRList` Helm value. Cilium allocates segments from the `clusterPoolIPv4PodCIDRList` to each node. The size of the per node segments is configured with the `clusterPoolIPv4MaskSize` Helm value. The `clusterPoolIPv4PodCIDRList` should match the remote pod network CIDRs you configured for your Amazon EKS cluster. For more information on the `clusterPoolIPv4PodCIDRList` and `clusterPoolIPv4MaskSize`, see https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/#expanding-the-cluster-pool[Expanding the cluster pool] in the Cilium documentation.
Contributor

Suggested change
- The default IP Address Management (IPAM) in Cilium is called link:https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/[Cluster Scope], where the Cilium operator allocates IP addresses for each node based on user-configured pod CIDRs. The pod CIDRs are configured with the `clusterPoolIPv4PodCIDRList` Helm value. Cilium allocates segments from the `clusterPoolIPv4PodCIDRList` to each node. The size of the per node segments is configured with the `clusterPoolIPv4MaskSize` Helm value. The `clusterPoolIPv4PodCIDRList` should match the remote pod network CIDRs you configured for your Amazon EKS cluster. For more information on the `clusterPoolIPv4PodCIDRList` and `clusterPoolIPv4MaskSize`, see https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/#expanding-the-cluster-pool[Expanding the cluster pool] in the Cilium documentation.
- The default IP Address Management (IPAM) in Cilium is called link:https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/[Cluster Scope], where the Cilium operator allocates IP addresses for each node based on user-configured pod CIDRs. The pod CIDRs are configured with the `clusterPoolIPv4PodCIDRList` Helm value, which should match the remote pod network CIDRs you configured for your Amazon EKS cluster. Cilium allocates segments from the `clusterPoolIPv4PodCIDRList` to each node. The size of the per node segments is configured with the `clusterPoolIPv4MaskSize` Helm value. For more information on the `clusterPoolIPv4PodCIDRList` and `clusterPoolIPv4MaskSize`, see https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/#expanding-the-cluster-pool[Expanding the cluster pool] in the Cilium documentation.

By default, Cilium masquerades the source IP address of all pod traffic leaving the cluster to the IP address of the node. This makes it possible for Cilium to run with Amazon EKS clusters that have remote pod networks configured and with clusters that don't have remote pod networks configured. If you disable masquerading for your Cilium deployment, then you must configure your Amazon EKS cluster with your remote pod networks and you must advertise your pod addresses with your on-premises network. If you are running webhooks on your hybrid nodes, you must configure your cluster with your remote pod networks and you must advertise your pod addresses with your on-premises network.
+
A common way to advertise pod addresses with your on-premises network is by using BGP. To use BGP with Cilium, you must set `bgpControlPlane.enabled: true`. For more information on Cilium's BGP support, see https://docs.cilium.io/en/stable/network/bgp-control-plane/bgp-control-plane/[Cilium BGP Control Plane] in the Cilium documentation.
. Create a YAML file called `cilium-values.yaml`. The following example configures Cilium to run on hybrid nodes only by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.
Contributor

Suggested change
. Create a YAML file called `cilium-values.yaml`. The following example configures Cilium to run on hybrid nodes only by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.
. Create a YAML file called `cilium-values.yaml`. The following example configures Cilium to run only on hybrid nodes by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label.

@@ -338,6 +350,14 @@ The interfaces and routes configured by Cilium are not removed by default when t
kubectl get crds -oname | grep "cilium" | xargs kubectl delete
----

== Calico considerations

- It is recommended to run Calico in overlay / tunnel mode with VXLAN as the link:https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip[encapsulation method]. This mode has the fewest requirements on the underlying physical network. For more information on the different Calico networking modes, see https://docs.tigera.io/calico/latest/networking/determine-best-networking[Determining the best networking option] in the Calico documentation.
Contributor

Suggested change
- It is recommended to run Calico in overlay / tunnel mode with VXLAN as the link:https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip[encapsulation method]. This mode has the fewest requirements on the underlying physical network. For more information on the different Calico networking modes, see https://docs.tigera.io/calico/latest/networking/determine-best-networking[Determining the best networking option] in the Calico documentation.
- It is recommended to run Calico in overlay / tunnel mode with VXLAN as the link:https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip[encapsulation method]. This mode is the most flexible to the configuration of the underlying physical network. For more information on the different Calico networking modes, see https://docs.tigera.io/calico/latest/networking/determine-best-networking[Determining the best networking option] in the Calico documentation.
