This guide shows how to install a 5-node, highly available kubeadm cluster on AWS EC2 instances. If using the KodeKloud AWS Playground environment, please ensure you have selected the us-east-1 (N. Virginia) region from the region selector at the top right of the AWS console. To maintain compatibility with the playground permissions, we will use the following EC2 instance configuration.
- Instance type: `t3.medium`
- Operating System: Ubuntu 22.04 (at time of writing)
- Storage: `gp2`, 8GB
Note that this is a learning exercise in simply getting a cluster running! It will not be suitable for serving workloads to the internet, nor will it be properly secured - otherwise this guide would be three times longer! It should not be used as a basis for building a production cluster.
We will provision the following infrastructure. It will be created by Terraform, so as not to spend too much of the lab time just getting it provisioned, and to allow you to focus on the cluster installation.
As can be seen in this diagram, we will create five EC2 instances to form the cluster - 3 control planes and 2 workers - plus one load balancer to provide access to the API server endpoints, and a further instance, `student-node`, from which to perform the configuration. We build the infrastructure using Terraform from AWS CloudShell (so you don't have to install Terraform on your workstation), then log into `student-node`, which can access the cluster nodes. This relationship between `student-node` and the cluster nodes is similar to CKA Ultimate Mocks and how the real exam works - you start on a separate node (in this case `student-node`), then use SSH to connect to cluster nodes. Note that SSH connections are only possible in the direction of the arrows. It is not possible to SSH from e.g. `controlplane01` directly to `node01`; you must `exit` to `student-node` first. This is also how it is in the exam. `student-node` assumes the role of a bastion host.
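For example, once the infrastructure is up, a session hops around like this (illustrative only; the host names resolve via the `/etc/hosts` entries that Terraform creates):

```bash
# On student-node: hop to a control plane node
ssh controlplane01
# ...do some work there, then come back
exit
# Back on student-node: now hop to a worker
ssh node01
```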
We will also set up direct connectivity from your workstation to the node ports of the workers so that you can browse any NodePort services you create (see security below).
Some basic security will be configured:
- Only the `student-node` will be able to access the cluster's API Server, and this is where you will run `kubectl` commands from when the cluster is running.
- Only the `student-node` can SSH to the cluster nodes.
- Ports required by Kubernetes itself (inc. etcd) and the Calico CNI will be configured in security groups on the cluster nodes.
Security issues that would make this unsuitable for a genuine production cluster:
- The kube nodes should be on private subnets (no direct access from the Internet) and placed behind a NAT gateway to allow them to download packages, or with a more extreme security posture, completely airgapped.
- Access to API server and etcd would be more tightly controlled.
- Use of default VPC is not recommended.
- The node ports will be open to the world - i.e. anyone can connect to them.
- A cloud load balancer coupled with an ingress controller would be provisioned to provide ingress to the cluster. It is definitely not recommended to expose the worker nodes' node ports to the Internet as we are doing here!!!
Other things that will be configured by the Terraform code:

- Host names set on the nodes: `loadbalancer`, `controlplane01`, `controlplane02`, `controlplane03`, `node01`, `node02`.
- Content of `/etc/hosts` set up on all nodes for easy use of the `ssh` command from `student-node` (see the example below).
- Generation and distribution of a key pair for logging into instances via SSH.
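For illustration, once the instances are built, `/etc/hosts` on each node will contain entries of this general form. The private IP addresses shown here are placeholders; view the real file on any node once you are logged in:

```bash
cat /etc/hosts
# 127.0.0.1     localhost
# 172.31.x.x    loadbalancer
# 172.31.x.x    controlplane01
# 172.31.x.x    controlplane02
# 172.31.x.x    controlplane03
# 172.31.x.x    node01
# 172.31.x.x    node02
```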
Let's go ahead and get the infrastructure built!
Click here to start a playground, and click START LAB
to request a new AWS Cloud Playground instance. After a few seconds, you will receive a URL and your credentials to access AWS Cloud console.
Note that you must have KodeKloud Pro subscription to run an AWS playground. If you have your own AWS account, this should still work, however you will bear the cost for any resources created until you delete them.
We will run this entire lab in AWS CloudShell which is a Linux terminal you run inside the AWS console and has most of what we need preconfigured, such as git and the AWS credentials needed by Terraform. Click here to open CloudShell.
From the CloudShell command prompt...
```bash
curl -O https://releases.hashicorp.com/terraform/1.6.2/terraform_1.6.2_linux_amd64.zip
unzip terraform_1.6.2_linux_amd64.zip
mkdir -p ~/bin
mv terraform ~/bin/
terraform version

git clone https://github.com/kodekloudhub/certified-kubernetes-administrator-course.git
```
Now change into the `aws-ha/terraform` directory

```bash
cd certified-kubernetes-administrator-course/kubeadm-clusters/aws-ha/terraform
```
- Run the terraform

```bash
terraform init
terraform plan
terraform apply
```
This should take about half a minute. If this all runs correctly, you will see something like the following at the end of all the output. IP addresses will be different for you.

```
Apply complete! Resources: 43 added, 0 changed, 0 destroyed.

Outputs:

address_node01 = "54.224.201.244"
address_node02 = "44.213.109.108"
address_student_node = "3.92.232.115"
connect_student_node = "ssh ubuntu@3.92.232.115"
```
Copy all these outputs to a notepad for later use.
- Wait for all instances to be ready (Instance state: `running`, Status check: `2/2 checks passed`). This will take 2-3 minutes. See the EC2 console.

- Log into `student-node`
Copy the `ssh` command from the terraform output `connect_student_node` and use it to connect. Note that the IP address will be different for you.
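For example, using the sample Terraform output shown earlier, the command would look like this:

```bash
ssh ubuntu@3.92.232.115
```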
We will install kubectl here so that we can run commands against the cluster when it is built.
- Install the latest version of kubectl and place it in the user programs directory

```bash
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin
```
- Check it

```bash
kubectl version
```

It should, amongst other things, tell you

```
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```

which is fine, since we haven't installed Kubernetes yet.
Now we will configure the load balancer that serves as the endpoint for connecting to the API server. It will round-robin API server requests between the control plane nodes. For this we will use HAProxy in TCP load balancing mode, in which it simply forwards all traffic to its backends (the control planes) without modifying it, e.g. it does not do SSL termination.
First, be logged into `student-node` as directed above.
- Log into the load balancer

```bash
ssh loadbalancer
```
- Become root (saves typing `sudo` before every command)

```bash
sudo -i
```
- Update the apt package index and install packages needed for HAProxy:

```bash
apt-get update
apt-get install -y haproxy
```
- Get the IP addresses of the loadbalancer and the 3 control planes and copy them to your notepad

```bash
dig +short loadbalancer
dig +short controlplane01
dig +short controlplane02
dig +short controlplane03
```
- Create the HAProxy configuration file

First we'll delete the default configuration, then add our own

```bash
rm /etc/haproxy/haproxy.cfg
vi /etc/haproxy/haproxy.cfg
```

Now put the following content into the file. Replace `L.L.L.L` with the IP address of `loadbalancer`, and `X.X.X.X` with the IP of each control plane node.

```
frontend kubernetes
    bind L.L.L.L:6443
    option tcplog
    mode tcp
    default_backend kubernetes-control-nodes

backend kubernetes-control-nodes
    mode tcp
    balance roundrobin
    option tcp-check
    server controlplane01 X.X.X.X:6443 check fall 3 rise 2
    server controlplane02 X.X.X.X:6443 check fall 3 rise 2
    server controlplane03 X.X.X.X:6443 check fall 3 rise 2
```
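Optionally, you can ask HAProxy to validate the configuration file before restarting the service (the `-c` flag performs a configuration check only):

```bash
haproxy -c -f /etc/haproxy/haproxy.cfg
```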
- Restart and check haproxy

```bash
systemctl restart haproxy
systemctl status haproxy
```
It should be warning us that no backend is available - which is true because we haven't installed Kubernetes yet!
- Exit from `sudo` and then back to `student-node`

```bash
exit
exit
```
First, be logged into `student-node` as directed above.

Repeat the following steps on `controlplane01`, `controlplane02`, `controlplane03`, `node01` and `node02` by SSH-ing from `student-node` to each cluster node in turn, e.g.
```
ubuntu@student-node:~$ ssh controlplane01
Welcome to Ubuntu 22.04.2 LTS (GNU/Linux 5.19.0-1028-aws x86_64)
Last login: Tue Jul 25 15:27:07 2023 from 172.31.93.38
ubuntu@controlplane01:~$
```
Note that there's no step to disable swap, since EC2 instances have swap disabled by default.
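If you want to verify this yourself on any of the nodes, the following optional check should show no swap devices and zero swap space:

```bash
swapon --show   # no output means no swap devices are configured
free -h         # the Swap line should show 0B total
```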
- Become root (saves typing `sudo` before every command)

```bash
sudo -i
```
- Update the apt package index and install packages needed to use the Kubernetes apt repository:

```bash
apt-get update
apt-get install -y apt-transport-https ca-certificates curl
```
- Set up the required kernel modules and make them persistent

```bash
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
```
- Set the required kernel parameters and make them persistent

```bash
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
```
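As an optional sanity check, you can confirm that the modules are loaded and the parameters have taken effect:

```bash
lsmod | grep -E 'overlay|br_netfilter'
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
# Both sysctl values should report 1
```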
- Install the container runtime

```bash
apt-get install -y containerd
```
- Configure the container runtime to use systemd cgroups. This is the part many students miss; if it is not done, the result is a control plane that comes up and then all the pods start crashlooping. `kubectl` will also fail with an error like `The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?`
- Create default configuration

```bash
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
```
- Edit the configuration to set up cgroups

```bash
vi /etc/containerd/config.toml
```

Scroll down till you find a line with `SystemdCgroup = false`. Edit it to be `SystemdCgroup = true`, then save and exit vi.

- Restart containerd

```bash
systemctl restart containerd
```
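As an alternative to editing the file by hand, the same change can be made non-interactively with `sed`; this is simply a convenience that assumes the default configuration generated above:

```bash
# Flip SystemdCgroup from false to true, then restart containerd
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd
```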
- Get the latest version of Kubernetes and store it in a shell variable

```bash
KUBE_LATEST=$(curl -L -s https://dl.k8s.io/release/stable.txt | awk 'BEGIN { FS="." } { printf "%s.%s", $1, $2 }')
```
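If you are curious, echo the variable; it should contain just the major and minor version of the latest stable release (for example `v1.28`, though this will change over time):

```bash
echo $KUBE_LATEST
```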
- Download the Kubernetes public signing key

```bash
mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/${KUBE_LATEST}/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
```
- Add the Kubernetes apt repository

```bash
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/${KUBE_LATEST}/deb/ /" > /etc/apt/sources.list.d/kubernetes.list
```
- Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version

```bash
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
```
- Configure `crictl` in case we need it to examine running containers

```bash
crictl config \
    --set runtime-endpoint=unix:///run/containerd/containerd.sock \
    --set image-endpoint=unix:///run/containerd/containerd.sock
```
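Once the cluster is running, `crictl` can be used on any node to look at what containerd is doing, for example:

```bash
crictl ps     # running containers on this node
crictl pods   # pod sandboxes on this node
```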
- Exit the root shell

```bash
exit
```

- Return to `student-node`

```bash
exit
```
Repeat the above till you have done `controlplane01`, `controlplane02`, `controlplane03`, `node01` and `node02`.
To create a highly available control plane, we install kubeadm on the first control plane node in almost the same way as for a single control plane cluster, then we join the other control plane nodes in a similar manner to joining worker nodes.
- SSH to `controlplane01`

```bash
ssh controlplane01
```
- Become root

```bash
sudo -i
```
- Boot the first control plane using the IP address of the load balancer as the control plane endpoint

```bash
dig +short loadbalancer
```

Replace `L.L.L.L` with the IP address you got above

```bash
kubeadm init --control-plane-endpoint L.L.L.L:6443 --upload-certs --pod-network-cidr=192.168.0.0/16
```
Copy both join commands that are printed to a notepad for use on other control nodes and the worker nodes.
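For reference, the two join commands have roughly the form sketched below; the token, discovery hash and certificate key shown are placeholders and yours will differ:

```bash
# Control plane join (includes --control-plane and --certificate-key)
kubeadm join L.L.L.L:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>

# Worker join
kubeadm join L.L.L.L:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```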
- Install the network plugin (Calico). Weave does not work too well with HA clusters.

```bash
kubectl --kubeconfig /etc/kubernetes/admin.conf create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.3/manifests/tigera-operator.yaml
kubectl --kubeconfig /etc/kubernetes/admin.conf create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.3/manifests/custom-resources.yaml
```
- Check we are up and running

```bash
kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -n kube-system
```
- Exit the root shell

```bash
exit
```
- Prepare the kubeconfig file for copying to `student-node`

```bash
{
  sudo cp /etc/kubernetes/admin.conf .
  sudo chmod 666 admin.conf
}
```
- Exit to `student-node`

```bash
exit
```
Be on `student-node`.

For each of `controlplane02` and `controlplane03`:
1. SSH to `controlplane02`
2. Become root

```bash
sudo -i
```
3. Paste the join command for control nodes that was output by `kubeadm init` on `controlplane01`
4. Exit back to `student-node`

```bash
exit
exit
```
5. Repeat steps 2, 3 and 4 on `controlplane03` (SSH to it first).
6. Copy the kubeconfig down from `controlplane01` to `student-node` and set proper permissions

```bash
mkdir -p ~/.kube
scp controlplane01:~/admin.conf ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config
```
7. Test it

```bash
kubectl get pods -n kube-system
```
You should now see that there are 3 pods for each of the main control plane components. Also, if you look at the kubeconfig file in `~/.kube/config`, you'll see that the IP address for the `server:` entry is that of the load balancer.
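An optional quick check is to grep for the `server:` line and compare it with the loadbalancer IP you noted earlier:

```bash
grep server: ~/.kube/config
```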
1. SSH to `node01`
2. Become root

```bash
sudo -i
```
3. Paste the join command for worker nodes that was output by `kubeadm init` on `controlplane01`
4. Return to `student-node`

```bash
exit
exit
```
5. Repeat steps 2, 3 and 4 on `node02` (SSH to it first).
6. Now you should be back on `student-node`. Check all nodes are up

```bash
kubectl get nodes -o wide
```
There should now be 3 control nodes and 2 workers.
Run the following on `student-node`.
- Ensure all the calico pods are running. They can take a while to initialise

```bash
watch kubectl get pods -n calico-system
```

Press `CTRL-C` to exit watch when the pods are stable.

- Deploy and expose an nginx pod

```bash
kubectl run nginx --image nginx --expose --port 80
```
- Convert the service to NodePort

```bash
kubectl edit service nginx
```

Edit the `spec:` part of the service until it looks like this. Don't change anything above `spec:`

```yaml
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30080
  selector:
    run: nginx
  sessionAffinity: None
  type: NodePort
```
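As an aside, if you prefer not to use an interactive editor, a `kubectl patch` with a strategic merge patch achieves the same result; this is just an alternative to the edit above:

```bash
kubectl patch service nginx -p '{"spec": {"type": "NodePort", "ports": [{"port": 80, "nodePort": 30080}]}}'
```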
- Get the public IP of one of the worker nodes to use in the following steps. These were output by Terraform as `address_node01` and `address_node02`. You can also find this by looking at the instances on the EC2 page of the AWS console.

- Test with curl on `student-node`. Replace the IP address with the one you chose from the above step

```bash
curl http://44.201.135.110:30080
```
- Test from your own browser. Replace the IP address with the one you chose from the above step

```
http://44.201.135.110:30080
```
Those of you who are also studying our Terraform courses should look at the terraform files and try to understand what is happening here.
One point of note is that for the `kubenode` instances, we create network interfaces for them as separate resources, then attach these ENIs to the instances when they are built. The reason for this is so that the IP addresses of the instances can be known in advance, such that `/etc/hosts` may be created by the `user_data` script during instance creation.