Create other ASG for calico #828

Merged: 1 commit, Feb 20, 2025

8 changes: 5 additions & 3 deletions aws/cluster/README.md
@@ -25,7 +25,7 @@

| Name | Source | Version |
|------|--------|---------|
-| <a name="module_managed_node_group"></a> [managed\_node\_group](#module\_managed\_node\_group) | github.com/mattermost/mattermost-cloud-monitoring.git//aws/eks-managed-node-groups | v1.8.19 |
+| <a name="module_managed_node_group"></a> [managed\_node\_group](#module\_managed\_node\_group) | github.com/mattermost/mattermost-cloud-monitoring.git//aws/eks-managed-node-groups | v1.8.42 |

## Resources

@@ -75,9 +75,8 @@
| [kubernetes_storage_class_v1.gp3](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/storage_class_v1) | resource |
| [local_file.kubeconfig](https://registry.terraform.io/providers/hashicorp/local/latest/docs/resources/file) | resource |
| [null_resource.calico_operator_configuration](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
-| [null_resource.delete_aws_node](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
| [null_resource.install_calico_operator](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
-| [null_resource.refresh_eks_nodes](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
+| [null_resource.patch_aws_node](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_eks_cluster.cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source |
| [aws_eks_cluster_auth.cluster_auth](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |
@@ -100,6 +99,9 @@
| <a name="input_availability_zones"></a> [availability\_zones](#input\_availability\_zones) | List of availability zones to place the instances | `list(string)` | n/a | yes |
| <a name="input_aws_read_only_sso_role_name"></a> [aws\_read\_only\_sso\_role\_name](#input\_aws\_read\_only\_sso\_role\_name) | Name of the read only SSO iam role | `string` | `""` | no |
| <a name="input_aws_reserved_sso_id"></a> [aws\_reserved\_sso\_id](#input\_aws\_reserved\_sso\_id) | n/a | `string` | n/a | yes |
+| <a name="input_calico_desired_size"></a> [calico\_desired\_size](#input\_calico\_desired\_size) | Desired size for the Calico node group | `number` | `3` | no |
+| <a name="input_calico_max_size"></a> [calico\_max\_size](#input\_calico\_max\_size) | Maximum size for the Calico node group | `number` | `5` | no |
+| <a name="input_calico_min_size"></a> [calico\_min\_size](#input\_calico\_min\_size) | Minimum size for the Calico node group | `number` | `2` | no |
| <a name="input_calico_operator_version"></a> [calico\_operator\_version](#input\_calico\_operator\_version) | n/a | `string` | `"v3.29.2"` | no |
| <a name="input_cidr_blocks"></a> [cidr\_blocks](#input\_cidr\_blocks) | n/a | `list(string)` | n/a | yes |
| <a name="input_cluster_short_name"></a> [cluster\_short\_name](#input\_cluster\_short\_name) | n/a | `string` | n/a | yes |
84 changes: 7 additions & 77 deletions aws/cluster/calico.tf
@@ -14,21 +14,7 @@ resource "aws_secretsmanager_secret_version" "kubeconfig_secret_version" {
depends_on = [local.kubeconfig]
}

-# 1. Handle aws-node DaemonSet deletion gracefully
-resource "null_resource" "delete_aws_node" {
-count = var.is_calico_enabled ? 1 : 0
-
-provisioner "local-exec" {
-command = <<EOF
-KUBECONFIG=${path.root}/kubeconfig-${aws_eks_cluster.cluster.name} \
-kubectl delete daemonset aws-node -n kube-system --ignore-not-found=true || true
-EOF
-}
-
-depends_on = [resource.local_file.kubeconfig]
-}
-
-# 2. Install Calico operator only if it is not already installed
+# Install Calico operator only if it is not already installed
resource "null_resource" "install_calico_operator" {
count = var.is_calico_enabled ? 1 : 0

@@ -68,72 +54,16 @@ resource "null_resource" "calico_operator_configuration" {
depends_on = [null_resource.install_calico_operator]
}

-# 4. Refresh nodes **ONLY IF** Calico was newly installed (1-by-1)
-resource "null_resource" "refresh_eks_nodes" {
+resource "null_resource" "patch_aws_node" {
count = var.is_calico_enabled ? 1 : 0

provisioner "local-exec" {
command = <<EOF
-if [ -f ${path.root}/calico_config_applied ]; then
-echo "Rolling out new nodes for Calico changes (1-by-1 refresh)..."
-
-BASE_ASG_NAME="${var.node_group_name}-arm-nodes"
-
-# Fetch the actual ASG name (handles dynamic suffixes)
-ASG_NAME=$(aws autoscaling describe-auto-scaling-groups \
---query "AutoScalingGroups[*].AutoScalingGroupName" \
---output text | tr '\t' '\n' | grep "^$BASE_ASG_NAME-" | awk 'NR==1{print $1}')
-
-if [ -z "$ASG_NAME" ]; then
-echo "Error: Auto Scaling Group matching '$BASE_ASG_NAME-' not found."
-exit 1
-fi
-
-echo "Detected ASG: $ASG_NAME"
-
-# Fetch instance IDs
-INSTANCE_IDS=$(aws autoscaling describe-auto-scaling-groups \
---auto-scaling-group-names "$ASG_NAME" \
---query "AutoScalingGroups[0].Instances[*].InstanceId" \
---output text)
-
-if [ -z "$INSTANCE_IDS" ]; then
-echo "Error: No instances found in Auto Scaling Group $ASG_NAME."
-exit 1
-fi
-
-for INSTANCE in $INSTANCE_IDS; do
-if [ -n "$INSTANCE" ]; then
-echo "Terminating instance: $INSTANCE"
-aws autoscaling terminate-instance-in-auto-scaling-group --instance-id "$INSTANCE" --should-decrement-desired-capacity false
-
-echo "Waiting for the new node to be ready..."
-sleep 180 # Wait 3 minutes for the node to join the cluster
-
-# Ensure the new node is in a Ready state before proceeding
-while true; do
-NODE_READY=$(KUBECONFIG=${path.root}/kubeconfig-${aws_eks_cluster.cluster.name} \
-kubectl get nodes --no-headers | grep -c " Ready ")
-
-if [ "$NODE_READY" -gt 0 ]; then
-echo "New node is ready, proceeding to next."
-break
-fi
-
-echo "Waiting for the node to become Ready..."
-sleep 30
-done
-else
-echo "Skipping invalid instance ID."
-fi
-done
-
-echo "Node refresh complete."
-else
-echo "Skipping node refresh since Calico was already installed."
-fi
-EOF
+KUBECONFIG=${path.root}/kubeconfig-${aws_eks_cluster.cluster.name} kubectl patch daemonset aws-node -n kube-system --type='json' -p='[
+{ "op": "add", "path": "/spec/template/spec/nodeSelector", "value": { "calico": "false" } }
+]'
+EOF
}

-depends_on = [null_resource.calico_operator_configuration]
+depends_on = [aws_eks_cluster.cluster]
}
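The `patch_aws_node` resource pins the `aws-node` (VPC CNI) DaemonSet to nodes labelled `calico: "false"`, while the Calico `Installation` manifest in this PR uses `calico: "true"`. A `nodeSelector` is an exact match: a node is eligible only if it carries every listed key with the listed value. The bash sketch below models that rule to show how nodes are split between the two CNIs; `selector_matches` and the space-separated `key=value` label encoding are hypothetical, not a kubectl feature. Under this rule, a node with no `calico` label matches neither selector.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when every "key=value" pair in the
# selector appears among the node's labels, mirroring the exact-match
# semantics of a Kubernetes nodeSelector. An empty selector matches any node.
# Usage: selector_matches "<selector pairs>" "<node labels>"
# Both arguments are space-separated lists of key=value strings.
selector_matches() {
  local selector="$1" labels="$2" pair
  for pair in $selector; do
    # Each required pair must be present verbatim in the label list.
    case " $labels " in
      *" $pair "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Selectors used in this PR.
AWS_NODE_SELECTOR="calico=false"   # set on aws-node by patch_aws_node
CALICO_SELECTOR="calico=true"      # set in calico_installation.yaml

# Example nodes; "Name=example_arm_nodes" is a hypothetical node that has
# a Name label but no calico label.
for node in "calico=true" "calico=false" "Name=example_arm_nodes"; do
  aws=no
  calico=no
  selector_matches "$AWS_NODE_SELECTOR" "$node" && aws=yes
  selector_matches "$CALICO_SELECTOR" "$node" && calico=yes
  echo "labels [$node]: aws-node=$aws calico-node=$calico"
done
```

Running it reports `aws-node=yes` only for the `calico=false` node and `calico-node=yes` only for the `calico=true` node; the node carrying only a `Name` label gets neither.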
2 changes: 1 addition & 1 deletion aws/cluster/eks_addon.tf
@@ -1,5 +1,5 @@
resource "aws_eks_addon" "vpc_cni" {
-count = var.enable_vpc_cni_addon && !var.is_calico_enabled ? 1 : 0
+count = var.enable_vpc_cni_addon ? 1 : 0

cluster_name = aws_eks_cluster.cluster.name
addon_name = "vpc-cni"
6 changes: 6 additions & 0 deletions aws/cluster/manifests/calico_installation.yaml
@@ -8,3 +8,9 @@ spec:
type: Calico
calicoNetwork:
bgp: Disabled
+nodeSelector:
+calico: "true"
+tolerations:
+- key: "calico"
+operator: "Exists"
+effect: "NoSchedule"
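The Calico node group added in `graviton_node_groups.tf` taints its nodes with key `calico`, value `only`, and effect `NO_SCHEDULE` (shown on the node as `NoSchedule`), and this toleration lets the Calico components run there. Kubernetes matches a toleration to a taint when the keys and effects agree; with `operator: Exists` the taint's value is ignored, while the default `Equal` operator also requires equal values. An empty toleration effect matches any effect, and an empty key with `Exists` matches any key. The bash function below is a hypothetical model of those rules; `tolerates` and its seven positional arguments are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical model of Kubernetes toleration/taint matching.
# Usage: tolerates TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT TAINT_KEY TAINT_VALUE TAINT_EFFECT
# Returns 0 when the toleration tolerates the taint, 1 otherwise.
tolerates() {
  local tol_key="$1" tol_op="${2:-Equal}" tol_value="$3" tol_effect="$4"
  local taint_key="$5" taint_value="$6" taint_effect="$7"

  # An empty toleration effect matches every effect; otherwise effects must be equal.
  if [ -n "$tol_effect" ] && [ "$tol_effect" != "$taint_effect" ]; then
    return 1
  fi

  case "$tol_op" in
    Exists)
      # The value is ignored; an empty key tolerates every taint key.
      [ -z "$tol_key" ] || [ "$tol_key" = "$taint_key" ]
      ;;
    Equal)
      # Key and value must both match.
      [ "$tol_key" = "$taint_key" ] && [ "$tol_value" = "$taint_value" ]
      ;;
    *)
      return 1
      ;;
  esac
}

# Toleration from calico_installation.yaml against the calico node group taint.
if tolerates "calico" "Exists" "" "NoSchedule" "calico" "only" "NoSchedule"; then
  echo "calico toleration: tolerated"
fi

# A toleration for an unrelated key does not cover the calico taint.
if ! tolerates "dedicated" "Exists" "" "NoSchedule" "calico" "only" "NoSchedule"; then
  echo "other toleration: not tolerated"
fi
```

Both checks print their message: the `Exists` toleration for `calico` covers the `calico=only` taint, while a toleration for an unrelated key does not.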
17 changes: 17 additions & 0 deletions aws/cluster/variables.tf
@@ -256,3 +256,20 @@ variable "calico_operator_version" {
default = "v3.29.2"
}

+variable "calico_desired_size" {
+description = "Desired size for the Calico node group"
+type = number
+default = 3
+}
+
+variable "calico_min_size" {
+description = "Minimum size for the Calico node group"
+type = number
+default = 2
+}
+
+variable "calico_max_size" {
+description = "Maximum size for the Calico node group"
+type = number
+default = 5
+}
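These three inputs feed the `scaling_config` of the Calico node group. An EKS managed node group is backed by an Auto Scaling group, which requires `min_size <= desired_size <= max_size`; the defaults (2, 3, 5) satisfy that. A pre-flight check in bash, for example run in CI before `terraform apply`, could look like the sketch below; `check_calico_scaling` is a hypothetical helper and not part of this module.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the Calico node group sizes.
# Mirrors the Auto Scaling group rule: min_size <= desired_size <= max_size.
# Usage: check_calico_scaling MIN DESIRED MAX
check_calico_scaling() {
  local min="$1" desired="$2" max="$3"
  if [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]; then
    echo "ok: min=$min desired=$desired max=$max"
    return 0
  fi
  echo "invalid: need min <= desired <= max (got $min/$desired/$max)" >&2
  return 1
}

# Defaults declared in variables.tf in this PR.
check_calico_scaling 2 3 5
```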
6 changes: 5 additions & 1 deletion aws/cluster/worker_asg.tf
@@ -1,5 +1,5 @@
module "managed_node_group" {
-source = "github.com/mattermost/mattermost-cloud-monitoring.git//aws/eks-managed-node-groups?ref=v1.8.19"
+source = "github.com/mattermost/mattermost-cloud-monitoring.git//aws/eks-managed-node-groups?ref=v1.8.42"
vpc_security_group_ids = [aws_security_group.worker-sg.id]
volume_size = var.node_volume_size
volume_type = var.node_volume_type
@@ -39,4 +39,8 @@ module "managed_node_group" {
api_server_endpoint = aws_eks_cluster.cluster.endpoint
certificate_authority = aws_eks_cluster.cluster.certificate_authority[0].data
service_ipv4_cidr = local.service_cidr
+is_calico_enabled = var.is_calico_enabled
+calico_min_size = var.calico_min_size
+calico_desired_size = var.calico_desired_size
+calico_max_size = var.calico_max_size
}
5 changes: 5 additions & 0 deletions aws/eks-managed-node-groups/README.md
@@ -20,6 +20,7 @@ No modules.

| Name | Type |
|------|------|
+| [aws_eks_node_group.calico_arm_nodes](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group) | resource |
| [aws_eks_node_group.general_arm_nodes_eks_cluster_ng](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group) | resource |
| [aws_eks_node_group.general_nodes_eks_cluster_ng](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group) | resource |
| [aws_eks_node_group.spot_nodes_eks_cluster_ng](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group) | resource |
@@ -41,6 +42,9 @@ No modules.
| <a name="input_arm_max_size"></a> [arm\_max\_size](#input\_arm\_max\_size) | The maximum number of arm nodes in the node group | `string` | n/a | yes |
| <a name="input_arm_min_size"></a> [arm\_min\_size](#input\_arm\_min\_size) | The minimum number of arm nodes in the node group | `string` | n/a | yes |
| <a name="input_availability_zones"></a> [availability\_zones](#input\_availability\_zones) | List of availability zones to place the instances | `list(string)` | n/a | yes |
+| <a name="input_calico_desired_size"></a> [calico\_desired\_size](#input\_calico\_desired\_size) | Desired size for the Calico node group | `number` | `3` | no |
+| <a name="input_calico_max_size"></a> [calico\_max\_size](#input\_calico\_max\_size) | Maximum size for the Calico node group | `number` | `5` | no |
+| <a name="input_calico_min_size"></a> [calico\_min\_size](#input\_calico\_min\_size) | Minimum size for the Calico node group | `number` | `2` | no |
| <a name="input_certificate_authority"></a> [certificate\_authority](#input\_certificate\_authority) | The certificate authority data for the EKS cluster | `string` | n/a | yes |
| <a name="input_cluster_name"></a> [cluster\_name](#input\_cluster\_name) | The name of the cluster that the node group will be assigned to | `string` | n/a | yes |
| <a name="input_cluster_short_name"></a> [cluster\_short\_name](#input\_cluster\_short\_name) | A short name that identifies the cluster | `string` | n/a | yes |
@@ -50,6 +54,7 @@ No modules.
| <a name="input_enable_spot_nodes"></a> [enable\_spot\_nodes](#input\_enable\_spot\_nodes) | If true, spot nodes will be created | `bool` | `false` | no |
| <a name="input_image_id"></a> [image\_id](#input\_image\_id) | The AMI ID used for the nodes in the node group | `string` | n/a | yes |
| <a name="input_instance_type"></a> [instance\_type](#input\_instance\_type) | The instance type used for the nodes in the node group | `string` | n/a | yes |
+| <a name="input_is_calico_enabled"></a> [is\_calico\_enabled](#input\_is\_calico\_enabled) | n/a | `bool` | `false` | no |
| <a name="input_max_size"></a> [max\_size](#input\_max\_size) | The maximum number of nodes in the node group | `string` | n/a | yes |
| <a name="input_min_size"></a> [min\_size](#input\_min\_size) | The minimum number of nodes in the node group | `string` | n/a | yes |
| <a name="input_node_group_name"></a> [node\_group\_name](#input\_node\_group\_name) | A name that can be used to identify the node group | `string` | n/a | yes |
126 changes: 126 additions & 0 deletions aws/eks-managed-node-groups/graviton_node_groups.tf
@@ -90,3 +90,129 @@ resource "aws_eks_node_group" "general_arm_nodes_eks_cluster_ng" {
create_before_destroy = true
}
}
+
+resource "aws_launch_template" "cluster_nodes_eks_arm_launch_template" {
+name = "${var.cluster_short_name}_cluster_arm_launch_template"
+description = "${var.cluster_short_name} cluster arm nodes launch template"
+
+vpc_security_group_ids = var.vpc_security_group_ids
+
+block_device_mappings {
+device_name = "/dev/xvda"
+
+ebs {
+volume_size = var.volume_size
+volume_type = var.volume_type
+}
+}
+
+# Support for AL2 and AL2023 images dynamically
+image_id = var.use_al2023 ? var.al2023_arm_image_id : var.arm_image_id
+instance_type = var.arm_instance_type
+ebs_optimized = var.ebs_optimized
+
+user_data = var.use_al2023 ? base64encode(<<USERDATA
+#!/bin/bash
+echo "export AWS_REGION=${data.aws_region.current.name}" >> /etc/environment
+source /etc/environment
+cat <<EOF > /etc/eks/nodeadm-config.yaml
+apiVersion: node.eks.aws/v1alpha1
+kind: NodeConfig
+spec:
+cluster:
+name: ${var.cluster_name}
+apiServerEndpoint: |
+${var.api_server_endpoint}
+certificateAuthority: |
+${var.certificate_authority}
+cidr: ${var.service_ipv4_cidr}
+EOF
+
+/usr/local/bin/nodeadm init -c file:///etc/eks/nodeadm-config.yaml
+USERDATA
+) : base64encode(<<USERDATA
+#!/bin/bash
+/etc/eks/bootstrap.sh --apiserver-endpoint '${var.api_server_endpoint}' --b64-cluster-ca '${var.certificate_authority}' '${var.cluster_name}' --kubelet-extra-args "--kube-reserved cpu=250m,memory=1Gi,ephemeral-storage=1Gi --system-reserved cpu=250m,memory=0.2Gi,ephemeral-storage=1Gi --eviction-hard memory.available<0.2Gi,nodefs.available<10%"
+USERDATA
+)
+
+tag_specifications {
+resource_type = "instance"
+
+tags = {
+Name = "${var.cluster_short_name}-arm-cluster-nodes"
+KubernetesCluster = var.cluster_name
+VpcID = var.vpc_id
+}
+}
+}
+
+resource "aws_eks_node_group" "general_arm_nodes_eks_cluster_ng" {
+cluster_name = var.cluster_name
+node_group_name = "${var.node_group_name}-arm-nodes"
+
+node_role_arn = var.node_role_arn
+
+subnet_ids = var.subnet_ids
+
+tags = {
+"Name" : "${var.deployment_name}-worker",
+"kubernetes.io/cluster/${var.deployment_name}" : "owned",
+"k8s.io/cluster-autoscaler/enabled" : "on",
+"k8s.io/cluster-autoscaler/${var.deployment_name}" : "on"
+KubernetesCluster : var.cluster_name
+
+}
+
+labels = {
+Name = "${var.node_group_name}_arm_nodes"
+}
+
+scaling_config {
+desired_size = var.arm_desired_size
+max_size = var.arm_max_size
+min_size = var.arm_min_size
+}
+
+launch_template {
+name = aws_launch_template.cluster_nodes_eks_arm_launch_template.name
+version = aws_launch_template.cluster_nodes_eks_arm_launch_template.latest_version
+}
+
+lifecycle {
+create_before_destroy = true
+}
+}
+
+resource "aws_eks_node_group" "calico_arm_nodes" {
+count = var.is_calico_enabled ? 1 : 0
+cluster_name = var.cluster_name
+node_group_name = "${var.node_group_name}-calico-arm-nodes"
+node_role_arn = var.node_role_arn
+subnet_ids = var.subnet_ids
+
+scaling_config {
+desired_size = var.calico_desired_size
+min_size = var.calico_min_size
+max_size = var.calico_max_size
+}
+
+labels = {
+"calico" = "true"
+}
+
+taints {
+key = "calico"
+value = "only"
+effect = "NO_SCHEDULE"
+}
+
+launch_template {
+name = aws_launch_template.cluster_nodes_eks_arm_launch_template.name
+version = aws_launch_template.cluster_nodes_eks_arm_launch_template.latest_version
+}
+
+lifecycle {
+create_before_destroy = true
+}
+}
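The Calico node group declares its taint with the EKS API's effect spelling (`NO_SCHEDULE`), while the toleration in `calico_installation.yaml` uses the Kubernetes spelling (`NoSchedule`); EKS translates between the two when it applies the taint to the nodes. For reference, the AWS provider documents node group taints as repeated `taint` blocks with `key`, `value`, and `effect` arguments. The bash sketch below spells out the effect correspondence; `eks_effect_to_k8s` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an EKS node group taint effect to the value
# Kubernetes shows on the node (and expects in pod tolerations).
eks_effect_to_k8s() {
  case "$1" in
    NO_SCHEDULE)        echo "NoSchedule" ;;
    NO_EXECUTE)         echo "NoExecute" ;;
    PREFER_NO_SCHEDULE) echo "PreferNoSchedule" ;;
    *) echo "unknown EKS taint effect: $1" >&2; return 1 ;;
  esac
}

# Taint declared on aws_eks_node_group.calico_arm_nodes in this PR,
# printed in kubectl's key=value:Effect notation.
taint_key="calico"
taint_value="only"
taint_effect="NO_SCHEDULE"
echo "node taint: ${taint_key}=${taint_value}:$(eks_effect_to_k8s "$taint_effect")"
```

This prints `node taint: calico=only:NoSchedule`, which is the taint the `Exists` toleration on key `calico` is written to match.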
23 changes: 23 additions & 0 deletions aws/eks-managed-node-groups/variables.tf
@@ -180,3 +180,26 @@ variable "service_ipv4_cidr" {
description = "The service IPv4 CIDR range for the EKS cluster"
type = string
}
+
+variable "is_calico_enabled" {
+type = bool
+default = false
+}
+
+variable "calico_desired_size" {
+description = "Desired size for the Calico node group"
+type = number
+default = 3
+}
+
+variable "calico_min_size" {
+description = "Minimum size for the Calico node group"
+type = number
+default = 2
+}
+
+variable "calico_max_size" {
+description = "Maximum size for the Calico node group"
+type = number
+default = 5
+}