PRODENG-2850 Reworked drain node logic and upgrade the aws-simple tf chart #547

cranzy · 2025-01-24T05:13:16Z

Upgraded the aws-simple Terraform chart to the latest AWS provision module
Increased the timeout of the full smoke test to 50 min and removed caching (mainly for local testing)
Reworked the Docker draining sequence so the leader node is drained last

Signed-off-by: Dimitar <[email protected]>


james-nesbitt · 2025-01-24T06:58:11Z

examples/terraform/aws-simple/terraform.tfvars.template

+        "public" = true,
+        "role" = "worker",
+        "user_data" = "sudo ufw allow 7946/tcp ; sudo ufw allow 10250/tcp ", "platform" = "ubuntu_22.04"
+    }


Why are we reverting to only Ubuntu for the simple/small smoke test? This is a major tactical change that needs to be justified.
Perhaps you should separate a change like this from your test fixes.

This is just a Terraform template example. It does not affect the simple or full smoke tests in any way, shape, or form. As you can see, I haven't really changed the platforms: https://github.com/Mirantis/mcc/blob/main/examples/terraform/aws-simple/terraform.tfvars.template
The variables/config for each smoke test are passed from within Go (Terratest in particular).
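To illustrate that point, here is a minimal Go sketch of how a smoke test might build its Terraform inputs in code instead of reading the template. The `nodegroup` type, the `smokeTestVars` helper, and the top-level `nodegroups` key are hypothetical. Only the `platform`, `role`, and `public` keys come from the template diff above. With Terratest, a map like this would typically be assigned to `terraform.Options.Vars`.

```go
package main

// nodegroup is a hypothetical, simplified view of one node group. Its keys
// mirror those visible in the tfvars template diff ("platform", "role",
// "public"); the real variable schema lives in the aws-simple module.
type nodegroup struct {
	Platform string
	Role     string
	Public   bool
}

// smokeTestVars builds the Terraform input variables a smoke test would pass
// from Go. With Terratest, the returned map would go into
// terraform.Options.Vars (alongside TerraformDir) and be applied with
// terraform.InitAndApply, so terraform.tfvars.template is not read on this
// path. The top-level "nodegroups" key is an assumption.
func smokeTestVars(groups map[string]nodegroup) map[string]interface{} {
	ngs := make(map[string]interface{}, len(groups))
	for name, g := range groups {
		ngs[name] = map[string]interface{}{
			"platform": g.Platform,
			"role":     g.Role,
			"public":   g.Public,
		}
	}
	return map[string]interface{}{"nodegroups": ngs}
}
```

Terraform only auto-loads `terraform.tfvars` and `*.auto.tfvars` files, so a `.template` file is inert unless someone copies it into place.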

If the Terraform example doesn't do anything, then why change it in this fix?

james-nesbitt · 2025-01-24T07:00:55Z

pkg/product/mke/phase/uninstall_mcr.go

+	if err := mcr.DrainNode(swarmLeader, swarmLeader); err != nil {
+		return fmt.Errorf("%s: drain leader node: %w", swarmLeader, err)
+	}
+


Why drain separately from uninstallMCR in this approach? Why not just uninstall with the drain in this staged approach instead? Is it because the leader might get deleted before all of the managers are drained?

Yes. There is a big issue if MCR is uninstalled from the lead manager before all the other nodes are drained. If it is, as happened previously, draining the remaining nodes fails with errors.

You already leave the leader drain until the end; that keeps the leader safe even if you don't separate the drain from the removal.


cranzy added 3 commits January 23, 2025 19:24

PRODENG-2850 Removed caching and increased timeout for the smoke test

5459e2b

Signed-off-by: Dimitar <[email protected]>

Upgraded the aws-simple chart to the latest upstream tf provision module

a6efba8

Signed-off-by: Dimitar <[email protected]>

Reworked draining nodes logic and pingHost retries/delay

abb12aa

Signed-off-by: Dimitar <[email protected]>

cranzy requested review from james-nesbitt, pgedara and nekwar January 24, 2025 06:37

james-nesbitt reviewed Jan 24, 2025


cranzy force-pushed the 2850-fix-the-smoke-tests branch from fe1c507 to 4803cfa January 24, 2025 18:15

cranzy requested a review from james-nesbitt January 24, 2025 18:16

pgedara reviewed Jan 27, 2025


examples/terraform/aws-simple/terraform.tfvars.template (outdated, resolved)

Fixed PR feedback for the tfvar template

5c41799

Signed-off-by: Dimitar <[email protected]>

cranzy force-pushed the 2850-fix-the-smoke-tests branch from 4803cfa to 5c41799 January 27, 2025 18:04

cranzy requested a review from pgedara January 27, 2025 18:35

james-nesbitt approved these changes Jan 28, 2025


james-nesbitt merged commit 0e28bdc into main Jan 28, 2025
7 checks passed

james-nesbitt deleted the 2850-fix-the-smoke-tests branch January 28, 2025 17:02

