Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PRODENG-2850 Reworked drain node logic and upgrade the aws-simple tf chart #547

Merged
merged 4 commits into from
Jan 28, 2025

Conversation

cranzy
Copy link
Contributor

@cranzy cranzy commented Jan 24, 2025

  • Upgrade the aws-simple tf chart to the latest aws provision module one
  • Increased the timeout of the full smoke test to 50min and removed caching( for local testing mainly)
  • Reworked the docker draining sequence so the leader node gets drained last

examples/terraform/aws-simple/terraform.tfvars.template Outdated Show resolved Hide resolved
examples/terraform/aws-simple/terraform.tfvars.template Outdated Show resolved Hide resolved
"public" = true,
"role" = "worker",
"user_data" = "sudo ufw allow 7946/tcp ; sudo ufw allow 10250/tcp ", "platform" = "ubuntu_22.04"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we reverting to only Ubuntu for the simple/small smoke test? This is a major tactical change that needs to be justified.
Perhaps you need to separate out a change like this from your test fixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a terraform template example. It doesn't affect in any way shape or form the simple or full smoke tests. As you can see, I haven't really changed the platforms. https://github.com/Mirantis/mcc/blob/main/examples/terraform/aws-simple/terraform.tfvars.template
The variables/config for each of the smoke tests is passed from within golang(terratest in particular)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the terraform example doesn't do anything, then why change it in this fix?

if err := mcr.DrainNode(swarmLeader, swarmLeader); err != nil {
return fmt.Errorf("%s: drain leader node: %w", swarmLeader, err)
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why drain in this approach separately from the uninstallMCR. Why not just uninstall with the drain in this staged approach instead? Is it because the leader might get deleted before all of the managers are drained?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a big issue if the Lead manager gets MCR uninstalled before we drain all the other nodes. If it does, which it did previously, we get errors draining the remaining nodes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you already keep the leader drain for the end; this keeps the leader safe, even if you don't separate the drain from the removed.

Makefile Show resolved Hide resolved
@cranzy cranzy force-pushed the 2850-fix-the-smoke-tests branch from fe1c507 to 4803cfa Compare January 24, 2025 18:15
@cranzy cranzy requested a review from james-nesbitt January 24, 2025 18:16
@cranzy cranzy force-pushed the 2850-fix-the-smoke-tests branch from 4803cfa to 5c41799 Compare January 27, 2025 18:04
@cranzy cranzy requested a review from pgedara January 27, 2025 18:35
@james-nesbitt james-nesbitt merged commit 0e28bdc into main Jan 28, 2025
7 checks passed
@james-nesbitt james-nesbitt deleted the 2850-fix-the-smoke-tests branch January 28, 2025 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants