-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PRODENG-2850 Reworked drain node logic and upgrade the aws-simple tf chart #547
Conversation
cranzy
commented
Jan 24, 2025
- Upgrade the aws-simple tf chart to the latest aws provision module one
- Increased the timeout of the full smoke test to 50min and removed caching( for local testing mainly)
- Reworked the docker draining sequence so the leader node gets drained last
Signed-off-by: Dimitar <[email protected]>
Signed-off-by: Dimitar <[email protected]>
Signed-off-by: Dimitar <[email protected]>
"public" = true, | ||
"role" = "worker", | ||
"user_data" = "sudo ufw allow 7946/tcp ; sudo ufw allow 10250/tcp ", "platform" = "ubuntu_22.04" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we reverting to only Ubuntu for the simple/small smoke test? This is a major tactical change that needs to be justified.
Perhaps you need to separate out a change like this from your test fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a terraform template example. It doesn't affect in any way shape or form the simple or full smoke tests. As you can see, I haven't really changed the platforms. https://github.com/Mirantis/mcc/blob/main/examples/terraform/aws-simple/terraform.tfvars.template
The variables/config for each of the smoke tests is passed from within golang(terratest in particular)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the terraform example doesn't do anything, then why change it in this fix?
if err := mcr.DrainNode(swarmLeader, swarmLeader); err != nil { | ||
return fmt.Errorf("%s: drain leader node: %w", swarmLeader, err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why drain in this approach separately from the uninstallMCR
. Why not just uninstall with the drain in this staged approach instead? Is it because the leader might get deleted before all of the managers are drained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is a big issue if the Lead manager gets MCR uninstalled before we drain all the other nodes. If it does, which it did previously, we get errors draining the remaining nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you already keep the leader drain for the end; this keeps the leader safe, even if you don't separate the drain from the removed.
fe1c507
to
4803cfa
Compare
Signed-off-by: Dimitar <[email protected]>
4803cfa
to
5c41799
Compare