[Question] Upgrading AKS Cluster to 1.31.1 (Preview) broke the cluster #4607

Open
sanket-t-shah opened this issue Oct 28, 2024 · 7 comments
Labels
k8s-versions Issues associated with k8s versions question

Comments

@sanket-t-shah

Describe scenario
I had a cluster running on 1.30.5. I initiated a cluster upgrade to 1.31.1 (Preview). After this, the cluster moved to a Failed state.
I tried adding more node pools; they are added successfully, but the Ready node count is zero.
I also tried scaling the existing node pools and stopping and starting them. The node pools are then visible in the Azure Portal, but with zero active nodes.

Question
How can I recover the cluster and get my services back online? I'd prefer not to re-create the AKS cluster, as it has a lot of important deployments running.

@PixelRobots
Collaborator

Hello,

Have you tried to reconcile the cluster? You can use the following command to do that if you have not yet:

az aks update -g MyResourceGroup -n MyManagedCluster
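To see whether the reconcile helped, the cluster and node pool states can be inspected afterwards. The following is a minimal sketch, not an official procedure: the resource group and cluster names are the placeholders from the command above, and `run` is a hypothetical helper that only prints each command unless `RUN=1` is set.

```shell
# Placeholder names from the reconcile command above; replace with your own.
RG="MyResourceGroup"
CLUSTER="MyManagedCluster"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Cluster-level provisioning state (for example Succeeded or Failed).
run az aks show -g "$RG" -n "$CLUSTER" --query provisioningState -o tsv

# Kubernetes version and provisioning state of each node pool.
run az aks nodepool list -g "$RG" --cluster-name "$CLUSTER" -o table

# Node readiness as reported by the API server.
run kubectl get nodes -o wide
```

Run it once as-is to review the commands, then again with `RUN=1` to execute them.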

If that does not resolve the issue, check the Diagnose and solve problems section under the AKS blade in the portal.

If it is still down, I would suggest opening a support ticket for this one.

Thanks, Richard

@PixelRobots PixelRobots added the k8s-versions Issues associated with k8s versions label Oct 28, 2024
@sanket-t-shah
Author

sanket-t-shah commented Oct 28, 2024 via email

@lareeth

lareeth commented Oct 28, 2024

You should be able to add nodes using 1.30.5, which will be compatible with the 1.31 API, until this issue is resolved.
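As a rough sketch of this workaround, a node pool pinned to 1.30.5 could be added with `az aks nodepool add`. The resource group, cluster name, pool name `sys1305`, System mode, and node count here are illustrative assumptions, not values from this thread, and `run` is a hypothetical helper that only prints the command unless `RUN=1` is set.

```shell
# Assumed placeholder values; replace with your own.
RG="MyResourceGroup"
CLUSTER="MyManagedCluster"
POOL="sys1305"          # hypothetical pool name
NODE_VERSION="1.30.5"   # version the nodes ran before the upgrade

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Add a node pool whose nodes stay on the older Kubernetes version.
run az aks nodepool add \
  --resource-group "$RG" \
  --cluster-name "$CLUSTER" \
  --name "$POOL" \
  --mode System \
  --kubernetes-version "$NODE_VERSION" \
  --node-count 1
```

Use `--mode User` instead if the replacement pool should only carry application workloads.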

@sanket-t-shah
Author

@lareeth - This worked for me. Thanks for the solution. 👍

So this seems to be a bug on Microsoft's side, as AKS is not able to identify node pools with the newer version.

@gevraud

gevraud commented Oct 30, 2024

Same here.

Upgraded from 1.30.5 to 1.31.1.
The system node pool's nodes don't reach the "Ready" state,

and I got this error in the logs: IMDS query failed, exit code: 28...

I created another system node pool with version 1.30.5, and its nodes reach the Ready state.

@alexku7

alexku7 commented Nov 4, 2024

The same issue here.
I had to open a ticket with Azure support.

So embarrassing, time after time in Azure.

@k-koleda

k-koleda commented Nov 4, 2024

We faced the same issue when we upgraded our development cluster from 1.30.5 to 1.31.1. As described here, I created new pools; they are in a Ready state and the cluster is back to normal operation, but many operations now end in a failed state. As I understand it, I now have a 1.31.1 control plane and 1.30.5 nodes. Does Azure have any mechanism to roll back the upgrade?
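For the mixed state described here, a quick way to reason about the version gap is to compare minor versions. This is a minimal sketch assuming the upstream Kubernetes skew policy, under which nodes may be up to three minor versions older than the API server and never newer. AKS may enforce its own limits, and the function names are hypothetical.

```shell
# Extract the minor version from a "major.minor.patch" string.
minor_of() { printf '%s\n' "$1" | cut -d. -f2; }

# skew_ok CONTROL_PLANE_VERSION NODE_VERSION [MAX_SKEW]
# Succeeds when the node is not newer than the control plane and is at most
# MAX_SKEW minor versions older (default 3, an assumption based on upstream policy).
skew_ok() {
  cp_minor=$(minor_of "$1")
  node_minor=$(minor_of "$2")
  max_skew=${3:-3}
  diff=$((cp_minor - node_minor))
  [ "$diff" -ge 0 ] && [ "$diff" -le "$max_skew" ]
}

# Versions reported in this thread: control plane 1.31.1, nodes 1.30.5.
if skew_ok "1.31.1" "1.30.5"; then
  echo "within skew"
else
  echo "outside skew"
fi
```

This only checks the version gap; it says nothing about whether a rollback of the control plane is possible.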
