-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: NodePool should not late-init node count #452
Conversation
The nodepool's node_count and initial_node_count should not be late-initialized to prevent a fight with the GKE autoscaler. Signed-off-by: Moritz Johner <[email protected]>
Hey @turkenf could you please ✔️ the PR pipeline? I'd like to get an artifact out of the //edit thank you 🙇 |
/test-examples="examples/container/nodepool.yaml" |
damn, i just realized the CI job doesn't build a public accessible image. Going to publish one on my own 👾 |
Hey @turkenf i've updated my proof of work. Can you please guide me regarding the I can find the logs: Is this a flake or an issue with my change 🤔 ?
|
/test-examples="examples/container/nodepool.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @moolen,
Thank you for your effort on this PR and your nice testing explanation. If you look at the PR #353, which contains the same change before, we have a managementPolicy
feature to be used in these cases. Have you tried this feature? (see this comment)
Hey @turkenf i've updated my proof of work. Can you please guide me regarding the Uptest-examples/container/nodepool.yaml failure?
The default time for testing our examples in Uptest is 20 minutes, and we use the uptest.upbound.io/timeout
annotation for resources that need to be tested longer than this time. I think adding an annotation to the example like here will solve the issue.
Thank you @turkenf, i got it to work with bespoke |
The nodepool's
node_count
andinitial_node_count
should not be late-initialized to prevent a fight with the GKE autoscaler.Also see discussion on #353
Fixes: #340
I have:
make reviewable test
to ensure this PR is ready for review.How has this code been tested
I created a cluster manually and used the following configuration:
Nodepool was created, note: 12 max node count:
![Screenshot from 2024-01-26 16-12-53](https://private-user-images.githubusercontent.com/1709030/300027837-c9e7b0a7-46bd-44dc-916a-d4a7e5ad0be9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNzk1NzQsIm5iZiI6MTczOTI3OTI3NCwicGF0aCI6Ii8xNzA5MDMwLzMwMDAyNzgzNy1jOWU3YjBhNy00NmJkLTQ0ZGMtOTE2YS1kNGE3ZTVhZDBiZTkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTMwNzU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTYwMGZkZWYzYzJjZDYxNTdiYzI1YWFjYTlhMTk5YmQ3MWI3MGRiYzBmOTNkYTM1ZjAyN2I4ZTI3MGZmNjMwOSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.HT7MbM6MMXmYyTR3f70JSTknAHh5tE5OPIbeeke4zhk)
after it has been created, some of the late init fields are shown. Both
![Screenshot from 2024-01-26 16-15-08](https://private-user-images.githubusercontent.com/1709030/300027905-f507f806-9284-4624-bca9-7b48378958de.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNzk1NzQsIm5iZiI6MTczOTI3OTI3NCwicGF0aCI6Ii8xNzA5MDMwLzMwMDAyNzkwNS1mNTA3ZjgwNi05Mjg0LTQ2MjQtYmNhOS03YjQ4Mzc4OTU4ZGUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTMwNzU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTE1MzJiODM4ZjIxMWVmYjA5Y2VkYzFkYWYyNTFjYTg0NzdiYmVjNzllNjhjYWQ1OGRiZGQxMDE4MWQxYWJkYiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.D6n2sAOc4JbAKLRTmAwfJhe58bxoxenVARGBkwRBAIs)
initalNodeCount
andnodeCount
are not shown, as expected.I scaled up a workload to trigger the GKE autoscaler.
![Screenshot from 2024-01-26 16-20-33](https://private-user-images.githubusercontent.com/1709030/300028239-c53ce7c8-8db3-4f8f-be6b-3a863ac88340.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNzk1NzQsIm5iZiI6MTczOTI3OTI3NCwicGF0aCI6Ii8xNzA5MDMwLzMwMDAyODIzOS1jNTNjZTdjOC04ZGIzLTRmOGYtYmU2Yi0zYTg2M2FjODgzNDAucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTMwNzU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTQzZjgyMGRhM2RmNjJjMDBhZjBlN2IyMjZjYmI0N2ZkNWVjNmIxMWFiMzgyYjZkY2NiMzZkNTEzOTFiYzM5YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.bjn15X3bWKvwHvVq_FaLVtkuJi202yY2YpzbeMdPfbg)
Nodes join the cluster (= autoscaling is working)
![Screenshot from 2024-01-26 16-20-52](https://private-user-images.githubusercontent.com/1709030/300028349-e7c9d700-0cc3-421c-be1d-6d03662f788a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNzk1NzQsIm5iZiI6MTczOTI3OTI3NCwicGF0aCI6Ii8xNzA5MDMwLzMwMDAyODM0OS1lN2M5ZDcwMC0wY2MzLTQyMWMtYmUxZC02ZDAzNjYyZjc4OGEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTMwNzU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9N2VlNGYzMWQ5ZDA3MmEyMTJkODI1Nzk5NmMyZjZhMmEwODMwZmM0MzQzNjI5NDYzYmIzZDI4NjUwM2EwODA4NSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.1wAAGtftRPSnai_3cDU_Z3zxFFW2UFCupB7IrnxYkZ0)
I've set the
autoscaling.maxNodeCount=10
to trigger a reconciliation and verify that crossplane is able to update the nodepool despite the nodeCount being changed. Proof: nodepool has changed in GCP console: