Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update k8s dependencies to 1.32 #2427

Merged
merged 20 commits into from
Feb 18, 2025
Merged

Conversation

kale-amruta
Copy link
Contributor

@kale-amruta kale-amruta commented Jan 25, 2025

What issue type does this pull request address? (keep at least one, remove the others)
/kind enhancement
/kind feature
/kind test

What does this pull request do? Which issues does it resolve? (use resolves #<issue_number> if possible)
resolves ENG-5389
resolves ENG-5387
resolves ENG-5428

Please provide a short message that should be published in the vcluster release notes
Updated vcluster Kubernetes dependencies to v1.32

What else do we need to know?

This PR also fixes vcluster for failing conformance tests because of following issues:

  1. The field status.qosClass has been made immutable from k8s version 1.32 as done over here. When there is a mismatch in status.qosClass field in vcluster and host cluster and vlcuster tries to patch the field in vcluster, the patch fails with an error. This causes conformance tests to fail. The PR fixes tests by ignoring this field in the pod syncer patch updates.
  2. Test Job should allow to use a pod failure policy to ignore failure matching on DisruptionTarget condition [Conformance] was failing because the job controller on vcluster expected the pod to be in failed state before deleting it.
    In this case when the host pod is marked for deletion, the host pod gets into Failed state. But these status updates were not getting propogated to vcluster pod. This led to vcluster pod to be dangling state which caused the test failure. The PR adds a fix of propogating the status updates of host pod to vcluster pod when the host pod is marked for deletion.
  3. Test should run through the lifecycle of a node and should run through the lifecycle of a csinode was failing because these tests perform the lifecycle operations (CRUD) for nodes and CSI nodes on vcluster respectively but since vcluster doesn't inherently support the node creation on the virtual cluster, so syncToHost method simply deletes the node that was created during the conformance test. Hence, the test can't perform any further action such as get or update and the test fails.This PR fixes these tests as well

Copy link

netlify bot commented Jan 25, 2025

Deploy Preview for vcluster-docs canceled.

Built without sensitive environment variables

Name Link
🔨 Latest commit a345049
🔍 Latest deploy log https://app.netlify.com/sites/vcluster-docs/deploys/67b43b8ef0706e00088fe30d

@kale-amruta kale-amruta marked this pull request as ready for review January 25, 2025 12:19
@kale-amruta kale-amruta changed the title Conformance test fix bump k8s version to 1.32 Jan 25, 2025
@kale-amruta kale-amruta changed the title bump k8s version to 1.32 Update k8s version to 1.32 Jan 25, 2025
@kale-amruta kale-amruta changed the title Update k8s version to 1.32 Update k8s dependencies to 1.32 Jan 25, 2025
@ApsTomar
Copy link
Contributor

ApsTomar commented Jan 25, 2025

EDIT:

This test is fixed and solution has been added to the PR
.............................................................................................

There are two more tests which are failing. These tests were recently added in conformance test suite (a couple of months back). Below are the tests and the RCA:

  1. should run through the lifecycle of a node
  2. should run through the lifecycle of a csinode

Why are these tests failing?

These tests perform the lifecycle operations (CRUD) for nodes and CSI nodes on vcluster respectively but since vcluster doesn't inherently support the node creation on the virtual cluster, so syncToHost method simply deletes the node that was created during the conformance test. Hence, the test can't perform any further action such as get or update and the test fails.

Probable Solutions:

Since this is a case of a feature not supported so ideally it should be skipped but that might impact the certification failure as mentioned in conformance test instructions.
In that case, I tried some workarounds which certainly work but they have some downsides as well. I've detailed them out, please take a look:

  1. The conformance test creates the nodes which have a certain prefix in the name and are unschedulable in nature. This can be used to identify the test nodes and we can skip their deletion during sync operation. Later the test clean-up will delete that node after completion. The caveat is that- a user can misuse this to create a similar node which vcluster won't be able to delete during the regular sync operation.

  2. We can add some gracePeriod to the node deletion in syncToHost method which will keep the node active till the tests are passed. The caveat is that- it'll impact all the node deletion process and the users might not like that the nodes are taking more time to get deleted now. Also, what should be the ideal grace period is also uncertain and might make the test flaky.

  3. Better solution might be a combination of both of the above points: we identify the node created during the test (similar to point-1) and then mark it for deletion with a decent grace period (30 to 60 sec). This will not only make sure that the code change won't impact vcluster's natural behaviour but also won't allow any user to misuse it because it'll eventually get deleted after the grace period runs out.

Let me know your opinions and based on that I will create the PR to fix these remaining 2 conformance tests.

@kale-amruta kale-amruta force-pushed the ConformanceTestFix-1 branch 2 times, most recently from 4b135de to d387224 Compare February 6, 2025 11:23
@cbron
Copy link
Contributor

cbron commented Feb 6, 2025

Original PR was here, I will close: #2388

@cbron cbron requested a review from a team February 6, 2025 21:43
@kale-amruta kale-amruta removed the request for review from a team February 12, 2025 15:26
@kale-amruta kale-amruta requested a review from a team February 13, 2025 10:50
Copy link
Contributor

@lizardruss lizardruss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions and suggestions to try.

@kale-amruta kale-amruta force-pushed the ConformanceTestFix-1 branch 3 times, most recently from 43bd1a3 to 00fd0bd Compare February 14, 2025 16:00
@cbron cbron merged commit 28c7b75 into loft-sh:main Feb 18, 2025
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants