
Migrate CRI-O jobs away from kubernetes_e2e.py #32567

Open
saschagrunert opened this issue May 6, 2024 · 49 comments
Labels
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@saschagrunert
Member

saschagrunert commented May 6, 2024

The kubernetes_e2e.py script is deprecated and we should use kubetest2 instead.

All affected tests are listed in https://testgrid.k8s.io/sig-node-cri-o

cc @kubernetes/sig-node-cri-o-test-maintainers

Ref: https://github.com/kubernetes/test-infra/tree/master/scenarios, #20760

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 6, 2024
@haircommander
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 4, 2024
@saschagrunert
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2024
@kannon92
Contributor

/triage accepted
/priority important-longterm

@kannon92 kannon92 moved this from Triage to Issues - To do in SIG Node CI/Test Board Aug 21, 2024
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Aug 21, 2024
@elieser1101
Contributor

Does this still need help? Can I start looking at it?

@saschagrunert
Member Author

@elieser1101 I'd appreciate your eyes on that. 🙏

@elieser1101
Contributor

/assign

@elieser1101
Contributor

elieser1101 commented Jan 3, 2025

> So, kubetest2 changes --timeout 300m to ginkgo's --timeout=180m for some reason. Do you have any idea why?

I have seen that before, but I can't point to exactly why; I think it is more of a test-e2e-node.sh and e2e_node/remote/remote.go thing.

> Isn't it this one?
> https://github.com/kubernetes-sigs/kubetest2/blob/22d5b1410bef09ae679fa5813a5f0d196b6079de/pkg/testers/node/node.go#L73

Yeah, that is the flag we are using (tester flags), but then under the hood the rabbit hole transforms the timeout in several places.

When we pass --timeout=300m to kubetest2 we get this:

Running the command ssh, with args: [-o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o CheckHostIP=no -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -o LogLevel=ERROR -i /root/.ssh/google_compute_engine [email protected] -- sudo /bin/bash -c 'cd /tmp/node-e2e-20250103T183438 && set -o pipefail; timeout -k 30s 18000.000000s ./ginkgo -timeout=24h -focus="\[NodeFeature:Eviction\]"  -skip=""""  --no-color -v --timeout=180m ./e2e_node.test -- --system-spec-name= --system-spec-file= --extra-envs= --runtime-config= --v 4 --node-name=test-fedora-coreos-41-20241122-3-0-gcp-x86-64 --report-dir=/tmp/node-e2e-20250103T183438/results --report-prefix=fedora --image-description="fedora-coreos-41-20241122-3-0-gcp-x86-64" --kubelet-flags="--cluster-domain=cluster.local" --dns-domain="cluster.local" --prepull-images=false  --container-runtime-endpoint=unix:///run/containerd/containerd.sock --container-runtime-endpoint=unix:///var/run/crio/crio.sock --container-runtime-process-name=/usr/local/bin/crio --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}" 2>&1 | tee -i /tmp/node-e2e-20250103T183438/results/test-fedora-coreos-41-20241122-3-0-gcp-x86-64-ginkgo.log']
  • Which results in a process timeout of 18000.000000s
  • test-e2e-node.sh also introduces -timeout=24h no matter what other timeout you pass
  • And finally the timeout we specified, but trimmed by remote.go, resulting in --timeout=180m

So setting 300min -> (300 + 60) / 2 = 180min is what gets passed to ginkgo.
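
In pseudocode, that trimming works out to something like this (a minimal sketch assuming only the arithmetic above; the helper name is made up and this is not the actual test-e2e-node.sh / remote.go code):

```go
package main

import (
	"fmt"
	"time"
)

// ginkgoTimeout sketches the trimming described above: the --timeout that
// finally reaches ginkgo comes out to roughly (requested timeout + 1h) / 2.
// Illustrative only, not the real remote.go implementation.
func ginkgoTimeout(requested time.Duration) time.Duration {
	return (requested + time.Hour) / 2
}

func main() {
	// Passing --timeout=300m to kubetest2 ...
	fmt.Println(ginkgoTimeout(300 * time.Minute)) // ... prints 3h0m0s, i.e. the observed --timeout=180m
}
```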

@bart0sh
Contributor

bart0sh commented Jan 3, 2025

I hope that timeout recalculation has some reason behind it. It's not obvious, but hopefully it exists :)

BTW, increasing the timeout helped the job, but didn't fix it. One test case still fails.

@kannon92 @elieser1101 Any ideas how to fix it?

@elieser1101
Contributor

Is it possible the test itself is flaky? I can see that the non-kubetest2 job works intermittently, and I also found one run with an error similar to the one in the job running with kubetest2.

@bart0sh

@kannon92
Contributor

kannon92 commented Jan 6, 2025

Eviction CRI-O tests have some issues. I wouldn't worry about that.

@bart0sh
Contributor

bart0sh commented Jan 7, 2025

> Is it possible the test itself is flaky?

Could be, but I've never managed to run the -kubetest2 tests without a failure. The non-kubetest2 tests are almost always green.

> Eviction CRI-O tests have some issues.

It's probably off-topic here, so feel free to ignore.
I've noticed unexpectedly long timeouts in the e2e eviction test cases. Is it considered normal for eviction to start 10 minutes after the issue (disk/PID pressure) started to manifest itself?

$ grep 'pressureTimeout :=' test/e2e_node/eviction_test.go 
        pressureTimeout := 15 * time.Minute
        pressureTimeout := 10 * time.Minute
        pressureTimeout := 10 * time.Minute
        pressureTimeout := 15 * time.Minute
        pressureTimeout := 10 * time.Minute
        pressureTimeout := 10 * time.Minute
        pressureTimeout := 10 * time.Minute
        pressureTimeout := 15 * time.Minute
        pressureTimeout := 10 * time.Minute
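
For context, this is roughly how such a budget plays out in a wait loop (a simplified sketch, not the actual eviction_test.go code; the function name and poll interval below are invented for illustration):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForPressure is an illustrative stand-in for the test's wait: it polls
// until the node reports the expected pressure condition, giving up only
// after the full pressureTimeout has elapsed. With a 10 to 15 minute budget,
// a run where eviction takes several minutes to kick in still passes.
func waitForPressure(hasPressure func() bool, pressureTimeout time.Duration) error {
	deadline := time.Now().Add(pressureTimeout)
	for time.Now().Before(deadline) {
		if hasPressure() {
			return nil
		}
		time.Sleep(10 * time.Second) // poll interval, also invented for this sketch
	}
	return errors.New("timed out waiting for node pressure condition")
}

func main() {
	// Example with a check that reports pressure immediately, so the call returns at once.
	err := waitForPressure(func() bool { return true }, 10*time.Minute)
	fmt.Println(err) // <nil>
}
```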

@elieser1101
Contributor

elieser1101 commented Jan 16, 2025

Opened PR #34164 to promote the kubetest2 jobs that have been consistently working. The ones still pending rework are:

  • Evented PLEG (where the non-kubetest2 job does not seem to be working either)
  • Hugepages
  • Eviction
  • Resource managers

@SergeyKanzhelev
Member

Discussed this at the node CI meeting. @elieser1101 curious if you are still working on this?

@elieser1101
Contributor

I've been doing other stuff around sig-release, but I can reprioritize this and spend some hours to move it forward; I have a couple of things ready related to this.
Did anything change about this issue during the discussion? @SergeyKanzhelev

@SergeyKanzhelev
Member

Nothing specific. The question was basically whether this is still in progress or a new owner is needed. Slow progress is OK.

@bart0sh
Contributor

bart0sh commented Feb 20, 2025

@elieser1101 @kannon92 @ffromani @swatisehgal

> pr-crio-cgroupv1-node-e2e-resource-managers-kubetest2 green but seem to skip everything
> pr-crio-cgroupv2-node-e2e-resource-managers-kubetest2

They haven't been green since Feb 13, 2025. Is there an issue about it?

@swatisehgal
Contributor

swatisehgal commented Feb 20, 2025

> @elieser1101 @kannon92 @ffromani @swatisehgal
>
> pr-crio-cgroupv1-node-e2e-resource-managers-kubetest2 green but seem to skip everything
> pr-crio-cgroupv2-node-e2e-resource-managers-kubetest2
>
> They haven't been green since Feb 13, 2025. Is there an issue about it?

We have had failures since kubernetes/kubernetes#127525 was merged. We have a tracking issue for this (kubernetes/kubernetes#130146) and a fix in place (kubernetes/kubernetes#130163), which is being reviewed.

@elieser1101
Contributor

Once the Evented PLEG job is in place, the only missing presubmits are the eviction ones.

@bart0sh
Contributor

bart0sh commented Feb 21, 2025

@elieser1101 Thanks for the reminder. I was distracted from the eviction job investigation by other tasks. Now I'm back to it.

Projects
Status: Issues - In progress
Development

No branches or pull requests