This directory contains scripts used by the OpenShift CI pipelines to monitor selected functional tests on OpenShift. There are two pipelines; their history and logs can be accessed here:
To run openshift-tests (or other suites) with kata-containers, one can use the kata-webhook. To deploy everything, you can mimic the CI pipeline by running:
```sh
#!/bin/bash -e
# Set up your kubectl and check the cluster is accessible by
kubectl get nodes
# Deploy kata (set KATA_DEPLOY_IMAGE to override the default kata-deploy-ci:latest image)
./test.sh
# Deploy the webhook
KATA_RUNTIME=kata-qemu cluster/deploy_webhook.sh
```
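Optionally, you can sanity-check the deployment before moving on. The snippet below is only a sketch: the runtime class and daemonset names follow the defaults used above (`kata-qemu`, `kata-deploy` in `kube-system`), and the `webhook-check` pod and its busybox image are purely illustrative:

```sh
# The kata runtime class should exist and the kata-deploy pods should be Running
oc get runtimeclass kata-qemu
oc get pods -n kube-system -l name=kata-deploy
# The webhook should inject the runtimeClassName into newly created pods
oc run webhook-check --image=quay.io/prometheus/busybox --restart=Never -- sleep 3600
oc get pod webhook-check -o jsonpath='{.spec.runtimeClassName}{"\n"}'   # expect: kata-qemu
oc delete pod webhook-check
```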
This should ensure kata-containers as well as the kata-webhook are installed and working. Before running the openshift-tests it is (currently) recommended to relax a few security restrictions by running:
```sh
#!/bin/bash -e
oc adm policy add-scc-to-group privileged system:authenticated system:serviceaccounts
oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccounts
oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
```
Now you should be ready to run the openshift-tests. Our CI only uses a subset of tests; to get the current `TEST_SKIPS`, see the pipeline config. The following steps require openshift-tests to be cloned, built, and available in the current directory.
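If you do not have the binary yet, the sketch below shows one way to obtain it; the repository URL, the `make WHAT=cmd/openshift-tests` target, and the output path are assumptions based on the upstream openshift/origin build conventions, so adjust them to your fork or branch:

```sh
# Clone and build openshift-tests (assumes the upstream openshift/origin layout)
git clone https://github.com/openshift/origin.git
cd origin
make WHAT=cmd/openshift-tests
# The binary lands under _output/local/bin/<GOOS>/<GOARCH>/openshift-tests;
# copy or symlink it next to the scripts so that ./openshift-tests works
```

With the binary in place, the test run itself looks like this: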
```sh
#!/bin/bash -e
# Define tests to be skipped (see the pipeline config for the current version)
TEST_SKIPS="\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-node\] Variable Expansion should allow substituting values in a volume subpath\|\[k8s.io\] Probing container should be restarted with a docker exec liveness probe with timeout\|\[sig-node\] Pods Extended Pod Container lifecycle evicted pods should be terminal\|\[sig-node\] PodOSRejection \[NodeConformance\] Kubelet should reject pod when the node OS doesn't match pod's OS\|\[sig-network\].*for evicted pods\|\[sig-network\].*HAProxy router should override the route\|\[sig-network\].*HAProxy router should serve a route\|\[sig-network\].*HAProxy router should serve the correct\|\[sig-network\].*HAProxy router should run\|\[sig-network\].*when FIPS.*the HAProxy router\|\[sig-network\].*bond\|\[sig-network\].*all sysctl on whitelist\|\[sig-network\].*sysctls should not affect\|\[sig-network\] pods should successfully create sandboxes by adding pod to network"
# Get the list of tests to be executed
TESTS="$(./openshift-tests run --dry-run --provider "${TEST_PROVIDER}" "${TEST_SUITE}")"
# Store the list of tests in the /tmp/tsts file
echo "${TESTS}" | grep -v "$TEST_SKIPS" > /tmp/tsts
# Remove previously-existing temporary files as well as previous results
OUT=RESULTS/tmp
rm -Rf /tmp/*test* /tmp/e2e-*
rm -Rf "$OUT"
mkdir -p "$OUT"
# Run the tests ignoring the monitor health checks
./openshift-tests run --provider azure -o "$OUT/job.log" --junit-dir "$OUT" --file /tmp/tsts --max-parallel-tests 5 --cluster-stability Disruptive --run '^\[sig-node\].*|^\[sig-network\]'
```
> [!NOTE]
> We are ignoring the cluster stability checks because our public cloud is
> not that stable and running with VMs instead of containers results in minor
> stability issues. Some of the old monitor stability tests do not reflect the
> `--cluster-stability` setting; one should simply ignore these. If you get a
> message like `invariant was violated` or `error: failed due to a MonitorTest failure`,
> it's usually an indication that only those kinds of tests failed while the real
> tests passed. See `wrapped-openshift-tests.sh` for details on how our pipeline
> deals with that.
> [!TIP]
> To compare multiple results locally one can use the `junit2html` tool.
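As a sketch of such a comparison, assuming the `junit2html` Python package is installed and that the junit XML files of two runs live under the illustrative `RESULTS/run1` and `RESULTS/run2` directories:

```sh
pip install junit2html
# Build a single HTML matrix comparing the outcomes of the individual runs
junit2html --report-matrix comparison.html RESULTS/run1/junit_*.xml RESULTS/run2/junit_*.xml
```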
If you need to clean up the cluster after testing, you can use the `cleanup.sh`
script from the current directory. It tries to delete all resources created by
`test.sh` as well as by `cluster/deploy_webhook.sh`, ignoring all failures. The
primary purpose of this script is to allow a soft cleanup after deployment, to
test different versions without re-provisioning everything.
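Running it is straightforward:

```sh
# Soft cleanup: tries to delete everything created by test.sh and
# cluster/deploy_webhook.sh, ignoring any failures along the way
./cleanup.sh
```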
> [!WARNING]
> Do not rely on this script in production, return codes are not checked!
Let's say the OCP pipeline passed running with
`quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64`
but failed running with
`quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64`
and you'd like to know which PR caused the regression. You can either run with
all of the 60 tags in between, or you can utilize the `bisecter` tool to
minimize the number of steps needed.
Before running the bisection you need a reproducer script. A sample one called
`sample-test-reproducer.sh` is provided in this directory, but you might want to
copy and modify it (a minimal sketch follows the list below), especially:

* `OCP_DIR` - directory where your openshift/release is located (can be exported)
* `E2E_TEST` - openshift-test(s) to be executed (can be exported)
* the behaviour of `SETUP` (returning 125 skips the current image tag, returning >=128 interrupts the execution, everything else reports the tag as a failure)
* what should be executed (perhaps running the setup is enough for you, or you might want to be looking for specific failures...)
* the use of `timeout` to interrupt the execution in case you know things should be faster
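For illustration only, a minimal reproducer honouring that exit-code convention might look like the sketch below; it is not the shipped `sample-test-reproducer.sh`, and the 30-minute timeout, the hard-coded `azure` provider, and the suite default are assumptions you will likely want to change:

```sh
#!/bin/bash
# Illustrative reproducer: the argument is the kata-deploy image tag under test
IMAGE="$1"

# Setup: deploy kata with the given image; if the deployment itself fails,
# ask the bisection to skip this tag (exit 125)
KATA_DEPLOY_IMAGE="$IMAGE" ./test.sh || exit 125
KATA_RUNTIME=kata-qemu cluster/deploy_webhook.sh || exit 125

# Test: run the check you are bisecting for; the timeout keeps a hung run from
# stalling the bisection (timeout exits 124, which marks the tag as bad)
timeout 30m ./openshift-tests run --provider azure \
    --run "${E2E_TEST:?set E2E_TEST to the test name regexp}" \
    "${TEST_SUITE:-openshift/conformance/parallel}"
RET=$?

# Soft cleanup so the next bisection step starts from a clean cluster
./cleanup.sh || true

exit "$RET"
```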
Executing that script with the GOOD commit should pass:

```sh
./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64
```

and it should fail when executed with the BAD commit:

```sh
./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64
```
To get the list of all tags between those two PRs, you can use the
`bisect-range.sh` script:

```sh
./bisect-range.sh d7afd31fd40e37a675b25c53618904ab57e74ccd 9f512c016e75599a4a921bd84ea47559fe610057
```
> [!NOTE]
> The tagged images are only built per PR, not for individual commits. See kata-deploy-ci for the available images.
To find out which PR caused this regression, you can either manually try the individual commits or you can simply execute:
```sh
bisecter start "$(./bisect-range.sh d7afd31fd40 9f512c016)"
OCP_DIR=/path/to/openshift/release bisecter run ./sample-test-reproducer.sh
```
> [!NOTE]
> If you use `KATA_WITH_SYSTEM_QEMU=yes` you might want to deploy once with it
> and skip it for the cleanup. That way you might (in most cases) test all
> images with a single MCP update instead of a per-image MCP update.
> [!TIP]
> You can check the bisection progress during/after the execution by running
> `bisecter log` from the current directory. Before starting a new bisection
> you need to execute `bisecter reset`.