This section provides information about the Kubecheck functionality.
- Procedure Execution From CLI
- Check Procedures
- IAAS Procedure
- PAAS Procedure
- 201 Service Status
- 205 System packages versions
- 206 Generic Packages Version
- 207 Pods Condition
- 208 Dashboard Availability
- 209 Nodes Existence
- 210 Nodes Roles
- 211 Nodes Condition
- 213 Selinux security policy
- 214 Selinux configuration
- 215 Firewalld status
- 216 Swap state
- 217 Modprobe rules
- 218 Time difference
- 219 Health status ETCD
- 220 Control plane configuration status
- 221 Control plane health status
- 222 Default services configuration status
- 223 Default services health status
- 224 Calico configuration check
- 225 Pod security admission status
- 226 Geo connectivity status
- 227 Apparmor status
- 228 Apparmor configuration
- Report File Generation
The Kubernetes Check procedure provides an opportunity to automatically verify the environment and quickly get a report on the results. The environment is checked against the following criteria, which are defined in advance:
- Minimal - The minimum results that the test environment must meet. If it does not satisfy this, there is no guarantee that this environment will be operational.
- Recommended - The recommended results with which the test development environment for the Full-HA cluster scheme showed the best results and performance. For a production environment, you must calculate the required amount of resources for your cluster independently; it is higher than the amount recommended by the Kubernetes Check procedure.
If the detected test results deviate from the criteria, the following status types are assigned to them:
- OK - This status indicates the compliance with the recommended values, if any, and successful completion of the test without errors.
- WARN - This status indicates that the test deviated slightly from the expected values. For example, the results found do not correspond to the recommended values. However, this test passed the minimum requirements and has not failed.
- FAIL - This status indicates that the test does not meet the minimum requirements or it has failed. This test requires attention to fix the environment.
- ERROR? - This status indicates that an internal error occurred in the test and it cannot be continued.
At the end of the logs of the procedure, a summary report table with all the test results is displayed. For example:
```
Group   Status   ID    Test                                      Actual result   Minimal   Recommended
SSH     OK       001   Connectivity ............................ Connected
SSH     WARN     002   Latency - Single Thread ................. 1500ms          10000     1000
SSH     FAIL     003   Latency - Multi Thread .................. 50000ms         15000     2000

OVERALL RESULTS: 1 SUCCEEDED 1 WARNED 1 FAILED
```
The following columns are presented in this table:
- Group - The logical group of checks to which the test belongs.
- Status - The final status assigned to the test according to the results of the check.
- ID - The test identifier.
- Test - The short test name.
- Actual result - The actual value detected by the test on the environment.
- Minimal (optional) - The minimum required value for this test.
- Recommended (optional) - The recommended value for this test.
The final report is generated in a file. For more information, see Report File Generation.
The check procedure can be started from the CLI with one of the following commands:
```bash
kubemarine check_iaas
kubemarine check_paas
```
It begins the execution of all tasks available in the procedure in accordance with the procedure type. For more information about how the tasks list can be redefined, see Tasks List Redefinition in the Kubemarine Installation Procedure.
Note: Some PAAS checks work only if an additional configuration file is provided. Without the additional configuration, such checks are skipped without a warning. See the particular check documentation for the configuration format. The following is an example of passing a procedure config file:
```bash
kubemarine check_paas procedure.yaml
```
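For example, assuming the `--tasks` flag behaves in the check procedures the same way as in the installation procedure, a subset of the checks can be run like this:

```bash
# Run only the "ssh" task group of the IAAS checks
# (assumes the --tasks flag works as in the installation procedure)
kubemarine check_iaas --tasks="ssh"
```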
A check procedure is divided into logical sub-procedures. Each of them is responsible for its own set of tests conducted on the environment.
The IAAS procedure verifies only the infrastructure. For example, it checks the amount of hardware resources or the speed of the environment. These tests do not perform cluster checks and can be run both on a completely empty environment and on an environment with a cluster installed.
The task tree is as follows:
- ssh
  - connectivity
  - latency
    - single
    - multiple
  - sudoer_access
- network
  - pod_subnet_connectivity
  - service_subnet_connectivity
  - check_tcp_ports
  - thirdparties_available
- hardware
  - members_amount
    - vips
    - balancers
    - control-planes
    - workers
    - total
  - cpu
    - balancers
    - control-planes
    - workers
  - ram
    - balancers
    - control-planes
    - workers
- system
  - distributive
- thirdparties
  - availability
Task: ssh.connectivity
This test checks whether it is possible to establish an SSH connection with the nodes. If you are unable to connect to the nodes, check and fix the following:
- The credentials for the connection are correct (verify the IP address, user, and key).
- The node is up.
- The node is online.
- The network connection to the node is available.
- The node port 22 (or another custom port, if configured) is open and accepts connections.
- The SSHD is running and its configuration is correct.
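If this check fails, a quick manual triage along the lines of the sketch below can help localize the problem. The node address, user, key path, and port are placeholders for your own values:

```bash
NODE=10.10.10.1                      # placeholder node address
# Is the node reachable over the network at all?
ping -c 3 "$NODE"
# Is the SSH port open? (22, or your custom port)
nc -zv "$NODE" 22
# Do the configured credentials work?
ssh -i ~/.ssh/id_rsa -p 22 user@"$NODE" 'echo connected'
```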
Task: ssh.latency.single
This test checks the delay between the nodes in the single-threaded mode. The nodes are tested one after another.
Task: ssh.latency.multiple
This test checks the delay between the nodes in the multi-threaded mode. All nodes are tested at the same time.
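A rough manual approximation of both latency checks is to time a no-op SSH command against each node. The loop below is only an illustration (it uses GNU time), and the node list is a placeholder:

```bash
# Sequentially time a no-op SSH command to each node (GNU time)
for NODE in 10.10.10.1 10.10.10.2 10.10.10.3; do
  /usr/bin/time -f "$NODE: %e s" ssh "$NODE" true
done
```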
Task: ssh.sudoer_access
This test checks that the connection user has sudoer access on the nodes.
Tests of the hardware type check the availability of the required amount of resources.
Task: hardware.members_amount.vips
This test checks the number of VIPs present for Keepalived.
Task: hardware.members_amount.balancers
This test checks the number of nodes present with the `balancer` role.
Task: hardware.members_amount.control-planes
This test checks the number of nodes present with the `control-plane` role.
Task: hardware.members_amount.workers
This test checks the number of nodes present with the `worker` role.
Task: hardware.members_amount.total
This test checks the total number of nodes present.
Tests of this type check the availability of the required number of processors.
Task: hardware.cpu.balancers
This test checks the number of processors on the nodes with the `balancer` role.
Task: hardware.cpu.control-planes
This test checks the number of processors on the nodes with the `control-plane` role.
Task: hardware.cpu.workers
This test checks the number of processors on the nodes with the `worker` role.
Tests of this type check the availability of the required amount of RAM.
Task: hardware.ram.balancers
This test checks the amount of RAM on the nodes with the `balancer` role.
Task: hardware.ram.control-planes
This test checks the amount of RAM on the nodes with the `control-plane` role.
Task: hardware.ram.workers
This test checks the amount of RAM on the nodes with the `worker` role.
Task: system.distributive
This test checks the family and release version of the operating system on the hosts.
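The same information can be inspected manually on any host; for example, the OS family and release version are reported in /etc/os-release:

```bash
# Print the OS identifier and release version that such a check relies on
grep -E '^(ID|VERSION_ID)=' /etc/os-release
```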
Task: network.pod_subnet_connectivity
This test checks the connectivity between nodes inside the pod subnetwork.
Task: network.service_subnet_connectivity
This test checks the connectivity between nodes inside the service subnetwork.
Task: network.check_tcp_ports
This test checks whether the necessary ports are open on the nodes.
Task: software.thirdparties.availability
This test checks whether the thirdparties are available from their sources on the nodes that need them.
Task: software.packages.repositories
This test checks whether the defined package repositories are accessible from the nodes.
Task: software.packages.availability
This test checks whether the required packages are available from the nodes.
Task: software.kernel.version
This test checks the Linux kernel version and issues a warning if it matches a known unstable kernel version. An unstable kernel version is a kernel with a detected serious issue that affects any part of the cluster and is therefore unsupported.
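A minimal sketch of such a check, assuming a hand-maintained list of unstable versions (the version string below is purely hypothetical):

```bash
CURRENT=$(uname -r)
UNSTABLE_KERNELS=("5.4.0-132-generic")   # hypothetical example entry
for BAD in "${UNSTABLE_KERNELS[@]}"; do
  if [ "$CURRENT" = "$BAD" ]; then
    echo "WARN: kernel $CURRENT is known to be unstable"
  fi
done
```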
The PAAS procedure verifies the platform solution. For example, it checks the health of the cluster or the service statuses on the nodes. This procedure checks an already configured environment: all services and the Kubernetes cluster must be installed and in working condition. Apart from the environment installed and configured by Kubemarine, the procedure can check other environments too.
The task tree is as follows:
- services
  - security
    - selinux
      - status
      - config
    - apparmor
      - status
      - config
    - firewalld
      - status
  - system
    - time
    - swap
      - status
    - modprobe
      - rules
  - haproxy
    - status
  - keepalived
    - status
  - container_runtime
    - status
  - kubelet
    - status
    - configuration
    - version
  - packages
    - system
      - recommended_versions
      - cri_version
      - haproxy_version
      - keepalived_version
      - audit_version
      - mandatory_versions
    - generic
      - version
- thirdparties
  - hashes
- kubernetes
  - pods
  - plugins
    - dashboard
  - nodes
    - existence
    - roles
    - condition
      - network
      - memory
      - disk
      - pid
      - ready
  - admission
- etcd
  - health_status
- control_plane
  - configuration_status
  - health_status
- default_services
  - configuration_status
  - health_status
- calico
  - config_check
- geo_check
Tests of this type verify the correctness of service statuses.
Task: services.haproxy.status
This test checks the status of the Haproxy service on all hosts in the cluster where this service is expected.
Task: services.keepalived.status
This test checks the status of the Keepalived service on all hosts in the cluster where this service is expected.
Task: services.container_runtime.status
This test checks the status of the Container Runtime (docker/containerd) service on all hosts in the cluster where this service is expected.
Task: services.kubelet.status
This test checks the status of the Kubelet service on all hosts in the cluster where this service is expected.
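On a particular host, the statuses these checks cover can be inspected manually with systemd. The unit names below may differ depending on the OS and the configured container runtime:

```bash
# Inspect the services covered by the status checks (names may vary)
systemctl status haproxy keepalived containerd kubelet
```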
Task: services.kubelet.configuration
This test checks that the kubelet `maxPods` and `podPidsLimit` parameters are correctly aligned with the kernel `pid_max` value.
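A manual way to see the values involved, assuming the default kubeadm kubelet config path; the intent of the check is understood to be that the pod and PID limits do not exhaust the kernel PID space:

```bash
# Kernel-wide PID limit
cat /proc/sys/kernel/pid_max
# kubelet limits (default kubeadm path; adjust if yours differs)
grep -E 'maxPods|podPidsLimit' /var/lib/kubelet/config.yaml
# Roughly, maxPods * podPidsLimit should stay below pid_max so that
# pods cannot exhaust the kernel PID space.
```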
Task: services.kubelet.version
This test checks the Kubelet version on all hosts in a cluster.
Tests of this type check that the system packages are installed, have equal versions across the nodes, and that those versions are recommended and correspond to cluster.yaml.
Task: services.packages.system.cri_version
This test checks that the configured CRI package is installed on all nodes with the same version, and that this version is recommended and corresponds to cluster.yaml.
Task: services.packages.system.haproxy_version
This test checks that the configured HAproxy package is installed on all nodes with the same version, and that this version is recommended and corresponds to cluster.yaml.
Task: services.packages.system.keepalived_version
This test checks that the configured Keepalived package is installed on all nodes with the same version, and that this version is recommended and corresponds to cluster.yaml.
Task: services.packages.system.audit_version
This test checks that the configured Audit package is installed on all nodes with the same version, and that this version is recommended and corresponds to cluster.yaml.
Task: services.packages.system.mandatory_versions
This test checks that the configured mandatory packages are installed on all nodes with the same versions, and that these versions correspond to cluster.yaml.
Task: services.packages.generic.version
This test checks that the configured generic packages are installed on all nodes with the same versions, and that these versions correspond to cluster.yaml.
Task: thirdparties.hashes
This test checks that the configured thirdparty hashes are equal to the actual file hashes on the nodes.
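Manually, the same comparison boils down to hashing the file on the node and comparing it with the hash from the inventory. The path and hash algorithm below are examples and depend on your configuration:

```bash
# Compute the actual hash of a thirdparty file on a node
sha1sum /usr/bin/kubeadm   # compare with the hash from the inventory
```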
Task: kubernetes.pods
This test checks that system pods are in good condition.
Task: kubernetes.plugins.dashboard
This test checks that the dashboard is available by its URL.
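Manually, this amounts to requesting the dashboard endpoint; the URL below is a placeholder:

```bash
# Expect an HTTP response from the dashboard URL (placeholder address)
curl -kI https://dashboard.example.com/
```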
Task: kubernetes.nodes.existence
This test checks for the presence of nodes in the Kubernetes cluster.
Task: kubernetes.nodes.roles
This test checks the nodes' roles in the Kubernetes cluster.
Tests of this type check the conditions of the nodes as reported by Kubernetes.
Task: kubernetes.nodes.condition.network
This test checks the `NetworkUnavailable` condition of the Kubernetes cluster nodes.
Task: kubernetes.nodes.condition.memory
This test checks the `MemoryPressure` condition of the Kubernetes cluster nodes.
Task: kubernetes.nodes.condition.disk
This test checks the `DiskPressure` condition of the Kubernetes cluster nodes.
Task: kubernetes.nodes.condition.pid
This test checks the `PIDPressure` condition of the Kubernetes cluster nodes.
Task: kubernetes.nodes.condition.ready
This test checks the `Ready` condition of the Kubernetes cluster nodes.
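All of these conditions can also be listed manually with kubectl, for example:

```bash
# Print every condition type and status for each node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{range .status.conditions[*]}{.type}={.status}{" "}{end}{"\n"}{end}'
```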
Task: services.security.selinux.status
The test checks the status of SELinux. It must be `enforcing`. It may be `permissive`, but only if this is explicitly specified in the inventory. Otherwise, the test fails. This test is applicable only to systems of the RHEL family.
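On a RHEL-family node, the values this check looks at can be viewed with the standard SELinux tools:

```bash
# Current enforcement mode: Enforcing, Permissive, or Disabled
getenforce
# Detailed status, including the mode configured in /etc/selinux/config
sestatus
```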
Task: services.security.selinux.config
The test compares the configuration of SELinux on the nodes with the configuration specified in the inventory or with the default one. If the configurations do not match, the test fails.
Task: services.security.firewalld.status
The test verifies that firewalld is disabled on the cluster nodes; otherwise, the test fails.
Task: services.system.swap.status
The test verifies that swap is disabled on all nodes in the cluster; otherwise, the test fails.
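Manually, the swap state can be verified on a node as follows:

```bash
# No output from swapon means no swap devices are active
swapon --show
# The "Swap" row should report zeros
free -h
```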
Task: services.system.modprobe.rules
The test compares the modprobe rules on the nodes with the rules specified in the inventory or with the default rules. If the rules do not match, the test fails.
Task: services.system.time
The test verifies that the time difference between the nodes does not exceed the maximum limit value. Otherwise, the cluster may not work properly and the test is highlighted with a warning.
Maximum limit value: 15000ms
Note: This test can give a false-positive result if there are a lot of nodes in the cluster, if there is too much delay between the deployer node and all the others, or under other environmental conditions. It is also recommended to perform the latency tests: 002 Latency - Single Thread and 003 Latency - Multi Thread.
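A crude manual check is to print each node's clock with millisecond precision and compare the values; the node list is a placeholder, and keep in mind that the SSH round trip itself skews the result:

```bash
# Print epoch time in milliseconds on each node (GNU date)
for NODE in 10.10.10.1 10.10.10.2; do
  echo -n "$NODE: "; ssh "$NODE" 'date +%s%3N'
done
```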
Task: etcd.health_status
This test verifies ETCD health.
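Manually, etcd health can be queried on a control-plane node with etcdctl. The certificate paths below are the typical kubeadm defaults and may differ in your environment:

```bash
# Query etcd health (typical kubeadm certificate paths; adjust as needed)
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```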
Task: control_plane.configuration_status
This test verifies the consistency of the configuration (image version, `extra_args`, `extra_volumes`) of the control plane static pods: `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler`.
Task: control_plane.health_status
This test verifies the health of the static pods `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler`.
Task: default_services.configuration_status
This test checks the image versions of the default services, such as `kube-proxy`, `coredns`, `calico-node`, `calico-kube-controllers`, and `ingress-nginx-controller`, and also checks the `coredns` ConfigMap.
Task: default_services.health_status
This test verifies the health of the `kube-proxy`, `coredns`, `calico-node`, `calico-kube-controllers`, and `ingress-nginx-controller` pods.
Task: calico.config_check
This test checks the configuration of the `calico-node` environment variables and Calico's ConfigMap with regard to `ipam`, and also performs `calicoctl ipam check`.
Task: kubernetes.admission
The test checks the status of Pod Security Admission and the default PSS (Pod Security Standards) profile, and verifies consistency between 'cluster.yaml' and the current Kubernetes configuration. It also checks consistency between 'kube-apiserver.yaml' and 'kubeadm-config'.
Task: geo_check
The task checks the status of DNS resolving, and of pod-to-service and pod-to-pod connectivity between clusters in geographically distributed schemes. This task works only if the procedure config file provides information about the `paas-geo-monitor` service, at least the service name and namespace. For example:

```yaml
geo-monitor:
  namespace: site-manager
  service: paas-geo-monitor
```

For more information about the `paas-geo-monitor` service, refer to the DRNavigator repository.
Task: services.security.apparmor.status
The test checks the status of AppArmor. It should be `enabled` by default.
Task: services.security.apparmor.config
The test checks the AppArmor configuration. AppArmor has several modes: `enforce`, `complain`, and `disable`. The profiles (resources) stick to one of the modes. The `cluster.yaml` may include only a part of the profiles.
In addition to the resulting table in the log output, the same report is presented in the form of files.
The HTML report allows you to view the final report visually. All content, including styles, is already included inside a single file. You can use the following supported command line arguments:
| Argument | Default | Description |
|---|---|---|
| --html-report | report.html | The full absolute path to the file location where the report is saved. |
| --disable-html-report | | If specified, the report generation is disabled. |
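For example, to write the HTML report to a custom location:

```bash
kubemarine check_paas --html-report=/tmp/cluster-check.html
```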
Report file example (trimmed):
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>PAAS Check Report</title>
</head>
<body>
    <div id="date">2020-04-29 10:09:31.096773</div>
    <div id="stats">
        <div class="succeeded">12 succeeded</div>
    </div>
    <h1>PAAS Check Report</h1>
    <table>
        <thead>
            <tr>
                <td>Group</td>
                <td>Status</td>
                <td>ID</td>
                <td>Test</td>
                <td>Actual Result</td>
                <td>Minimal</td>
                <td>Recommended</td>
            </tr>
        </thead>
        <tbody>
            <tr class="ok">
                <td>services</td>
                <td><div>ok</div></td>
                <td>201</td>
                <td>Haproxy Status</td>
                <td>active (running)</td>
                <td></td>
                <td></td>
            </tr>
            <tr class="ok">
                <td>services</td>
                <td><div>ok</div></td>
                <td>201</td>
                <td>Keepalived Status</td>
                <td>active (running)</td>
                <td></td>
                <td></td>
            </tr>
            <tr class="ok">
                <td>services</td>
                <td><div>ok</div></td>
                <td>201</td>
                <td>Docker Status</td>
                <td>active (running)</td>
                <td></td>
                <td></td>
            </tr>
        </tbody>
    </table>
</body>
</html>
```
The CSV report allows third-party software to parse the report results. This is convenient when working with Excel or automatic metrics collection systems. You can use the following supported command line arguments:
| Argument | Default | Description |
|---|---|---|
| --csv-report | report.csv | The full absolute path to the file location where the report is saved. |
| --csv-report-delimiter | ; | The character used as a column separator. |
| --disable-csv-report | | If specified, the report generation is disabled. |
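For example, to produce a comma-separated report suitable for Excel:

```bash
kubemarine check_paas --csv-report=/tmp/report.csv --csv-report-delimiter=","
```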
Report file example:
```csv
group;status;test_id;test_name;current_result;minimal_result;recommended_result
services;ok;201;Haproxy Status;active (running);;
services;ok;201;Keepalived Status;active (running);;
services;ok;201;Docker Status;active (running);;
services;ok;201;Kubelet Status;active (running);;
kubernetes;ok;202;Kubelet Version;v1.16.3;;
kubernetes;ok;203;Nodes Existence;All nodes presented;;
kubernetes;ok;204;Nodes Roles;All nodes have the correct roles;;
kubernetes;ok;205;Nodes Condition - NetworkUnavailable;CalicoIsUp;;
kubernetes;ok;205;Nodes Condition - MemoryPressure;KubeletHasSufficientMemory;;
kubernetes;ok;205;Nodes Condition - DiskPressure;KubeletHasNoDiskPressure;;
kubernetes;ok;205;Nodes Condition - PIDPressure;KubeletHasSufficientPID;;
kubernetes;ok;205;Nodes Condition - Ready;KubeletReady;;
```