One of the challenges of the OpenShift/NERC integration is accommodating existing tools that NERC already uses. This document covers usage accounting through XDMoD.
XDMoD is a usage accounting tool whose data structures and functionality have proven suitable for NERC. Its UI has multiple features that existing NERC users have grown accustomed to, including:
- Different admin and PI views
- A variety of output formats
- Report generation
However, there is a drawback to using XDMoD: the codebase is hard to decipher, making it difficult to reliably estimate the time needed to extend it to accommodate OpenShift. Fortunately, an alternative approach to OpenShift integration is available: simply reusing one of the existing data structures for OpenShift data. This approach was discussed with the XDMoD developers, who raised no objections.
XDMoD supports two approaches to collecting data for a resource:
- Job Based: Each data entry encapsulates all the known information about a single job.
- Event Based: Each data entry represents an event (such as the creation or deletion of a VM).
Both methods allow XDMoD to reconstruct the state of a system and view the consumption of computing resources at a particular point in time. Which method is better for OpenShift?
OpenShift jobs - pods - can run for long periods of time, which makes it difficult to record each one as a single completed job entry in a database. However, an event-based approach is also problematic: the OpenShift CLI does not have strong event-querying capabilities, and the information contained in an event does not have the level of detail required by XDMoD.
One possible alternative is to take periodic samples of data and treat each sample as a completed job. The question becomes whether this method of data gathering results in usable output as viewed from the XDMoD UI.
The remainder of this document explores this approach.
OpenShift metrics are stored in Prometheus, and can be queried using Prometheus’s query language: PromQL.
The following PromQL query retrieves metric data for an OpenShift cluster. The data is aggregated by namespace and averaged over the past hour using one-minute samples.
avg_over_time(sum by (namespace) (<metric>)[1h:1m])
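For example, substituting one of the metrics listed later in this document:

avg_over_time(sum by (namespace) (kube_pod_init_container_resource_requests_cpu_cores)[1h:1m])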
That query can be combined with the following Prometheus query_range REST API call to produce values for each hour in a day.
/api/v1/query_range?query=<query>&start=<date_string>T00:00:00Z&end=<date_string>T23:59:59Z&step=3600s
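As a rough sketch of how a collection script might issue that call, assuming Prometheus is reachable at a placeholder PROM_URL with a placeholder bearer TOKEN (both are assumptions, not part of the existing tooling):

import requests

PROM_URL = "https://prometheus.example.com"   # placeholder monitoring endpoint
TOKEN = "..."                                  # placeholder service account token
DATE = "2021-07-01"                            # day being reported on

def query_metric(metric):
    """Return hourly, namespace-aggregated averages of `metric` for DATE."""
    query = f"avg_over_time(sum by (namespace) ({metric})[1h:1m])"
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={
            "query": query,
            "start": f"{DATE}T00:00:00Z",
            "end": f"{DATE}T23:59:59Z",
            "step": "3600s",
        },
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]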
We can use the above to query the following metrics over a day; a short collection sketch follows the list:
kube_pod_init_container_resource_requests_cpu_cores
- The number of CPU cores requested by an init container.
kube_pod_init_container_resource_limits_cpu_cores
- The CPU core limit set for an init container.
kube_pod_init_container_resource_requests_memory_bytes
- Bytes of memory requested by an init container.
kube_pod_init_container_resource_limits_memory_bytes
- The memory limit, in bytes, set for an init container.
kube_deployment_status_replicas
- The number of replicas per deployment.
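Using the query_metric helper sketched above, all five metrics could be collected for the reporting day in one pass; the list below simply mirrors the metric names above.

METRICS = [
    "kube_pod_init_container_resource_requests_cpu_cores",
    "kube_pod_init_container_resource_limits_cpu_cores",
    "kube_pod_init_container_resource_requests_memory_bytes",
    "kube_pod_init_container_resource_limits_memory_bytes",
    "kube_deployment_status_replicas",
]

# Map each metric name to its hourly, per-namespace samples.
samples = {metric: query_metric(metric) for metric in METRICS}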
The mapping of these values into the XDMoD data structure is described below.
Some of the job information required by XDMoD is not metric data, but simply additional information about the namespace. That information can be attached to a namespace at creation time through annotations.
Kristi Nikolla has already created a patch that adds OpenShift support to the ColdFront OpenStack plugin. This support is similar to that for OpenStack, allowing for the activation and deactivation of an allocation, as well as the association and dissociation of users with an allocation. This code can be updated to also set annotations by passing in the following metadata:
"metadata": {
"annotations": {
"cf_pi": <pi_username>
"cf_project_id": <project_id>
}
}
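As one illustration (not the plugin's actual code), the same body could be applied to an existing namespace with the OpenShift dynamic Python client; the namespace name and annotation values below are placeholders:

from kubernetes import config
from openshift.dynamic import DynamicClient

# Build a dynamic client from the local kubeconfig.
dyn_client = DynamicClient(config.new_client_from_config())
namespaces = dyn_client.resources.get(api_version="v1", kind="Namespace")

# Merge the ColdFront annotations into the namespace metadata.
namespaces.patch(
    name="example-project",
    body={
        "metadata": {
            "annotations": {
                "cf_pi": "pi_username",
                "cf_project_id": "12345",
            }
        }
    },
)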
These annotations can later be queried through the OpenShift Python client.
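Continuing the sketch above, reading the annotations back might look like the following (again with a placeholder namespace name):

# Fetch the namespace object and pull the ColdFront annotations from it.
ns = namespaces.get(name="example-project")
annotations = ns.metadata.annotations
pi_username = annotations["cf_pi"]
project_id = annotations["cf_project_id"]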
The XDMoD data structure that accepts job data is the one used for Slurm. This table shows one possible correspondence between Slurm and OpenShift data:
| Slurm | OpenShift Equivalent |
|-------|----------------------|
| job_id | autogenerated by script |
| job_id_raw | autogenerated by script |
| cluster_name | openshift cluster environment variable |
| partition_name | blank |
| qos_name | blank |
| account_name | cf_pi annotation |
| group_name | cf_project_id annotation |
| gid_number | blank |
| user_name | openshift namespace |
| uid_number | blank |
| submit_time | set to start_time |
| eligible_time | set to start_time |
| start_time | beginning of report time |
| end_time | end of report time |
| elapsed | end_time - start_time |
| exit_code | blank |
| state | RUNNING |
| nnodes | kube_deployment_status_replicas |
| ncpus | kube_pod_init_container_resource_requests_cpu_cores |
| req_cpus | kube_pod_init_container_resource_limits_cpu_cores |
| req_mem | kube_pod_init_container_resource_limits_memory_bytes |
| req_tres | cpu=<req_cpu>,mem=<req_mem>,node=<req_pods> |
| alloc_tres | set to req_tres |
| timelimit | set to elapsed |
| node_list | blank |
| job_name | openshift pod name |
To retrieve the above OpenShift information in the Slurm format needed by XDMoD, we can create a script that pulls the required data from Prometheus and OpenShift and formats it appropriately (a sketch follows below). XDMoD can then “shred” and “ingest” that data, making it viewable in the XDMoD GUI. Other existing XDMoD functions - such as the automatic generation of reports - should then also be accessible for this OpenShift data.
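As an illustration, the formatting step might look like the sketch below, which assumes the XDMoD Slurm shredder accepts pipe-delimited records with fields in the order given in the table above (the layout produced by sacct --parsable2); all values shown are placeholders, not real data.

# Assemble one sample as a pipe-delimited, Slurm-style record.
fields = [
    "1001",                     # job_id (autogenerated by the script)
    "1001",                     # job_id_raw
    "openshift-cluster",        # cluster_name (from an environment variable)
    "", "",                     # partition_name, qos_name (blank)
    "pi_username",              # account_name (cf_pi annotation)
    "12345",                    # group_name (cf_project_id annotation)
    "",                         # gid_number (blank)
    "example-project",          # user_name (openshift namespace)
    "",                         # uid_number (blank)
    "2021-07-01T00:00:00",      # submit_time (= start_time)
    "2021-07-01T00:00:00",      # eligible_time (= start_time)
    "2021-07-01T00:00:00",      # start_time (beginning of report time)
    "2021-07-01T01:00:00",      # end_time (end of report time)
    "01:00:00",                 # elapsed
    "",                         # exit_code (blank)
    "RUNNING",                  # state
    "2",                        # nnodes (replicas)
    "4",                        # ncpus (requested cores)
    "8",                        # req_cpus (core limit)
    "16G",                      # req_mem (memory limit)
    "cpu=8,mem=16G,node=2",     # req_tres
    "cpu=8,mem=16G,node=2",     # alloc_tres (= req_tres)
    "01:00:00",                 # timelimit (= elapsed)
    "",                         # node_list (blank)
    "example-pod",              # job_name (openshift pod name)
]

with open("openshift.log", "a") as log:
    log.write("|".join(fields) + "\n")

The resulting log file would then be passed to XDMoD’s xdmod-shredder command and loaded with xdmod-ingestor.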