diff --git a/design/KruizeLocalAPI.md b/design/KruizeLocalAPI.md new file mode 100644 index 000000000..1d5e22f39 --- /dev/null +++ b/design/KruizeLocalAPI.md @@ -0,0 +1,2247 @@ +# Local Monitoring Mode - Proof of Concept + +This article describes how to quickly get started with the Kruize Local Monitoring Mode use case REST API using the curl command. +Documentation is still in progress; stay tuned. + +# Table of Contents + +1. [Resource Analysis Terms and Defaults](#resource-analysis-terms-and-defaults) + +- [Terms, Duration & Threshold Table](#terms-duration--threshold-table) + +2. [APIs](#apis) +- [List Datasources API](#list-datasources-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + +- [Import Metadata API](#import-metadata-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + +- [List Metadata API](#list-metadata-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + +- [Delete Metadata API](#delete-metadata-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + +- [Create Experiment API](#create-experiment-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + +- [List Experiments API](#list-experiments-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + +- [Generate Recommendations API](#generate-recommendations-api) + - Introduction + - Example Request and Response + - Invalid Scenarios + + + +## Resource Analysis Terms and Defaults + +When analyzing resource utilization in Kubernetes, it's essential to define terms that specify the duration of past data +considered for recommendations and the threshold for obtaining additional data. These terms help in categorizing and +fine-tuning resource allocation.
+ +Below are the default terms used in resource analysis, along with their respective durations and thresholds: + + + +### Terms, Duration & Threshold Table + +| Term | Minimum Data Threshold | Duration | +|--------|------------------------|----------| +| Short | 30 mins | 1 day | +| Medium | 2 days | 7 days | +| Long | 8 days | 15 days | + +**Minimum Data Threshold**: The "minimum data threshold" represents the minimum amount of data needed to generate a +recommendation for a given duration term. + +**Duration**: The "duration" refers to the amount of historical data taken into account when +assessing resource utilization. + +Read more about the Term Threshold scenarios [here](TermThresholdDesign.md). + +### Profile Algorithms (How Kruize calculates the recommendations) + +**Profile:** + +This column represents the different profiles, or criteria, that the recommendation algorithm takes into account when making +recommendations. + +**CPU (Percentile):** + +This column indicates the percentile value of the timeseries CPU usage data that the algorithm considers for each profile. + +**Memory (Percentile):** + +Similarly, this column denotes the percentile value of the timeseries memory usage data that is used by the algorithm +for each profile. + +#### Profiles + +**Cost Profile:** +For the "Cost" profile, Kruize's recommendation algorithm considers the 60th percentile for CPU usage and the 100th +percentile for memory usage. This means that cost recommendations are based on +the 60th-percentile CPU usage value and the maximum (100th-percentile) memory usage value. + +**Performance Profile:** +For the "Performance" profile, the algorithm takes into account the 98th percentile for CPU usage and the 100th +percentile for memory usage. Consequently, performance recommendations are based on +the 98th-percentile CPU usage value and the maximum (100th-percentile) memory usage value.
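As a rough illustration only (not Kruize's actual implementation, which operates on Prometheus timeseries), the percentile aggregation described above can be sketched in Python: the Cost profile takes the 60th percentile of CPU samples, the Performance profile the 98th, and both take the 100th percentile (the maximum) of memory samples.

```python
# Hypothetical sketch of percentile-based aggregation; sample values are made up.

def percentile(samples, p):
    """Nearest-rank percentile of a list of numeric samples."""
    ordered = sorted(samples)
    # Nearest-rank method: the value at rank ceil-ish p% of the sample count.
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

cpu_usage = [0.2, 0.4, 0.5, 0.6, 0.9, 1.1, 1.3, 1.4, 1.8, 2.0]  # cores
mem_usage = [300, 310, 320, 350, 400, 410, 420, 480, 490, 512]  # MiB

cost_cpu = percentile(cpu_usage, 60)         # Cost profile: 60th percentile
performance_cpu = percentile(cpu_usage, 98)  # Performance profile: 98th percentile
mem_request = percentile(mem_usage, 100)     # Both profiles: 100th percentile (max)
```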
+ +| Profile | CPU (Percentile) | Memory (Percentile) | +|-------------|------------------|---------------------| +| Cost | 60th | 100th | +| Performance | 98th | 100th | + + + +## APIs + + + +### List Datasources API + +This quick guide shows how to list the available datasources. + +**Request without Parameter** + +`GET /datasources` + +`curl -H 'Accept: application/json' http://:/datasources` + +If no parameter is passed, the API returns all the available datasources. + +**Response** + +
+Example Response + +### Example Response + +```json +{ + "version": "v1.0", + "datasources": [ + { + "name": "prometheus-1", + "provider": "prometheus", + "serviceName": "prometheus-k8s", + "namespace": "monitoring", + "url": "http://prometheus-k8s.monitoring.svc.cluster.local:9090" + } + ] +} +``` + +
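The response above can also be consumed programmatically; a minimal Python sketch (the `response` string mirrors the example payload above) that maps each datasource's name to its URL:

```python
import json

# The example /datasources payload shown above.
response = json.loads("""
{
  "version": "v1.0",
  "datasources": [
    {
      "name": "prometheus-1",
      "provider": "prometheus",
      "serviceName": "prometheus-k8s",
      "namespace": "monitoring",
      "url": "http://prometheus-k8s.monitoring.svc.cluster.local:9090"
    }
  ]
}
""")

# Map each datasource name to its URL for quick lookup.
datasource_urls = {ds["name"]: ds["url"] for ds in response["datasources"]}
```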
+ +**Request with datasource name parameter** + +`GET /datasources` + +`curl -H 'Accept: application/json' http://:/datasources?name=` + +Returns the datasource details of the specified datasource. + +**Response for datasource name - `prometheus-1`** + +
+Example Response + +### Example Response + +```json +{ + "version": "v1.0", + "datasources": [ + { + "name": "prometheus-1", + "provider": "prometheus", + "serviceName": "prometheus-k8s", + "namespace": "monitoring", + "url": "http://prometheus-k8s.monitoring.svc.cluster.local:9090" + } + ] +} +``` + +
+ + +### Import Metadata API + +This quick guide shows how to import datasource metadata using the JSON input below. + +**Request** +`POST /dsmetadata` + +`curl -H 'Accept: application/json' -X POST --data 'copy paste below JSON' http://:/dsmetadata` + +
+ +Example Request + +### Example Request + +```json +{ + "version": "v1.0", + "datasource_name": "prometheus-1" +} +``` + +
+ + +**Response** + +
+Example Response + +### Example Response + +```json +{ + "datasources": { + "prometheus-1": { + "datasource_name": "prometheus-1", + "clusters": { + "default": { + "cluster_name": "default" + } + } + } + } +} +``` + +
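A small sketch of how the import response might be inspected in Python, assuming the payload matches the example above:

```python
# Example POST /dsmetadata import response, as shown above.
response = {
    "datasources": {
        "prometheus-1": {
            "datasource_name": "prometheus-1",
            "clusters": {"default": {"cluster_name": "default"}},
        }
    }
}

# Collect (datasource, cluster) pairs discovered by the import.
imported = [
    (ds_name, cluster_name)
    for ds_name, ds in response["datasources"].items()
    for cluster_name in ds["clusters"]
]
```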
+ + +### List Metadata API + +This quick guide shows how to retrieve the metadata of a specific datasource. + +**Request Parameters** + +| Parameter | Type | Required | Description | +|--------------|--------|----------|--------------------------------------------| +| datasource | string | Yes | The name of the datasource. | +| cluster_name | string | Optional | The name of the cluster. | +| namespace | string | Optional | The namespace. | +| verbose | string | Optional | Flag to retrieve container-level metadata. | + +In the context of the `GET /dsmetadata` REST API, `verbose` is a parameter that controls the +granularity of the metadata included in the API response. When the verbose parameter is set to true, the response +includes granular container-level details, offering a more comprehensive view of the clusters, namespaces, +workloads and containers associated with the specified datasource. When the verbose parameter is not provided or is set to +false, the response provides only basic information, such as the list of clusters and namespaces associated with the specified datasource. + +**Request with datasource name parameter** + +`GET /dsmetadata` + +`curl -H 'Accept: application/json' http://:/dsmetadata?datasource=` + +Returns the list of cluster details of the specified datasource. + +**Response for datasource name - `prometheus-1`** + +***Note:*** +- Currently, only the `default` cluster is supported for the POC. +- When the `verbose` parameter is not provided, it defaults to `false` - the response provides basic information +about the clusters of the specified datasource. + +
+Example Response + +### Example Response + +```json +{ + "datasources": { + "prometheus-1": { + "datasource_name": "prometheus-1", + "clusters": { + "default": { + "cluster_name": "default" + } + } + } + } +} +``` +
+ +
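The query strings used in these requests can be assembled with Python's standard library; a short sketch (the host and port are placeholder values, and `verbose` defaults to `false` when omitted):

```python
from urllib.parse import urlencode

# Hypothetical host/port; substitute your Kruize service address.
base = "http://kruize.example.com:8080/dsmetadata"

params = {"datasource": "prometheus-1"}  # verbose defaults to false
verbose_params = {"datasource": "prometheus-1", "verbose": "true"}

url = f"{base}?{urlencode(params)}"
verbose_url = f"{base}?{urlencode(verbose_params)}"
```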
+ +**Request with verbose set to true and the datasource name parameter** + +`GET /dsmetadata` + +`curl -H 'Accept: application/json' "http://:/dsmetadata?datasource=&verbose=true"` + +Returns the metadata of all the containers present in the specified datasource. + +***Note: When `verbose` is not passed in the query URL, it defaults to `false`.*** + +**Response for datasource name - `prometheus-1` and verbose - `true`** + +With the `verbose` parameter set to `true`, the response includes detailed metadata about all namespaces, workloads and +containers, in addition to cluster information, for the specified datasource. + +
+Example Response + +### Example Response + +```json +{ + "datasources": { + "prometheus-1": { + "datasource_name": "prometheus-1", + "clusters": { + "default": { + "cluster_name": "default", + "namespaces": { + "default": { + "namespace": "default" + }, + "cadvisor": { + "namespace": "cadvisor", + "workloads": { + "cadvisor": { + "workload_name": "cadvisor", + "workload_type": "daemonset", + "containers": { + "cadvisor": { + "container_name": "cadvisor", + "container_image_name": "gcr.io/cadvisor/cadvisor:v0.45.0" + } + } + } + } + }, + "kube-node-lease": { + "namespace": "kube-node-lease" + }, + "kube-system": { + "namespace": "kube-system", + "workloads": { + "coredns": { + "workload_name": "coredns", + "workload_type": "deployment", + "containers": { + "coredns": { + "container_name": "coredns", + "container_image_name": "k8s.gcr.io/coredns/coredns:v1.8.6" + } + } + }, + "kube-proxy": { + "workload_name": "kube-proxy", + "workload_type": "daemonset", + "containers": { + "kube-proxy": { + "container_name": "kube-proxy", + "container_image_name": "k8s.gcr.io/kube-proxy:v1.24.3" + } + } + } + } + }, + "monitoring": { + "namespace": "monitoring", + "workloads": { + "kube-state-metrics": { + "workload_name": "kube-state-metrics", + "workload_type": "deployment", + "containers": { + "kube-state-metrics": { + "container_name": "kube-state-metrics", + "container_image_name": "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0" + }, + "kube-rbac-proxy-self": { + "container_name": "kube-rbac-proxy-self", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "kube-rbac-proxy-main": { + "container_name": "kube-rbac-proxy-main", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + } + } + }, + "node-exporter": { + "workload_name": "node-exporter", + "workload_type": "daemonset", + "containers": { + "node-exporter": { + "container_name": "node-exporter", + "container_image_name": "quay.io/prometheus/node-exporter:v1.1.2" + }, + 
"kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + } + } + }, + "postgres-deployment": { + "workload_name": "postgres-deployment", + "workload_type": "deployment", + "containers": { + "postgres": { + "container_name": "postgres", + "container_image_name": "quay.io/kruizehub/postgres:15.2" + } + } + }, + "alertmanager-main": { + "workload_name": "alertmanager-main", + "workload_type": "statefulset", + "containers": { + "config-reloader": { + "container_name": "config-reloader", + "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0" + }, + "alertmanager": { + "container_name": "alertmanager", + "container_image_name": "quay.io/prometheus/alertmanager:v0.21.0" + } + } + }, + "prometheus-adapter": { + "workload_name": "prometheus-adapter", + "workload_type": "deployment", + "containers": { + "prometheus-adapter": { + "container_name": "prometheus-adapter", + "container_image_name": "directxman12/k8s-prometheus-adapter:v0.8.4" + } + } + }, + "kruize": { + "workload_name": "kruize", + "workload_type": "deployment", + "containers": { + "kruize": { + "container_name": "kruize", + "container_image_name": "quay.io/kruize/autotune_operator:0.0.21_mvp" + } + } + }, + "grafana": { + "workload_name": "grafana", + "workload_type": "deployment", + "containers": { + "grafana": { + "container_name": "grafana", + "container_image_name": "grafana/grafana:7.5.4" + } + } + }, + "prometheus-k8s": { + "workload_name": "prometheus-k8s", + "workload_type": "statefulset", + "containers": { + "config-reloader": { + "container_name": "config-reloader", + "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0" + }, + "prometheus": { + "container_name": "prometheus", + "container_image_name": "quay.io/prometheus/prometheus:v2.26.0" + } + } + }, + "blackbox-exporter": { + "workload_name": "blackbox-exporter", + "workload_type": "deployment", + 
"containers": { + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "module-configmap-reloader": { + "container_name": "module-configmap-reloader", + "container_image_name": "jimmidyson/configmap-reload:v0.5.0" + }, + "blackbox-exporter": { + "container_name": "blackbox-exporter", + "container_image_name": "quay.io/prometheus/blackbox-exporter:v0.18.0" + } + } + }, + "prometheus-operator": { + "workload_name": "prometheus-operator", + "workload_type": "deployment", + "containers": { + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "prometheus-operator": { + "container_name": "prometheus-operator", + "container_image_name": "quay.io/prometheus-operator/prometheus-operator:v0.47.0" + } + } + } + } + }, + "kube-public": { + "namespace": "kube-public" + } + } + } + } + } + } +} +``` + +
+ +
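The nested verbose payload can be flattened for easier processing; a sketch (trimmed to one namespace from the example above) that walks clusters → namespaces → workloads → containers:

```python
# Trimmed verbose /dsmetadata payload from the example above.
response = {
    "datasources": {
        "prometheus-1": {
            "datasource_name": "prometheus-1",
            "clusters": {
                "default": {
                    "cluster_name": "default",
                    "namespaces": {
                        "cadvisor": {
                            "namespace": "cadvisor",
                            "workloads": {
                                "cadvisor": {
                                    "workload_name": "cadvisor",
                                    "workload_type": "daemonset",
                                    "containers": {
                                        "cadvisor": {
                                            "container_name": "cadvisor",
                                            "container_image_name": "gcr.io/cadvisor/cadvisor:v0.45.0",
                                        }
                                    },
                                }
                            },
                        },
                        "kube-node-lease": {"namespace": "kube-node-lease"},
                    },
                }
            },
        }
    }
}

def flatten(metadata):
    """Yield (cluster, namespace, workload, container) tuples."""
    for ds in metadata["datasources"].values():
        for cluster in ds["clusters"].values():
            for ns in cluster.get("namespaces", {}).values():
                # Namespaces with no detected workloads are skipped here.
                for wl in ns.get("workloads", {}).values():
                    for ctr in wl.get("containers", {}).values():
                        yield (
                            cluster["cluster_name"],
                            ns["namespace"],
                            wl["workload_name"],
                            ctr["container_name"],
                        )

rows = list(flatten(response))
```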
+ +**Request with datasource name and cluster name parameters** + +`GET /dsmetadata` + +`curl -H 'Accept: application/json' "http://:/dsmetadata?datasource=&cluster_name="` + +Returns the list of namespaces present in the specified cluster and datasource. + +**Response for datasource name - `prometheus-1` and cluster name - `default`** + +With the `verbose` parameter set to `false`, the response includes the list of namespaces present in the specified cluster +and datasource. + +
+Example Response + +### Example Response + +```json +{ + "datasources": { + "prometheus-1": { + "datasource_name": "prometheus-1", + "clusters": { + "default": { + "cluster_name": "default", + "namespaces": { + "default": { + "namespace": "default" + }, + "cadvisor": { + "namespace": "cadvisor" + }, + "kube-node-lease": { + "namespace": "kube-node-lease" + }, + "kube-system": { + "namespace": "kube-system" + }, + "monitoring": { + "namespace": "monitoring" + }, + "kube-public": { + "namespace": "kube-public" + } + } + } + } + } + } +} +``` + +
+ +
+ +**Request with datasource name, cluster name and verbose parameters** + +`GET /dsmetadata` + +`curl -H 'Accept: application/json' "http://:/dsmetadata?datasource=&cluster_name=&verbose=true"` + +Returns the container-level metadata of all the namespaces present in the specified cluster and datasource. + +**Response for datasource name - `prometheus-1`, cluster name - `default` and verbose - `true`** + +With the `verbose` parameter set to `true`, the response includes detailed metadata about workloads and containers, +in addition to namespace information, for the specified cluster and datasource. + +
+Example Response + +### Example Response + +```json +{ + "datasources": { + "prometheus-1": { + "datasource_name": "prometheus-1", + "clusters": { + "default": { + "cluster_name": "default", + "namespaces": { + "default": { + "namespace": "default" + }, + "cadvisor": { + "namespace": "cadvisor", + "workloads": { + "cadvisor": { + "workload_name": "cadvisor", + "workload_type": "daemonset", + "containers": { + "cadvisor": { + "container_name": "cadvisor", + "container_image_name": "gcr.io/cadvisor/cadvisor:v0.45.0" + } + } + } + } + }, + "kube-node-lease": { + "namespace": "kube-node-lease" + }, + "kube-system": { + "namespace": "kube-system", + "workloads": { + "coredns": { + "workload_name": "coredns", + "workload_type": "deployment", + "containers": { + "coredns": { + "container_name": "coredns", + "container_image_name": "k8s.gcr.io/coredns/coredns:v1.8.6" + } + } + }, + "kube-proxy": { + "workload_name": "kube-proxy", + "workload_type": "daemonset", + "containers": { + "kube-proxy": { + "container_name": "kube-proxy", + "container_image_name": "k8s.gcr.io/kube-proxy:v1.24.3" + } + } + } + } + }, + "monitoring": { + "namespace": "monitoring", + "workloads": { + "kube-state-metrics": { + "workload_name": "kube-state-metrics", + "workload_type": "deployment", + "containers": { + "kube-state-metrics": { + "container_name": "kube-state-metrics", + "container_image_name": "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0" + }, + "kube-rbac-proxy-self": { + "container_name": "kube-rbac-proxy-self", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "kube-rbac-proxy-main": { + "container_name": "kube-rbac-proxy-main", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + } + } + }, + "node-exporter": { + "workload_name": "node-exporter", + "workload_type": "daemonset", + "containers": { + "node-exporter": { + "container_name": "node-exporter", + "container_image_name": "quay.io/prometheus/node-exporter:v1.1.2" + }, + 
"kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + } + } + }, + "postgres-deployment": { + "workload_name": "postgres-deployment", + "workload_type": "deployment", + "containers": { + "postgres": { + "container_name": "postgres", + "container_image_name": "quay.io/kruizehub/postgres:15.2" + } + } + }, + "alertmanager-main": { + "workload_name": "alertmanager-main", + "workload_type": "statefulset", + "containers": { + "config-reloader": { + "container_name": "config-reloader", + "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0" + }, + "alertmanager": { + "container_name": "alertmanager", + "container_image_name": "quay.io/prometheus/alertmanager:v0.21.0" + } + } + }, + "prometheus-adapter": { + "workload_name": "prometheus-adapter", + "workload_type": "deployment", + "containers": { + "prometheus-adapter": { + "container_name": "prometheus-adapter", + "container_image_name": "directxman12/k8s-prometheus-adapter:v0.8.4" + } + } + }, + "kruize": { + "workload_name": "kruize", + "workload_type": "deployment", + "containers": { + "kruize": { + "container_name": "kruize", + "container_image_name": "quay.io/kruize/autotune_operator:0.0.21_mvp" + } + } + }, + "grafana": { + "workload_name": "grafana", + "workload_type": "deployment", + "containers": { + "grafana": { + "container_name": "grafana", + "container_image_name": "grafana/grafana:7.5.4" + } + } + }, + "prometheus-k8s": { + "workload_name": "prometheus-k8s", + "workload_type": "statefulset", + "containers": { + "config-reloader": { + "container_name": "config-reloader", + "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0" + }, + "prometheus": { + "container_name": "prometheus", + "container_image_name": "quay.io/prometheus/prometheus:v2.26.0" + } + } + }, + "blackbox-exporter": { + "workload_name": "blackbox-exporter", + "workload_type": "deployment", + 
"containers": { + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "module-configmap-reloader": { + "container_name": "module-configmap-reloader", + "container_image_name": "jimmidyson/configmap-reload:v0.5.0" + }, + "blackbox-exporter": { + "container_name": "blackbox-exporter", + "container_image_name": "quay.io/prometheus/blackbox-exporter:v0.18.0" + } + } + }, + "prometheus-operator": { + "workload_name": "prometheus-operator", + "workload_type": "deployment", + "containers": { + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "prometheus-operator": { + "container_name": "prometheus-operator", + "container_image_name": "quay.io/prometheus-operator/prometheus-operator:v0.47.0" + } + } + } + } + }, + "kube-public": { + "namespace": "kube-public" + } + } + } + } + } + } +} +``` +
+ +
+ +**Request with datasource name, cluster name and namespace parameters** + +`GET /dsmetadata` + +`curl -H 'Accept: application/json' "http://:/dsmetadata?datasource=&cluster_name=&namespace="` + +Returns the container-level metadata of the specified namespace, cluster and datasource. + +***Note: For this request, `verbose` defaults to `true`, so container-level metadata is fetched.*** + +**Response for datasource name - `prometheus-1`, cluster name - `default` and namespace - `monitoring`** + +The response includes granular metadata about workloads and their associated containers within the specified namespace, cluster +and datasource. + +
+Example Response + +### Example Response + +```json +{ + "datasources": { + "prometheus-1": { + "datasource_name": "prometheus-1", + "clusters": { + "default": { + "cluster_name": "default", + "namespaces": { + "monitoring": { + "namespace": "monitoring", + "workloads": { + "kube-state-metrics": { + "workload_name": "kube-state-metrics", + "workload_type": "deployment", + "containers": { + "kube-state-metrics": { + "container_name": "kube-state-metrics", + "container_image_name": "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0" + }, + "kube-rbac-proxy-self": { + "container_name": "kube-rbac-proxy-self", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "kube-rbac-proxy-main": { + "container_name": "kube-rbac-proxy-main", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + } + } + }, + "node-exporter": { + "workload_name": "node-exporter", + "workload_type": "daemonset", + "containers": { + "node-exporter": { + "container_name": "node-exporter", + "container_image_name": "quay.io/prometheus/node-exporter:v1.1.2" + }, + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + } + } + }, + "postgres-deployment": { + "workload_name": "postgres-deployment", + "workload_type": "deployment", + "containers": { + "postgres": { + "container_name": "postgres", + "container_image_name": "quay.io/kruizehub/postgres:15.2" + } + } + }, + "alertmanager-main": { + "workload_name": "alertmanager-main", + "workload_type": "statefulset", + "containers": { + "config-reloader": { + "container_name": "config-reloader", + "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0" + }, + "alertmanager": { + "container_name": "alertmanager", + "container_image_name": "quay.io/prometheus/alertmanager:v0.21.0" + } + } + }, + "prometheus-adapter": { + "workload_name": "prometheus-adapter", + "workload_type": "deployment", + "containers": { + 
"prometheus-adapter": { + "container_name": "prometheus-adapter", + "container_image_name": "directxman12/k8s-prometheus-adapter:v0.8.4" + } + } + }, + "kruize": { + "workload_name": "kruize", + "workload_type": "deployment", + "containers": { + "kruize": { + "container_name": "kruize", + "container_image_name": "quay.io/kruize/autotune_operator:0.0.21_mvp" + } + } + }, + "grafana": { + "workload_name": "grafana", + "workload_type": "deployment", + "containers": { + "grafana": { + "container_name": "grafana", + "container_image_name": "grafana/grafana:7.5.4" + } + } + }, + "prometheus-k8s": { + "workload_name": "prometheus-k8s", + "workload_type": "statefulset", + "containers": { + "config-reloader": { + "container_name": "config-reloader", + "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0" + }, + "prometheus": { + "container_name": "prometheus", + "container_image_name": "quay.io/prometheus/prometheus:v2.26.0" + } + } + }, + "blackbox-exporter": { + "workload_name": "blackbox-exporter", + "workload_type": "deployment", + "containers": { + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "module-configmap-reloader": { + "container_name": "module-configmap-reloader", + "container_image_name": "jimmidyson/configmap-reload:v0.5.0" + }, + "blackbox-exporter": { + "container_name": "blackbox-exporter", + "container_image_name": "quay.io/prometheus/blackbox-exporter:v0.18.0" + } + } + }, + "prometheus-operator": { + "workload_name": "prometheus-operator", + "workload_type": "deployment", + "containers": { + "kube-rbac-proxy": { + "container_name": "kube-rbac-proxy", + "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0" + }, + "prometheus-operator": { + "container_name": "prometheus-operator", + "container_image_name": "quay.io/prometheus-operator/prometheus-operator:v0.47.0" + } + } + } + } + } + } + } + } + } + } +} +``` + +
+ +
+ + +### Delete Metadata API + +This quick guide shows how to delete datasource metadata using the JSON input below. + +**Request** +`DELETE /dsmetadata` + +`curl -H 'Accept: application/json' -X DELETE --data 'copy paste below JSON' http://:/dsmetadata` + +
+ +Example Request + +### Example Request + +```json +{ + "version": "v1.0", + "datasource_name": "prometheus-1" +} +``` + +
+ + +**Response** + +
+Example Response + +### Example Response + +```json +{ + "message": "Datasource metadata deleted successfully. View imported metadata at GET /dsmetadata", + "httpcode": 201, + "documentationLink": "", + "status": "SUCCESS" +} +``` + +
+ +
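Before re-importing or moving on, a caller might want to confirm the deletion succeeded; a tiny sketch that checks the `status` field of the response shown above:

```python
# Example DELETE /dsmetadata response, as shown above.
response = {
    "message": "Datasource metadata deleted successfully. View imported metadata at GET /dsmetadata",
    "httpcode": 201,
    "documentationLink": "",
    "status": "SUCCESS",
}

# Treat anything other than an explicit SUCCESS status as a failure.
deleted = response.get("status") == "SUCCESS"
```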
+ + +### Create Experiment API + +This quick guide shows how to create experiments using the JSON input below. For a more detailed guide, +see [Create Experiment](/design/CreateExperiment.md). + +**Request** +`POST /createExperiment` + +`curl -H 'Accept: application/json' -X POST --data 'copy paste below JSON' http://:/createExperiment` + +
+ +Example Request for datasource - `prometheus-1` + +### Example Request + +```json +[ + { + "version": "v2.0", + "experiment_name": "default|default|deployment|tfb-qrh-deployment", + "cluster_name": "default", + "performance_profile": "resource-optimization-openshift", + "mode": "monitor", + "target_cluster": "local", + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment", + "namespace": "default", + "containers": [ + { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-0" + }, + { + "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17", + "container_name": "tfb-server-1" + } + ] + } + ], + "trial_settings": { + "measurement_duration": "15min" + }, + "recommendation_settings": { + "threshold": "0.1" + }, + "datasource": "prometheus-1" + } +] +``` + +
+ + +**Response** + +
+Example Response + +### Example Response + +```json +{ + "message": "Experiment registered successfully with Autotune. View registered experiments at /listExperiments", + "httpcode": 201, + "documentationLink": "", + "status": "SUCCESS" +} +``` + +
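Building the create-experiment payload programmatically can help avoid typos; a Python sketch that assembles a trimmed version of the example request above and checks a few required fields before posting:

```python
import json

# Assemble a trimmed version of the create-experiment payload shown above.
experiment = {
    "version": "v2.0",
    "experiment_name": "default|default|deployment|tfb-qrh-deployment",
    "cluster_name": "default",
    "performance_profile": "resource-optimization-openshift",
    "mode": "monitor",
    "target_cluster": "local",
    "kubernetes_objects": [
        {
            "type": "deployment",
            "name": "tfb-qrh-deployment",
            "namespace": "default",
            "containers": [
                {"container_image_name": "kruize/tfb-db:1.15", "container_name": "tfb-server-0"},
            ],
        }
    ],
    "trial_settings": {"measurement_duration": "15min"},
    "recommendation_settings": {"threshold": "0.1"},
    "datasource": "prometheus-1",
}

# The API accepts a JSON array of experiments.
payload = json.dumps([experiment])

# Sanity-check a few required fields before sending the request.
required = {"version", "experiment_name", "cluster_name", "kubernetes_objects", "datasource"}
missing = required - experiment.keys()
```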
+ + +### List Experiments API + +**Request with experiment name parameter** + +`GET /listExperiments` + +`curl -H 'Accept: application/json' http://:/listExperiments?experiment_name=` + +Returns the experiment details of the specified experiment.
+ +**Request with recommendations set to true** + +`GET /listExperiments` + +`curl -H 'Accept: application/json' http://:/listExperiments?recommendations=true` + +Returns the latest recommendations of all the experiments. + +**Response for experiment name - `default|default_0|deployment|tfb-qrh-deployment_0`** + +
+Example Response + +### Example Response + +```json +[ + { + "version": "v2.0", + "experiment_id": "f0007796e65c999d843bebd447c2fbaa6aaf9127c614da55e333cd6bdb628a74", + "experiment_name": "default|default_0|deployment|tfb-qrh-deployment_0", + "cluster_name": "default", + "datasource": "prometheus-1", + "mode": "monitor", + "target_cluster": "local", + "status": "IN_PROGRESS", + "performance_profile": "resource-optimization-openshift", + "trial_settings": { + "measurement_duration": "15min" + }, + "recommendation_settings": { + "threshold": "0.1" + }, + "experiment_usecase_type": { + "remote_monitoring": false, + "local_monitoring": true, + "local_experiment": false + }, + "validation_data": { + "success": true, + "message": "Registered successfully with Kruize! View registered experiments at /listExperiments", + "errorCode": 201 + }, + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment_0", + "namespace": "default_0", + "containers": { + "tfb-server-1": { + "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17", + "container_name": "tfb-server-1", + "recommendations": { + "version": "1.0", + "notifications": { + "112101": { + "type": "info", + "message": "Cost Recommendations Available", + "code": 112101 + } + }, + "data": { + "2023-04-02T08:00:00.680Z": { + "cost": { + "short_term": { + "monitoring_start_time": "2023-04-01T06:45:00.000Z", + "monitoring_end_time": "2023-04-02T08:00:00.680Z", + "duration_in_hours": 24.0, + "pods_count": 27, + "confidence_level": 0.0, + "current": { + "requests": { + "memory": { + "amount": 490.93, + "format": "MiB" + }, + "cpu": { + "amount": 1.46, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 712.21, + "format": "MiB" + }, + "cpu": { + "amount": 1.54, + "format": "cores" + } + } + }, + "config": { + "requests": { + "memory": { + "amount": 1197.9840000000002, + "format": "MiB" + }, + "cpu": { + "amount": 7.68, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 
1197.9840000000002, + "format": "MiB" + }, + "cpu": { + "amount": 7.68, + "format": "cores" + } + } + }, + "variation": { + "requests": { + "memory": { + "amount": 707.0540000000001, + "format": "MiB" + }, + "cpu": { + "amount": 6.22, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 485.7740000000001, + "format": "MiB" + }, + "cpu": { + "amount": 6.14, + "format": "cores" + } + } + }, + "notifications": {} + }, + "medium_term": { + "pods_count": 0, + "confidence_level": 0.0, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + }, + "long_term": { + "pods_count": 0, + "confidence_level": 0.0, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + } + } + } + } + } + }, + "tfb-server-0": { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-0", + "recommendations": { + "version": "1.0", + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + }, + "data": {} + } + } + } + } + ] + }, + ... + ... + ... + { + "version": "v2.0", + "experiment_id": "ab0a31a522cebdde52561482300d078ed1448fa7b75834fa216677d1d9d5cda6", + "experiment_name": "default|default_1|deployment|tfb-qrh-deployment_1", + "cluster_name": "default", + "datasource": "prometheus-1", + "mode": "monitor", + "target_cluster": "local", + "status": "IN_PROGRESS", + "performance_profile": "resource-optimization-openshift", + "trial_settings": { + "measurement_duration": "15min" + }, + "recommendation_settings": { + "threshold": "0.1" + }, + "experiment_usecase_type": { + "remote_monitoring": false, + "local_monitoring": true, + "local_experiment": false + }, + "validation_data": { + "success": true, + "message": "Registered successfully with Kruize! 
View registered experiments at /listExperiments", + "errorCode": 201 + }, + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment_1", + "namespace": "default_1", + "containers": { + "tfb-server-1": { + "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17", + "container_name": "tfb-server-1", + "recommendations": { + "version": "1.0", + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + }, + "data": {} + } + }, + "tfb-server-0": { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-0", + "recommendations": { + "version": "1.0", + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + }, + "data": {} + } + } + } + } + ] + } +] +``` + +
+ + +
+**Request with recommendations set to true and the experiment name parameter** + +`GET /listExperiments` + +`curl -H 'Accept: application/json' 'http://<URL>:<PORT>/listExperiments?recommendations=true&experiment_name=<experiment_name>'` + +Returns the latest recommendations of the specified experiment, without the results +
+ +**Request with recommendations set to true and latest set to false** + +`GET /listExperiments` + +`curl -H 'Accept: application/json' 'http://<URL>:<PORT>/listExperiments?recommendations=true&latest=false'` + +Returns all the recommendations of all the experiments + +**Response for experiment name - `default|default_0|deployment|tfb-qrh-deployment_0`** + +
+Example Response + +### Example Response + +```json +[ + { + "version": "v2.0", + "experiment_id": "f0007796e65c999d843bebd447c2fbaa6aaf9127c614da55e333cd6bdb628a74", + "experiment_name": "default|default_0|deployment|tfb-qrh-deployment_0", + "cluster_name": "default", + "datasource": "prometheus-1", + "mode": "monitor", + "target_cluster": "local", + "status": "IN_PROGRESS", + "performance_profile": "resource-optimization-openshift", + "trial_settings": { + "measurement_duration": "15min" + }, + "recommendation_settings": { + "threshold": "0.1" + }, + "experiment_usecase_type": { + "remote_monitoring": false, + "local_monitoring": true, + "local_experiment": false + }, + "validation_data": { + "success": true, + "message": "Registered successfully with Kruize! View registered experiments at /listExperiments", + "errorCode": 201 + }, + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment_0", + "namespace": "default_0", + "containers": { + "tfb-server-1": { + "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17", + "container_name": "tfb-server-1", + "recommendations": { + "version": "1.0", + "notifications": { + "112101": { + "type": "info", + "message": "Cost Recommendations Available", + "code": 112101 + } + }, + "data": { + "2023-04-02T06:00:00.770Z": { + "cost": { + "short_term": { + "monitoring_start_time": "2023-04-01T04:45:00.000Z", + "monitoring_end_time": "2023-04-02T06:00:00.770Z", + "duration_in_hours": 24, + "pods_count": 27, + "confidence_level": 0, + "current": { + "requests": { + "memory": { + "amount": 490.93, + "format": "MiB" + }, + "cpu": { + "amount": 1.46, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 712.21, + "format": "MiB" + }, + "cpu": { + "amount": 1.54, + "format": "cores" + } + } + }, + "config": { + "requests": { + "memory": { + "amount": 1197.9840000000002, + "format": "MiB" + }, + "cpu": { + "amount": 7.68, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 
1197.9840000000002, + "format": "MiB" + }, + "cpu": { + "amount": 7.68, + "format": "cores" + } + } + }, + "variation": { + "requests": { + "memory": { + "amount": 707.0540000000001, + "format": "MiB" + }, + "cpu": { + "amount": 6.22, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 485.7740000000001, + "format": "MiB" + }, + "cpu": { + "amount": 6.14, + "format": "cores" + } + } + }, + "notifications": {} + }, + "medium_term": { + "pods_count": 0, + "confidence_level": 0, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + }, + "long_term": { + "pods_count": 0, + "confidence_level": 0, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + } + } + }, + ... + ... + ... + "2023-04-02T04:30:00.000Z": { + "cost": { + "short_term": { + "monitoring_start_time": "2023-04-01T03:15:00.000Z", + "monitoring_end_time": "2023-04-02T04:30:00.000Z", + "duration_in_hours": 24, + "pods_count": 27, + "confidence_level": 0, + "current": { + "requests": { + "memory": { + "amount": 490.93, + "format": "MiB" + }, + "cpu": { + "amount": 1.46, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 712.21, + "format": "MiB" + }, + "cpu": { + "amount": 1.54, + "format": "cores" + } + } + }, + "config": { + "requests": { + "memory": { + "amount": 1197.9840000000002, + "format": "MiB" + }, + "cpu": { + "amount": 7.68, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 1197.9840000000002, + "format": "MiB" + }, + "cpu": { + "amount": 7.68, + "format": "cores" + } + } + }, + "variation": { + "requests": { + "memory": { + "amount": 707.0540000000001, + "format": "MiB" + }, + "cpu": { + "amount": 6.22, + "format": "cores" + } + }, + "limits": { + "memory": { + "amount": 485.7740000000001, + "format": "MiB" + }, + "cpu": { + "amount": 
6.14, + "format": "cores" + } + } + }, + "notifications": {} + }, + "medium_term": { + "pods_count": 0, + "confidence_level": 0, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + }, + "long_term": { + "pods_count": 0, + "confidence_level": 0, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + } + } + } + } + } + }, + "tfb-server-0": { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-0", + "recommendations": { + "version": "1.0", + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + }, + "data": {} + } + } + } + } + ] + }, + ... + ... + ... +] +``` + +
+ + +
+**Request with recommendations set to true, latest set to false, and the experiment name parameter** + +`GET /listExperiments` + +`curl -H 'Accept: application/json' 'http://<URL>:<PORT>/listExperiments?recommendations=true&latest=false&experiment_name=<experiment_name>'` + +Returns all the recommendations of the specified experiment +
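The query-string variants above all share the same shape. As an illustrative sketch (the host `kruize-host:8080` below is a placeholder, not part of Kruize), the URL can be assembled with Python's standard library, which also percent-encodes characters such as the `|` used in experiment names:

```python
from urllib.parse import urlencode

def list_experiments_url(base_url, **params):
    """Build a /listExperiments URL from query parameters such as
    recommendations, latest and experiment_name (see the requests above)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/listExperiments" + (f"?{query}" if query else "")

# "http://kruize-host:8080" is an illustrative placeholder host and port.
url = list_experiments_url("http://kruize-host:8080",
                           recommendations="true", latest="false")
print(url)
```

Note that when the same URL is passed to `curl`, it should be quoted: an unquoted `&` would be interpreted by the shell as a background operator.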
+ +**List Experiments also allows the user to send a request body to fetch the records based on `cluster_name` and `kubernetes_object`.** +
+*Note: This request body can be sent along with the other query params mentioned above.* + +`curl -H 'Accept: application/json' -X GET --data 'copy paste below JSON' 'http://<URL>:<PORT>/listExperiments'` + +
+ +### Example Request + +```json +{ + "cluster_name": "default", + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment", + "namespace": "default", + "containers": [ + { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-1" + } + ] + } + ] +} +``` + +
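A minimal Python sketch of the same request (the host and port are placeholders): note that `urllib` switches to POST whenever a body is supplied, so the method has to be forced back to GET to match the `-X GET --data` curl above.

```python
import json
import urllib.request

# Request body mirroring the example request shown above.
body = {
    "cluster_name": "default",
    "kubernetes_objects": [
        {
            "type": "deployment",
            "name": "tfb-qrh-deployment",
            "namespace": "default",
            "containers": [
                {
                    "container_image_name": "kruize/tfb-db:1.15",
                    "container_name": "tfb-server-1"
                }
            ]
        }
    ]
}

req = urllib.request.Request(
    "http://kruize-host:8080/listExperiments",   # placeholder host:port
    data=json.dumps(body).encode("utf-8"),
    headers={"Accept": "application/json", "Content-Type": "application/json"},
    method="GET",  # data= would otherwise default the method to POST
)
# urllib.request.urlopen(req) would send it; omitted here, as no live server is assumed.
```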
+ +--- + + +### Generate Recommendations API + +**Note: This API is specific to the Local Monitoring use case.**
+Generates recommendations for a specific experiment based on the provided parameters, similar to the update recommendations API. +This can be called directly after creating the experiment and doesn't require the update results API, as metrics are +fetched from the provided `datasource` (e.g. Prometheus) instead of the database. + +**Request Parameters** + +| Parameter | Type | Required | Description | |---------------------|--------|----------|--------------------------------------------------------------------------------------------------------------------------------------------| | experiment_name | string | Yes | The name of the experiment. | | interval_end_time | string | optional | The end time of the interval in the format `yyyy-MM-ddTHH:mm:ss.SSSZ`. This should be the date for which the recommendation needs to be generated. | | interval_start_time | string | optional | The start time of the interval in the format `yyyy-MM-ddTHH:mm:ss.SSSZ`. | + +The API requires only one mandatory field, i.e. `experiment_name`. The optional `interval_end_time`, if not provided, will be fetched from the provided datasource. +Similarly, `interval_start_time` will be calculated based on `interval_end_time`, if not provided. Using +these parameters, the API generates recommendations based on short-term, medium-term, and long-term factors. For +instance, if the long-term setting is configured for `15 days` and the `interval_end_time` is set to `Jan 15 2023 00:00:00.000Z`, the API retrieves data from the past 15 days, starting from January 1st. Using this data, the API generates +recommendations for all three terms for `Jan 15th 2023`. + +It is important to ensure that the difference between `interval_end_time` and `interval_start_time` does not exceed 15 +days. This restriction is in place to prevent potential timeouts, as generating recommendations beyond this threshold +might require more time.
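The windowing rules just described can be sketched client-side, using the term durations from the "Terms, Duration & Threshold Table" at the top of this document. The helper functions below are illustrative, not part of the Kruize API:

```python
from datetime import datetime, timedelta

# Durations from the "Terms, Duration & Threshold Table".
TERM_DURATION_DAYS = {"short_term": 1, "medium_term": 7, "long_term": 15}
MAX_INTERVAL_DAYS = 15  # wider gaps are rejected to avoid timeouts

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2023-01-15T00:00:00.000Z

def parse_ts(ts):
    return datetime.strptime(ts, TS_FORMAT)

def format_ts(dt):
    # Keep milliseconds (3 digits) rather than Python's 6-digit microseconds.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

def derive_interval_start(interval_end_time, term="long_term"):
    """When interval_start_time is omitted, derive it from interval_end_time."""
    end = parse_ts(interval_end_time)
    return format_ts(end - timedelta(days=TERM_DURATION_DAYS[term]))

def validate_interval(interval_start_time, interval_end_time):
    """Mirror the two interval checks described above."""
    start, end = parse_ts(interval_start_time), parse_ts(interval_end_time)
    if start >= end:
        raise ValueError("The Start time should precede the End time!")
    if end - start > timedelta(days=MAX_INTERVAL_DAYS):
        raise ValueError("The gap between interval_start_time and "
                         "interval_end_time must be within 15 days")
```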
+ +**Request with experiment name and interval_end_time parameters** + +`POST /generateRecommendations?experiment_name=?&interval_end_time=?` + +`POST /generateRecommendations?experiment_name=?&interval_end_time=?&interval_start_time=?` + +example + +`curl --location --request POST 'http://<URL>:<PORT>/generateRecommendations?interval_end_time=2023-01-02T00:15:00.000Z&experiment_name=temp_1'` + +success status code : 201 + +**Response** + +The response will contain an array of JSON objects with the recommendations for the specified experiment. + +
+Example Response Body + +```json +[ + { + "cluster_name": "default", + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment_5", + "namespace": "default_5", + "containers": [ + { + "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17", + "container_name": "tfb-server-1", + "recommendations": { + "version": "1.0", + "notifications": { + "111000": { + "type": "info", + "message": "Recommendations Are Available", + "code": 111000 + } + }, + "data": { + "2023-04-02T13:30:00.680Z": { + "notifications": { + "111101": { + "type": "info", + "message": "Short Term Recommendations Available", + "code": 111101 + } + }, + "monitoring_end_time": "2023-04-02T13:30:00.680Z", + "current": { + "limits": { + "memory": { + "amount": 1.048576E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.5, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 5.264900096E7, + "format": "bytes" + }, + "cpu": { + "amount": 5.37, + "format": "cores" + } + } + }, + "recommendation_terms": { + "short_term": { + "duration_in_hours": 24.0, + "notifications": { + "112101": { + "type": "info", + "message": "Cost Recommendations Available", + "code": 112101 + }, + "112102": { + "type": "info", + "message": "Performance Recommendations Available", + "code": 112102 + } + }, + "monitoring_start_time": "2023-04-01T12:00:00.000Z", + "recommendation_engines": { + "cost": { + "pods_count": 7, + "confidence_level": 0.0, + "config": { + "limits": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + } + }, + "variation": { + "limits": { + "memory": { + "amount": 1.449132032E8, + "format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 1.9712180223999997902848E8, + 
"format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + } + }, + "notifications": {} + }, + "performance": { + "pods_count": 27, + "confidence_level": 0.0, + "config": { + "limits": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + } + }, + "variation": { + "limits": { + "memory": { + "amount": 1.449132032E8, + "format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 1.9712180223999997902848E8, + "format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + } + }, + "notifications": {} + } + } + }, + "medium_term": { + "duration_in_hours": 33.8, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + }, + "long_term": { + "duration_in_hours": 33.8, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + } + } + } + } + } + }, + { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-0", + "recommendations": { + "version": "1.0", + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + }, + "data": {} + } + } + ] + } + ], + "version": "v2.0", + "experiment_name": "temp_1" + } +] +``` + +
+ +**Request without interval_end_time parameter** + +`POST /generateRecommendations?experiment_name=?` + +example + +`curl --location --request POST 'http://<URL>:<PORT>/generateRecommendations?experiment_name=temp_1'` + +success status code : 201 + +**Response** + +The response will contain an array of JSON objects with the recommendations for the specified experiment. + +When `interval_end_time` is not specified, Kruize will determine the latest timestamp from the specified datasource +(e.g. Prometheus) by checking the latest active container CPU usage. + +
+Example Response Body + +```json +[ + { + "cluster_name": "default", + "kubernetes_objects": [ + { + "type": "deployment", + "name": "tfb-qrh-deployment_5", + "namespace": "default_5", + "containers": [ + { + "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17", + "container_name": "tfb-server-1", + "recommendations": { + "version": "1.0", + "notifications": { + "111000": { + "type": "info", + "message": "Recommendations Are Available", + "code": 111000 + } + }, + "data": { + "2023-04-02T13:30:00.680Z": { + "notifications": { + "111101": { + "type": "info", + "message": "Short Term Recommendations Available", + "code": 111101 + } + }, + "monitoring_end_time": "2023-04-02T13:30:00.680Z", + "current": { + "limits": { + "memory": { + "amount": 1.048576E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.5, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 5.264900096E7, + "format": "bytes" + }, + "cpu": { + "amount": 5.37, + "format": "cores" + } + } + }, + "recommendation_terms": { + "short_term": { + "duration_in_hours": 24.0, + "notifications": { + "112101": { + "type": "info", + "message": "Cost Recommendations Available", + "code": 112101 + }, + "112102": { + "type": "info", + "message": "Performance Recommendations Available", + "code": 112102 + } + }, + "monitoring_start_time": "2023-04-01T12:00:00.000Z", + "recommendation_engines": { + "cost": { + "pods_count": 7, + "confidence_level": 0.0, + "config": { + "limits": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + } + }, + "variation": { + "limits": { + "memory": { + "amount": 1.449132032E8, + "format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 1.9712180223999997902848E8, + 
"format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + } + }, + "notifications": {} + }, + "performance": { + "pods_count": 27, + "confidence_level": 0.0, + "config": { + "limits": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 2.497708032E8, + "format": "bytes" + }, + "cpu": { + "amount": 0.9299999999999999, + "format": "cores" + } + } + }, + "variation": { + "limits": { + "memory": { + "amount": 1.449132032E8, + "format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + }, + "requests": { + "memory": { + "amount": 1.9712180223999997902848E8, + "format": "bytes" + }, + "cpu": { + "amount": -4.44, + "format": "cores" + } + } + }, + "notifications": {} + } + } + }, + "medium_term": { + "duration_in_hours": 33.8, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + }, + "long_term": { + "duration_in_hours": 33.8, + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + } + } + } + } + } + } + }, + { + "container_image_name": "kruize/tfb-db:1.15", + "container_name": "tfb-server-0", + "recommendations": { + "version": "1.0", + "notifications": { + "120001": { + "type": "info", + "message": "There is not enough data available to generate a recommendation.", + "code": 120001 + } + }, + "data": {} + } + } + ] + } + ], + "version": "v2.0", + "experiment_name": "temp_1" + } +] +``` + +
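Walking the response shape above to pull out a usable config takes several nested lookups (timestamp → term → engine). A minimal sketch, tested against a trimmed-down sample of the same shape:

```python
def latest_short_term_cost_config(recommendations):
    """Extract the short-term cost 'config' from a container's
    'recommendations' block; None when no data is available (code 120001)."""
    data = recommendations.get("data", {})
    if not data:
        return None
    latest_ts = max(data)  # ISO-8601 timestamps sort lexicographically
    terms = data[latest_ts].get("recommendation_terms", {})
    engines = terms.get("short_term", {}).get("recommendation_engines", {})
    return engines.get("cost", {}).get("config")

# Trimmed-down sample mirroring the structure of the example response above.
sample = {
    "version": "1.0",
    "data": {
        "2023-04-02T13:30:00.680Z": {
            "recommendation_terms": {
                "short_term": {
                    "recommendation_engines": {
                        "cost": {
                            "config": {
                                "limits": {"cpu": {"amount": 0.93, "format": "cores"}},
                                "requests": {"cpu": {"amount": 0.93, "format": "cores"}},
                            }
                        }
                    }
                }
            }
        }
    },
}
```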
+ + +**Error Responses** + +| HTTP Status Code | Description | +|------------------|----------------------------------------------------------------------------------------------------| +| 400 | experiment_name is mandatory. | +| 400 | Given timestamp - \" 2023-011-02T00:00:00.000Z \" is not a valid timestamp format. | +| 400 | Not Found: experiment_name does not exist: exp_1. | +| 400 | No metrics available from `2024-01-15T00:00:00.000Z` to `2023-12-31T00:00:00.000Z`. | +| 400 | The gap between the interval_start_time and interval_end_time must be within a maximum of 15 days! | +| 400 | The Start time should precede the End time! | +| 500 | Internal Server Error | + diff --git a/design/KruizePromQL.md b/design/KruizePromQL.md index 3c708659d..54ad31118 100644 --- a/design/KruizePromQL.md +++ b/design/KruizePromQL.md @@ -1,6 +1,7 @@ # Custom Prometheus Queries for Kruize -These are the custom Prometheus queries that you can use while running Kruize. These queries provide valuable insights into the performance of Kruize APIs and KruizeDB methods. +These are the custom Prometheus queries that you can use while running Kruize. These queries provide valuable insights +into the performance of Kruize APIs and KruizeDB methods. ## KruizeAPI Metrics @@ -16,24 +17,34 @@ The following are the available Kruize APIs that you can monitor: To monitor the performance of these APIs, you can use the following metrics: -- `kruizeAPI_count`: This metric provides the count of invocations for a specific API. It measures how many times the API has been called. -- `kruizeAPI_sum`: This metric provides the sum of the time taken by a specific API. It measures the total time consumed by the API across all invocations. -- `kruizeAPI_max`: This metric provides the maximum time taken by a specific API. It measures the highest execution time observed for the API. +- `kruizeAPI_count`: This metric provides the count of invocations for a specific API. 
It measures how many times the + API has been called. +- `kruizeAPI_sum`: This metric provides the sum of the time taken by a specific API. It measures the total time consumed + by the API across all invocations. +- `kruizeAPI_max`: This metric provides the maximum time taken by a specific API. It measures the highest execution time + observed for the API. Here are some sample metrics for the mentioned APIs which can run in Prometheus: -- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the count of successful invocations for the `createExperiment` API. -- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="failure"}`: Returns the count of failed invocations for the `createExperiment` API. -- `kruizeAPI_sum{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the sum of the time taken by the successful invocations of `createExperiment` API. -- `kruizeAPI_max{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the maximum time taken by the successful invocation of `createExperiment` API. +- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the count of + successful invocations for the `createExperiment` API. +- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="failure"}`: Returns the count of + failed invocations for the `createExperiment` API. +- `kruizeAPI_sum{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the sum of the + time taken by the successful invocations of `createExperiment` API. +- `kruizeAPI_max{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the maximum + time taken by the successful invocation of `createExperiment` API. 
-By changing the value of the `api` and `method` label, you can gather metrics for other Kruize APIs such as `listRecommendations`, `listExperiments`, and `updateResults`. +By changing the value of the `api` and `method` labels, you can gather metrics for other Kruize APIs such +as `listRecommendations`, `listExperiments`, and `updateResults`. Here is a sample command to collect the metric through `curl` -- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeAPI_sum{api="listRecommendations", application="Kruize", method="GET", status="success"}' ${PROMETHEUS_URL} | jq` : -Returns the sum of the time taken by `listRecommendations` API. - + +- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeAPI_sum{api="listRecommendations", application="Kruize", method="GET", status="success"}' ${PROMETHEUS_URL} | jq` : + Returns the sum of the time taken by `listRecommendations` API. + Sample Output: + ``` { "status": "success", @@ -72,7 +83,8 @@ The following are the available Kruize DB methods that you can monitor: - `addExperimentToDB`: Method for adding an experiment to the database. - `addResultToDB`: Method for adding experiment results to the database. -- `addBulkResultsToDBAndFetchFailedResults`: Method for adding bulk experiment results to the database and fetch the failed results. +- `addBulkResultsToDBAndFetchFailedResults`: Method for adding bulk experiment results to the database and fetching the + failed results. - `addRecommendationToDB`: Method for adding a recommendation to the database. - `loadExperimentByName`: Method for loading an experiment by name. - `loadResultsByExperimentName`: Method for loading experiment results by experiment name. @@ -82,28 +94,51 @@ The following are the available Kruize DB methods that you can monitor: - `loadPerformanceProfileByName`: Method to load a specific performance profile. - `loadAllPerformanceProfiles`: Method to load all performance profiles.
+## KruizeMethod Metrics + +The following are the available Kruize methods that you can monitor: + +- `generatePlots`: Method to generate box plot metrics for all terms. + +Sample Output: + +``` +KruizeMethod_max{application="Kruize",method="generatePlots",status="success",} 0.036112854 +KruizeMethod_count{application="Kruize",method="generatePlots",status="success",} 2.0 +KruizeMethod_sum{application="Kruize",method="generatePlots",status="success",} 0.050705769 +``` + ## Time taken for KruizeDB metrics To monitor the performance of these methods, you can use the following metrics: -- `kruizeDB_count`: This metric provides the count of calls made to the specific DB method. It measures how many times the DB method has been called. -- `kruizeDB_sum`: This metric provides the sum of the time taken by a specific DB method. It measures the total time consumed by the DB method across all invocations. -- `kruizeDB_max`: This metric provides the maximum time taken by a specific DB method. It measures the highest execution time observed for the DB method. +- `kruizeDB_count`: This metric provides the count of calls made to the specific DB method. It measures how many times + the DB method has been called. +- `kruizeDB_sum`: This metric provides the sum of the time taken by a specific DB method. It measures the total time + consumed by the DB method across all invocations. +- `kruizeDB_max`: This metric provides the maximum time taken by a specific DB method. It measures the highest execution + time observed for the DB method. Here are some sample metrics for the mentioned DB methods which can run in Prometheus: -- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="success"}`: Number of successful invocations of `addExperimentToDB` method. -- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="failure"}`: Number of failed invocations of `addExperimentToDB` method. 
-- `kruizeDB_sum{application="Kruize", method="addExperimentToDB", status="success"}`: Total time taken by the `addExperimentToDB` method which were success. -- `kruizeDB_max{application="Kruize", method="addExperimentToDB", status="success"}`: Maximum time taken by the `addExperimentToDB` method which were success. +- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="success"}`: Number of successful invocations + of `addExperimentToDB` method. +- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="failure"}`: Number of failed invocations + of `addExperimentToDB` method. +- `kruizeDB_sum{application="Kruize", method="addExperimentToDB", status="success"}`: Total time taken by + successful invocations of the `addExperimentToDB` method. +- `kruizeDB_max{application="Kruize", method="addExperimentToDB", status="success"}`: Maximum time taken by + a successful invocation of the `addExperimentToDB` method. By changing the value of the `method` label, you can gather metrics for other KruizeDB metrics. Here is a sample command to collect the metric through `curl` + - `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeDB_sum{application="Kruize", method="loadRecommendationsByExperimentName", status="success"}' ${PROMETHEUS_URL} | jq` : Returns the sum of the time taken by `loadRecommendationsByExperimentName` method. Sample Output: + ``` { "status": "success", @@ -139,15 +174,20 @@ Sample Output: # Kruize Metrics Collection and Analysis -To facilitate the performance analysis of the Kruize application, we provide a comprehensive script, [kruize_metrics.py](../scripts/kruize_metrics.py), which enables the collection of Kruize metrics in CSV format. -This script generates two distinct output files: increase_kruizemetrics.csv and total_kruizemetrics.csv. Notably, the PostgresDB metrics maintain consistency across both files.
+To facilitate the performance analysis of the Kruize application, we provide a comprehensive +script, [kruize_metrics.py](../scripts/kruize_metrics.py), which enables the collection of Kruize metrics in CSV format. +This script generates two distinct output files: increase_kruizemetrics.csv and total_kruizemetrics.csv. Notably, the +PostgresDB metrics maintain consistency across both files. ### Output Files and Format -- `increase_kruizemetrics.csv`: This file leverages increase() queries to ascertain the total incremental changes in Kruize metric values over time. -- `total_kruizemetrics.csv`: This file employs the original queries to compute cumulative metric values since the inception of the Kruize application. +- `increase_kruizemetrics.csv`: This file leverages increase() queries to ascertain the total incremental changes in + Kruize metric values over time. +- `total_kruizemetrics.csv`: This file employs the original queries to compute cumulative metric values since the + inception of the Kruize application. -Each column within the CSV files corresponds to specific API and DB metrics, capturing counts, sums, and maximum values for both successful and failed operations. +Each column within the CSV files corresponds to specific API and DB metrics, capturing counts, sums, and maximum values +for both successful and failed operations. ### Some key columns for insightful analysis: @@ -175,19 +215,25 @@ Each column within the CSV files corresponds to specific API and DB metrics, cap | kruizeDB_size | Current size of the Kruize database. | | kruizeDB_results | Total count of results available in the database across all experiments. | - # Initial Analysis Insights Upon analyzing the collected metrics, several crucial insights emerge: -- `Database Growth`: As the number of experiments and associated results increases, there is a proportional growth in the size of the database. 
+- `Database Growth`: As the number of experiments and associated results increases, there is a proportional growth in + the size of the database. -- `Update Recommendations Time`: Currently, the time required for updating recommendations exhibits an increasing trend with the growth in results. This aspect necessitates closer attention and potential optimization efforts. +- `Update Recommendations Time`: Currently, the time required for updating recommendations exhibits an increasing trend + with the growth in results. This aspect necessitates closer attention and potential optimization efforts. -- `Stable Update Results Time`: The time taken for updating experiment results is expected to remain relatively stable. Any deviations from this expected pattern warrant further investigation for potential performance issues. +- `Stable Update Results Time`: The time taken for updating experiment results is expected to remain relatively stable. + Any deviations from this expected pattern warrant further investigation for potential performance issues. -- `DB Method Aggregation`: While individual DB method metrics provide valuable insights, it is important to understand how they collectively contribute to the overall API metrics. A comprehensive analysis of both individual and aggregated DB metrics is essential for a holistic performance assessment. +- `DB Method Aggregation`: While individual DB method metrics provide valuable insights, it is important to understand + how they collectively contribute to the overall API metrics. A comprehensive analysis of both individual and + aggregated DB metrics is essential for a holistic performance assessment. -- `Max Value Analysis`: Evaluating the maximum values allows for the identification of peak performance periods for each method, aiding in the identification of potential performance bottlenecks. 
+- `Max Value Analysis`: Evaluating the maximum values allows for the identification of peak performance periods for each
+  method, aiding in the detection of potential performance bottlenecks.
-By conducting a thorough analysis based on these initial insights, users can effectively monitor and optimize the performance of the Kruize application, thereby ensuring a seamless and efficient user experience.
+By conducting a thorough analysis based on these initial insights, users can effectively monitor and optimize the
+performance of the Kruize application, thereby ensuring a seamless and efficient user experience.
diff --git a/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml b/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
index 9df33af34..abda3c3f3 100644
--- a/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
+++ b/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
@@ -32,7 +32,7 @@ data:
       "monitoringendpoint": "prometheus-k8s",
       "savetodb": "true",
       "dbdriver": "jdbc:postgresql://",
-      "plots": "false",
+      "plots": "true",
       "local": "false",
       "logAllHttpReqAndResp": "true",
       "hibernate": {
@@ -78,7 +78,7 @@ spec:
     spec:
       containers:
         - name: kruize
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
diff --git a/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml b/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
index e17500e2c..c4db437ad 100644
--- a/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
+++ b/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
@@ -32,7 +32,7 @@ data:
       "monitoringendpoint": "prometheus-k8s",
       "savetodb": "true",
       "dbdriver": "jdbc:postgresql://",
-      "plots": "false",
+      "plots": "true",
       "local": "false",
       "logAllHttpReqAndResp": "true",
       "hibernate": {
@@ -78,7 +78,7 @@ spec:
     spec:
       containers:
         - name: kruize
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
diff --git a/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml b/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
index 8456e7608..b778eab77 100644
--- a/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
+++ b/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
@@ -110,7 +110,7 @@ data:
       "monitoringendpoint": "prometheus-k8s",
       "savetodb": "true",
       "dbdriver": "jdbc:postgresql://",
-      "plots": "false",
+      "plots": "true",
       "local": "false",
       "logAllHttpReqAndResp": "true",
       "hibernate": {
@@ -165,7 +165,7 @@ spec:
     spec:
       containers:
         - name: kruize
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
@@ -230,7 +230,7 @@ spec:
     spec:
       containers:
         - name: kruizecronjob
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
@@ -356,7 +356,7 @@ spec:
     spec:
       containers:
         - name: kruizedeletejob
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
diff --git a/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml b/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
index 5932e8448..dd742a7cf 100644
--- a/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
+++ b/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
@@ -91,7 +91,7 @@ data:
       "monitoringendpoint": "prometheus-k8s",
       "savetodb": "true",
       "dbdriver": "jdbc:postgresql://",
-      "plots": "false",
+      "plots": "true",
       "local": "false",
       "logAllHttpReqAndResp": "true",
       "hibernate": {
@@ -211,7 +211,7 @@ spec:
       serviceAccountName: kruize-sa
       containers:
         - name: kruize
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
@@ -283,7 +283,7 @@ spec:
     spec:
       containers:
         - name: kruizecronjob
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
@@ -324,7 +324,7 @@ spec:
     spec:
      containers:
        - name: kruizedeletejob
-          image: kruize/autotune_operator:0.0.21_rm
+          image: kruize/autotune_operator:0.0.22_rm
           imagePullPolicy: Always
           volumeMounts:
             - name: config-volume
diff --git a/pom.xml b/pom.xml
index 48c11eba5..fef5e3f55 100644
--- a/pom.xml
+++ b/pom.xml
@@ -6,7 +6,7 @@
     org.autotune
     autotune
-    0.0.21_mvp
+    0.0.22_mvp
     4.13.2
     20240303
diff --git a/src/main/java/com/autotune/analyzer/plots/PlotManager.java b/src/main/java/com/autotune/analyzer/plots/PlotManager.java
index 2b3d867c1..16a5efc2c 100644
--- a/src/main/java/com/autotune/analyzer/plots/PlotManager.java
+++ b/src/main/java/com/autotune/analyzer/plots/PlotManager.java
@@ -1,16 +1,19 @@
 package com.autotune.analyzer.plots;

+import com.autotune.analyzer.recommendations.model.CostBasedRecommendationModel;
 import com.autotune.analyzer.recommendations.term.Terms;
 import com.autotune.analyzer.utils.AnalyzerConstants;
-import com.autotune.common.data.metrics.MetricResults;
 import com.autotune.common.data.result.IntervalResults;
 import com.autotune.common.utils.CommonUtils;
+import com.autotune.utils.KruizeConstants;
+import org.json.JSONArray;
+import org.json.JSONException;
+import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.sql.Timestamp;
 import java.util.*;
-import java.util.stream.Collectors;

 import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.*;

@@ -35,7 +38,7 @@ public PlotData.PlotsData generatePlots() {
         sortedResultsHashMap.putAll(containerResultsMap);

         // Retrieve entries within the specified range
-        Map<Timestamp, IntervalResults> resultInRange = sortedResultsHashMap.subMap(monitoringEndTime, true, monitoringStartTime, true);
+        Map<Timestamp, IntervalResults> resultInRange = sortedResultsHashMap.subMap(monitoringEndTime, true, monitoringStartTime, false);

         int delimiterNumber = (int) (resultInRange.size() / recommendationTerm.getPlots_datapoints());
@@ -58,8 +61,10 @@ public PlotData.PlotsData generatePlots() {
             calendar.add(Calendar.MILLISECOND, (int) millisecondsToAdd);
             // Convert the modified Calendar back to a Timestamp
             Timestamp newTimestamp = new Timestamp(calendar.getTimeInMillis());
-            PlotData.UsageData cpuUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true, incrementStartTime, true), AnalyzerConstants.MetricName.cpuUsage, "cores");
-            PlotData.UsageData memoryUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true, incrementStartTime, true), AnalyzerConstants.MetricName.memoryUsage, "MiB");
+            PlotData.UsageData cpuUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true,
+                    incrementStartTime, false), AnalyzerConstants.MetricName.cpuUsage);
+            PlotData.UsageData memoryUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true,
+                    incrementStartTime, false), AnalyzerConstants.MetricName.memoryUsage);
             plotsDataMap.put(newTimestamp, new PlotData.PlotPoint(cpuUsage, memoryUsage));
             incrementStartTime = newTimestamp;
         }
@@ -67,28 +72,80 @@ public PlotData.PlotsData generatePlots() {
         return new PlotData.PlotsData(recommendationTerm.getPlots_datapoints(), plotsDataMap);
     }

-    PlotData.UsageData getUsageData(Map<Timestamp, IntervalResults> resultInRange, AnalyzerConstants.MetricName metricName, String format) {
-        // Extract CPU values
-        List<Double> cpuValues = resultInRange.values().stream()
-                .filter(intervalResults -> intervalResults.getMetricResultsMap().containsKey(metricName))
-                .mapToDouble(intervalResults -> {
-                    MetricResults metricResults = intervalResults.getMetricResultsMap().get(metricName);
-                    return (metricResults != null && metricResults.getAggregationInfoResult() != null) ? metricResults.getAggregationInfoResult().getSum() : 0.0;
-                })
-                .boxed() // Convert double to Double
-                .collect(Collectors.toList());
-        if (cpuValues.size() > 0) {
-            double q1 = CommonUtils.percentile(TWENTYFIVE_PERCENTILE, cpuValues);
-            double q3 = CommonUtils.percentile(SEVENTYFIVE_PERCENTILE, cpuValues);
-            double median = CommonUtils.percentile(FIFTY_PERCENTILE, cpuValues);
-            // Find max and min
-            double max = Collections.max(cpuValues);
-            double min = Collections.min(cpuValues);
-            return new PlotData.UsageData(min, q1, median, q3, max, format);
-        } else {
-            return null;
+    PlotData.UsageData getUsageData(Map<Timestamp, IntervalResults> resultInRange, AnalyzerConstants.MetricName metricName) {
+        // stream through the results value and extract the CPU values
+        try {
+            if (metricName.equals(AnalyzerConstants.MetricName.cpuUsage)) {
+                JSONArray cpuValues = CostBasedRecommendationModel.getCPUUsageList(resultInRange);
+                LOGGER.debug("cpuValues : {}", cpuValues);
+                if (!cpuValues.isEmpty()) {
+                    // Extract "max" values from cpuUsageList
+                    List<Double> cpuMaxValues = new ArrayList<>();
+                    List<Double> cpuMinValues = new ArrayList<>();
+                    for (int i = 0; i < cpuValues.length(); i++) {
+                        JSONObject jsonObject = cpuValues.getJSONObject(i);
+                        double maxValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+                        double minValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MIN);
+                        cpuMaxValues.add(maxValue);
+                        cpuMinValues.add(minValue);
+                    }
+                    LOGGER.debug("cpuMaxValues : {}", cpuMaxValues);
+                    LOGGER.debug("cpuMinValues : {}", cpuMinValues);
+                    return getPercentileData(cpuMaxValues, cpuMinValues, resultInRange, metricName);
+                }
+
+            } else {
+                // loop through the results value and extract the memory values
+                CostBasedRecommendationModel costBasedRecommendationModel = new CostBasedRecommendationModel();
+                List<Double> memUsageMinList = new ArrayList<>();
+                List<Double> memUsageMaxList = new ArrayList<>();
+                boolean memDataAvailable = false;
+                for (IntervalResults intervalResults : resultInRange.values()) {
+                    JSONObject jsonObject = costBasedRecommendationModel.calculateMemoryUsage(intervalResults);
+                    if (!jsonObject.isEmpty()) {
+                        memDataAvailable = true;
+                        Double memUsageMax = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+                        Double memUsageMin = jsonObject.getDouble(KruizeConstants.JSONKeys.MIN);
+                        memUsageMaxList.add(memUsageMax);
+                        memUsageMinList.add(memUsageMin);
+                    }
+                }
+                LOGGER.debug("memValues Max : {}, Min : {}", memUsageMaxList, memUsageMinList);
+                if (memDataAvailable)
+                    return getPercentileData(memUsageMaxList, memUsageMinList, resultInRange, metricName);
+            }
+        } catch (JSONException e) {
+            LOGGER.error("Exception occurred while extracting metric values: {}", e.getMessage());
         }
+        return null;
+    }

-
+    private PlotData.UsageData getPercentileData(List<Double> metricValuesMax, List<Double> metricValuesMin, Map<Timestamp, IntervalResults> resultInRange, AnalyzerConstants.MetricName metricName) {
+        try {
+            if (!metricValuesMax.isEmpty()) {
+                double q1 = CommonUtils.percentile(TWENTYFIVE_PERCENTILE, metricValuesMax);
+                double q3 = CommonUtils.percentile(SEVENTYFIVE_PERCENTILE, metricValuesMax);
+                double median = CommonUtils.percentile(FIFTY_PERCENTILE, metricValuesMax);
+                // Find max and min
+                double max = Collections.max(metricValuesMax);
+                double min;
+                // check for non zero values
+                boolean nonZeroCheck = metricValuesMin.stream().noneMatch(value -> value.equals(0.0));
+                if (nonZeroCheck) {
+                    min = Collections.min(metricValuesMin);
+                } else {
+                    min = 0.0;
+                }
+
+                LOGGER.debug("q1 : {}, q3 : {}, median : {}, max : {}, min : {}", q1, q3, median, max, min);
+                String format = CostBasedRecommendationModel.getFormatValue(resultInRange, metricName);
+                return new PlotData.UsageData(min, q1, median, q3, max, format);
+            } else {
+                return null;
+            }
+        } catch (Exception e) {
+            LOGGER.error("Exception occurred while generating percentiles: {}", e.getMessage());
+        }
+        return null;
     }
 }
diff --git a/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java b/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
index 322a7c087..e1c9ec837 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
@@ -31,11 +31,13 @@
 import com.autotune.operator.KruizeDeploymentInfo;
 import com.autotune.utils.GenericRestApiClient;
 import com.autotune.utils.KruizeConstants;
+import com.autotune.utils.MetricsConfig;
 import com.autotune.utils.Utils;
 import com.google.gson.Gson;
 import com.google.gson.JsonArray;
 import com.google.gson.JsonElement;
 import com.google.gson.JsonObject;
+import io.micrometer.core.instrument.Timer;
 import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -586,13 +588,26 @@ private boolean generateRecommendationsBasedOnTerms(ContainerData containerData,
                     mappedRecommendationForTerm.addNotification(recommendationNotification);
                 }
                 mappedRecommendationForTerm.setMonitoringStartTime(monitoringStartTime);
-            }
-            Terms.setDurationBasedOnTerm(containerData, mappedRecommendationForTerm, recommendationTerm);
-            if (KruizeDeploymentInfo.plots == true) {
-                if (null != monitoringStartTime) {
-                    mappedRecommendationForTerm.setPlots(new PlotManager(containerData.getResults(), terms, monitoringStartTime, monitoringEndTime).generatePlots());
+                // generate plots when minimum data is available for the term
+                if (KruizeDeploymentInfo.plots) {
+                    if (null != monitoringStartTime) {
+                        Timer.Sample timerBoxPlots = null;
+                        String status = "success"; // TODO avoid this constant at multiple place
+                        try {
+                            timerBoxPlots = Timer.start(MetricsConfig.meterRegistry());
+                            mappedRecommendationForTerm.setPlots(new PlotManager(containerData.getResults(), terms, monitoringStartTime, monitoringEndTime).generatePlots());
+                        } catch (Exception e) {
+                            status = String.format("Box plots failed due to - %s", e.getMessage());
+                        } finally {
+                            if (timerBoxPlots != null) {
+                                MetricsConfig.timerBoxPlots = MetricsConfig.timerBBoxPlots.tag("status", status).register(MetricsConfig.meterRegistry());
+                                timerBoxPlots.stop(MetricsConfig.timerBoxPlots);
+                            }
+                        }
+                    }
+                }
             }
+            Terms.setDurationBasedOnTerm(containerData, mappedRecommendationForTerm, recommendationTerm);

             timestampRecommendation.setRecommendationForTermHashMap(recommendationTerm, mappedRecommendationForTerm);
         }
@@ -1407,7 +1422,7 @@ private String getResults(Map mainKruizeExperimentMAP, Kru
      * @param interval_start_time The start time of the interval for fetching metrics.
      * @param dataSourceInfo      The datasource object to fetch metrics from.
      * @throws Exception if an error occurs during the fetching process.
-     * TODO: Need to add right abstractions for this
+     *                   TODO: Need to add right abstractions for this
      */
     public void fetchMetricsBasedOnDatasource(KruizeObject kruizeObject, Timestamp interval_end_time, Timestamp interval_start_time, DataSourceInfo dataSourceInfo) throws Exception {
         try {
@@ -1492,10 +1507,10 @@ public void fetchMetricsBasedOnDatasource(KruizeObject kruizeObject, Timestamp i
                             if (secondMethodName.equals(KruizeConstants.JSONKeys.SUM))
                                 secondMethodName = KruizeConstants.JSONKeys.AVG;
                             promQL = String.format(metricEntry.getValue(), methodName, secondMethodName, namespace, containerName, measurementDurationMinutesInDouble.intValue());
-                            format = KruizeConstants.JSONKeys.GIBIBYTE;
+                            format = KruizeConstants.JSONKeys.BYTES;
                         } else if (metricEntry.getKey() == AnalyzerConstants.MetricName.memoryLimit || metricEntry.getKey() == AnalyzerConstants.MetricName.memoryRequest) {
                             promQL = String.format(metricEntry.getValue(), methodName, namespace, containerName);
-                            format = KruizeConstants.JSONKeys.GIBIBYTE;
+                            format = KruizeConstants.JSONKeys.BYTES;
                         }
                         // If promQL is determined, fetch metrics from the datasource
                         if (promQL != null) {
@@ -1570,7 +1585,8 @@ public void fetchMetricsBasedOnDatasource(KruizeObject kruizeObject, Timestamp i
                             }
                         }
                         containerData.setResults(containerDataResults);
-                        setInterval_end_time(Collections.max(containerDataResults.keySet()));  //TODO Temp fix invalide date is set if experiment having two container with different last seen date
+                        if (containerDataResults.size() > 0)
+                            setInterval_end_time(Collections.max(containerDataResults.keySet()));  //TODO Temp fix: invalid date is set if the experiment has two containers with different last-seen dates
                     }
                 }
             } catch (Exception e) {
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
index c9c72a51d..ae506b2e0 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
@@ -8,12 +8,17 @@
 import com.autotune.common.data.metrics.MetricResults;
 import com.autotune.common.data.result.IntervalResults;
 import com.autotune.common.utils.CommonUtils;
+import com.autotune.utils.KruizeConstants;
+import org.json.JSONArray;
+import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.sql.Timestamp;
 import java.util.*;
 import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;

 import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_CPU_PERCENTILE;
 import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_MEMORY_PERCENTILE;
@@ -42,49 +47,21 @@ public RecommendationConfigItem getCPURequestRecommendation(Map
-        List<Double> cpuUsageList = filteredResultsMap.values()
-                .stream()
-                .map(e -> {
-                    Optional<MetricResults> cpuUsageResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
-                    Optional<MetricResults> cpuThrottleResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuThrottle));
-                    double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
-                    double cpuUsageMax = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
-                    double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
-                    double cpuThrottleAvg = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
-                    double cpuThrottleMax = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
-                    double cpuThrottleSum = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
-                    double cpuRequestInterval = 0.0;
-                    double cpuUsagePod = 0;
-                    int numPods = 0;
-
-                    // Use the Max value when available, if not use the Avg
-                    double cpuUsage = (cpuUsageMax > 0) ? cpuUsageMax : cpuUsageAvg;
-                    double cpuThrottle = (cpuThrottleMax > 0) ? cpuThrottleMax : cpuThrottleAvg;
-                    double cpuUsageTotal = cpuUsage + cpuThrottle;
-
-                    // Usage is less than 1 core, set it to the observed value.
-                    if (CPU_ONE_CORE > cpuUsageTotal) {
-                        cpuRequestInterval = cpuUsageTotal;
-                    } else {
-                        // Sum/Avg should give us the number of pods
-                        if (0 != cpuUsageAvg) {
-                            numPods = (int) Math.ceil(cpuUsageSum / cpuUsageAvg);
-                            if (0 < numPods) {
-                                cpuUsagePod = (cpuUsageSum + cpuThrottleSum) / numPods;
-                            }
-                        }
-                        cpuRequestInterval = Math.max(cpuUsagePod, cpuUsageTotal);
-                    }
-                    return cpuRequestInterval;
-                })
-                .collect(Collectors.toList());
+        JSONArray cpuUsageList = getCPUUsageList(filteredResultsMap);
+        // Extract 'max' values from cpuUsageList
+        List<Double> cpuMaxValues = new ArrayList<>();
+        for (int i = 0; i < cpuUsageList.length(); i++) {
+            JSONObject jsonObject = cpuUsageList.getJSONObject(i);
+            double maxValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+            cpuMaxValues.add(maxValue);
+        }

-        Double cpuRequest = 0.0;
-        Double cpuRequestMax = Collections.max(cpuUsageList);
+        Double cpuRequest;
+        Double cpuRequestMax = Collections.max(cpuMaxValues);
         if (null != cpuRequestMax && CPU_ONE_CORE > cpuRequestMax) {
             cpuRequest = cpuRequestMax;
         } else {
-            cpuRequest = CommonUtils.percentile(COST_CPU_PERCENTILE, cpuUsageList);
+            cpuRequest = CommonUtils.percentile(COST_CPU_PERCENTILE, cpuMaxValues);
         }

         // TODO: This code below should be optimised with idle detection (0 cpu usage in recorded data) in recommendation ALGO
@@ -116,23 +93,64 @@ else if (CPU_ONE_MILLICORE >= cpuRequest) {
             }
         }

+        format = getFormatValue(filteredResultsMap, AnalyzerConstants.MetricName.cpuUsage);
+
+        recommendationConfigItem = new RecommendationConfigItem(cpuRequest, format);
+        return recommendationConfigItem;
+    }
+
+    public static JSONArray getCPUUsageList(Map<Timestamp, IntervalResults> filteredResultsMap) {
+        JSONArray cpuRequestIntervalArray = new JSONArray();
         for (IntervalResults intervalResults : filteredResultsMap.values()) {
-            MetricResults cpuUsageResults = intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage);
-            if (cpuUsageResults != null) {
-                MetricAggregationInfoResults aggregationInfoResult = cpuUsageResults.getAggregationInfoResult();
-                if (aggregationInfoResult != null) {
-                    format = aggregationInfoResult.getFormat();
-                    if (format != null && !format.isEmpty()) {
-                        break;
+            JSONObject cpuRequestInterval = new JSONObject();
+            Optional<MetricResults> cpuUsageResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
+            Optional<MetricResults> cpuThrottleResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuThrottle));
+            double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
+            double cpuUsageMax = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
+            double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
+            double cpuUsageMin = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMin()).orElse(0.0);
+            double cpuThrottleAvg = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
+            double cpuThrottleMax = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
+            double cpuThrottleSum = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
+            double cpuThrottleMin = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMin()).orElse(0.0);
+
+            double cpuRequestIntervalMax;
+            double cpuRequestIntervalMin;
+            double cpuUsagePod = 0;
+            int numPods;
+
+            // Use the Max value when available, if not use the Avg
+            double cpuUsage = (cpuUsageMax > 0) ? cpuUsageMax : cpuUsageAvg;
+            double cpuThrottle = (cpuThrottleMax > 0) ? cpuThrottleMax : cpuThrottleAvg;
+            double cpuUsageTotal = cpuUsage + cpuThrottle;
+
+            // Usage is less than 1 core, set it to the observed value.
+            if (CPU_ONE_CORE > cpuUsageTotal) {
+                cpuRequestIntervalMax = cpuUsageTotal;
+            } else {
+                // Sum/Avg should give us the number of pods
+                if (0 != cpuUsageAvg) {
+                    numPods = (int) Math.ceil(cpuUsageSum / cpuUsageAvg);
+                    if (0 < numPods) {
+                        cpuUsagePod = (cpuUsageSum + cpuThrottleSum) / numPods;
                     }
                 }
+                cpuRequestIntervalMax = Math.max(cpuUsagePod, cpuUsageTotal);
             }
+            double cpuMinTotal = cpuUsageMin + cpuThrottleMin;
+            // traverse over a stream of positive values and find the minimum value
+            cpuRequestIntervalMin = Stream.of(cpuUsagePod, cpuUsageTotal, cpuMinTotal)
+                    .filter(value -> value > 0.0)
+                    .min(Double::compare)
+                    .orElse(0.0);
+
+            cpuRequestInterval.put(KruizeConstants.JSONKeys.MIN, cpuRequestIntervalMin);
+            cpuRequestInterval.put(KruizeConstants.JSONKeys.MAX, cpuRequestIntervalMax);
+            LOGGER.debug("cpuRequestInterval : {}", cpuRequestInterval);
+            cpuRequestIntervalArray.put(cpuRequestInterval);
         }
-
-        recommendationConfigItem = new RecommendationConfigItem(cpuRequest, format);
-        return recommendationConfigItem;
+        return cpuRequestIntervalArray;
     }

-    @Override
     public RecommendationConfigItem getMemoryRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap, ArrayList<RecommendationNotification> notifications) {
@@ -143,10 +161,13 @@ public RecommendationConfigItem getMemoryRequestRecommendation(Map
-        List<Double> memUsageList = filteredResultsMap.values()
-                .stream()
-                .map(CostBasedRecommendationModel::calculateMemoryUsage)
-                .collect(Collectors.toList());
+        CostBasedRecommendationModel costBasedRecommendationModel = new CostBasedRecommendationModel();
+        List<Double> memUsageList = new ArrayList<>();
+        for (IntervalResults intervalResults : filteredResultsMap.values()) {
+            JSONObject jsonObject = costBasedRecommendationModel.calculateMemoryUsage(intervalResults);
+            Double memUsage = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+            memUsageList.add(memUsage);
+        }

         List<Double> spikeList = filteredResultsMap.values()
                 .stream()
@@ -169,8 +190,16 @@ public RecommendationConfigItem getMemoryRequestRecommendation(Map
+    public static String getFormatValue(Map<Timestamp, IntervalResults> filteredResultsMap, AnalyzerConstants.MetricName metricName) {
+        String format = "";
         for (IntervalResults intervalResults : filteredResultsMap.values()) {
-            MetricResults memoryUsageResults = intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.memoryUsage);
+            MetricResults memoryUsageResults = intervalResults.getMetricResultsMap().get(metricName);
             if (memoryUsageResults != null) {
                 MetricAggregationInfoResults aggregationInfoResult = memoryUsageResults.getAggregationInfoResult();
                 if (aggregationInfoResult != null) {
@@ -181,9 +210,7 @@ public RecommendationConfigItem getMemoryRequestRecommendation(Map
         Optional<MetricResults> cpuUsageResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
         double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
         double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
         Optional<MetricResults> memoryUsageResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.memoryUsage));
         double memUsageAvg = memoryUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
         double memUsageMax = memoryUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
+        double memUsageMin = memoryUsageResults.map(m -> m.getAggregationInfoResult().getMin()).orElse(0.0);
         double memUsageSum = memoryUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
         double memUsage = 0;
         int numPods = 0;
@@ -216,9 +246,18 @@ private static double calculateMemoryUsage(IntervalResults intervalResults) {
             if (0 < numPods) {
                 memUsage = (memUsageSum / numPods);
             }
-        memUsage = Math.max(memUsage, memUsageMax);
-
-        return memUsage;
+        memUsageMax = Math.max(memUsage, memUsageMax);
+        // traverse over a stream of positive values and find the minimum value
+        memUsageMin = Stream.of(memUsage, memUsageMax, memUsageMin)
+                .filter(value -> value > 0.0)
+                .min(Double::compare)
+                .orElse(0.0);
+
+        jsonObject.put(KruizeConstants.JSONKeys.MIN, memUsageMin);
+        jsonObject.put(KruizeConstants.JSONKeys.MAX, memUsageMax);
+
+        LOGGER.debug("memRequestInterval : {}", jsonObject);
+        return jsonObject;
     }

     private static double calculateIntervalSpike(IntervalResults intervalResults) {
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
index 3409df3b3..27febf51a 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
@@ -9,12 +9,16 @@
 import com.autotune.common.data.metrics.MetricResults;
 import com.autotune.common.data.result.IntervalResults;
 import com.autotune.common.utils.CommonUtils;
+import com.autotune.utils.KruizeConstants;
+import org.json.JSONArray;
+import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.sql.Timestamp;
 import java.util.*;
 import java.util.stream.Collectors;
+import java.util.stream.IntStream;

 import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_CPU_PERCENTILE;
 import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_MEMORY_PERCENTILE;
@@ -45,49 +49,22 @@ public RecommendationConfigItem getCPURequestRecommendation(Map
-        List<Double> cpuUsageList = filteredResultsMap.values()
-                .stream()
-                .map(e -> {
-                    Optional<MetricResults> cpuUsageResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
-                    Optional<MetricResults> cpuThrottleResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuThrottle));
-                    double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
-                    double cpuUsageMax = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
-                    double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
-                    double cpuThrottleAvg = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
-                    double cpuThrottleMax = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
-                    double cpuThrottleSum = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
-                    double cpuRequestInterval = 0.0;
-                    double cpuUsagePod = 0;
-                    int numPods = 0;
-
-                    // Use the Max value when available, if not use the Avg
-                    double cpuUsage = (cpuUsageMax > 0) ? cpuUsageMax : cpuUsageAvg;
-                    double cpuThrottle = (cpuThrottleMax > 0) ? cpuThrottleMax : cpuThrottleAvg;
-                    double cpuUsageTotal = cpuUsage + cpuThrottle;
-
-                    // Usage is less than 1 core, set it to the observed value.
-                    if (CPU_ONE_CORE > cpuUsageTotal) {
-                        cpuRequestInterval = cpuUsageTotal;
-                    } else {
-                        // Sum/Avg should give us the number of pods
-                        if (0 != cpuUsageAvg) {
-                            numPods = (int) Math.ceil(cpuUsageSum / cpuUsageAvg);
-                            if (0 < numPods) {
-                                cpuUsagePod = (cpuUsageSum + cpuThrottleSum) / numPods;
-                            }
-                        }
-                        cpuRequestInterval = Math.max(cpuUsagePod, cpuUsageTotal);
-                    }
-                    return cpuRequestInterval;
-                })
-                .collect(Collectors.toList());
+        JSONArray cpuUsageList = CostBasedRecommendationModel.getCPUUsageList(filteredResultsMap);
+        LOGGER.debug("cpuUsageList : {}", cpuUsageList);
+        // Extract "max" values from cpuUsageList
+        List<Double> cpuMaxValues = new ArrayList<>();
+        for (int i = 0; i < cpuUsageList.length(); i++) {
+            JSONObject jsonObject = cpuUsageList.getJSONObject(i);
+            double maxValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+            cpuMaxValues.add(maxValue);
+        }

         Double cpuRequest = 0.0;
-        Double cpuRequestMax = Collections.max(cpuUsageList);
+        Double cpuRequestMax = Collections.max(cpuMaxValues);
         if (null != cpuRequestMax && CPU_ONE_CORE > cpuRequestMax) {
             cpuRequest = cpuRequestMax;
         } else {
-            cpuRequest = CommonUtils.percentile(PERFORMANCE_CPU_PERCENTILE, cpuUsageList);
+            cpuRequest = CommonUtils.percentile(PERFORMANCE_CPU_PERCENTILE, cpuMaxValues);
         }

         // TODO: This code below should be optimised with idle detection (0 cpu usage in recorded data) in recommendation ALGO
diff --git a/src/main/java/com/autotune/utils/KruizeConstants.java b/src/main/java/com/autotune/utils/KruizeConstants.java
index 29612d53a..6938b1d52 100644
--- a/src/main/java/com/autotune/utils/KruizeConstants.java
+++ b/src/main/java/com/autotune/utils/KruizeConstants.java
@@ -178,7 +178,7 @@ public static final class JSONKeys {
         public static final String MEDIAN = "median";
         public static final String RANGE = "range";
         public static final String CORES = "cores";
-        public static final String GIBIBYTE = "GiB";
+        public static final String BYTES = "bytes";

         // Datasource JSON keys
         public static final String DATASOURCES = "datasources";
diff --git a/src/main/java/com/autotune/utils/MetricsConfig.java b/src/main/java/com/autotune/utils/MetricsConfig.java
index 8baa24c5d..320eaf389 100644
--- a/src/main/java/com/autotune/utils/MetricsConfig.java
+++ b/src/main/java/com/autotune/utils/MetricsConfig.java
@@ -8,57 +8,56 @@
 import io.micrometer.core.instrument.config.NamingConvention;
 import io.micrometer.prometheus.PrometheusConfig;
 import io.micrometer.prometheus.PrometheusMeterRegistry;
-import io.micrometer.core.instrument.MeterRegistry;
-import org.eclipse.jetty.util.thread.ThreadPool;

 public class MetricsConfig {

-    private static MetricsConfig INSTANCE;
     public static Timer timerListRec, timerListExp, timerCreateExp, timerUpdateResults, timerUpdateRecomendations;
-    public static Timer timerLoadRecExpName, timerLoadResultsExpName, timerLoadExpName, timerLoadRecExpNameDate;
+    public static Timer timerLoadRecExpName, timerLoadResultsExpName, timerLoadExpName, timerLoadRecExpNameDate, timerBoxPlots;
     public static Timer timerLoadAllRec, timerLoadAllExp, timerLoadAllResults;
-    public static Timer timerAddRecDB , timerAddResultsDB , timerAddExpDB, timerAddBulkResultsDB;
-    public static Timer timerAddPerfProfileDB , timerLoadPerfProfileName , timerLoadAllPerfProfiles;
-    public static Timer.Builder timerBListRec, timerBListExp, timerBCreateExp, timerBUpdateResults, timerBUpdateRecommendations ;
-    public static Timer.Builder timerBLoadRecExpName, timerBLoadResultsExpName, timerBLoadExpName, timerBLoadRecExpNameDate;
+    public static Timer timerAddRecDB, timerAddResultsDB, timerAddExpDB, timerAddBulkResultsDB;
+    public static Timer timerAddPerfProfileDB, timerLoadPerfProfileName, timerLoadAllPerfProfiles;
+    public static Timer.Builder timerBListRec, timerBListExp, timerBCreateExp, timerBUpdateResults, timerBUpdateRecommendations;
+    public static Timer.Builder timerBLoadRecExpName, timerBLoadResultsExpName, timerBLoadExpName, timerBLoadRecExpNameDate, timerBBoxPlots;
     public static Timer.Builder timerBLoadAllRec, timerBLoadAllExp, timerBLoadAllResults;
-    public static Timer.Builder timerBAddRecDB, timerBAddResultsDB , timerBAddExpDB, timerBAddBulkResultsDB;
+    public static Timer.Builder timerBAddRecDB, timerBAddResultsDB, timerBAddExpDB, timerBAddBulkResultsDB;
     public static Timer.Builder timerBAddPerfProfileDB, timerBLoadPerfProfileName, timerBLoadAllPerfProfiles;
-    public String API_METRIC_DESC = "Time taken for Kruize APIs";
-    public String DB_METRIC_DESC = "Time taken for KruizeDB methods";
     public static PrometheusMeterRegistry meterRegistry;
-
     public static Timer timerListDS, timerImportDSMetadata;
     public static Timer.Builder timerBListDS, timerBImportDSMetadata;
+    private static MetricsConfig INSTANCE;
+    public String API_METRIC_DESC = "Time taken for Kruize APIs";
+    public String DB_METRIC_DESC = "Time taken for KruizeDB methods";
+    public String METHOD_METRIC_DESC = "Time taken for Kruize methods";

     private MetricsConfig() {
         meterRegistry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
         meterRegistry.config().commonTags("application", "Kruize");

-        timerBListRec = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","listRecommendations").tag("method","GET");
-        timerBListExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","listExperiments").tag("method","GET");
-        timerBCreateExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","createExperiment").tag("method","POST");
-        timerBUpdateResults = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","updateResults").tag("method","POST");
-        timerBUpdateRecommendations = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","updateRecommendations").tag("method","POST");
+        timerBListRec = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listRecommendations").tag("method", "GET");
+        timerBListExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listExperiments").tag("method", "GET");
+        timerBCreateExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "createExperiment").tag("method", "POST");
+        timerBUpdateResults = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "updateResults").tag("method", "POST");
+        timerBUpdateRecommendations = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "updateRecommendations").tag("method", "POST");

-        timerBLoadRecExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadRecommendationsByExperimentName");
-        timerBLoadRecExpNameDate = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadRecommendationsByExperimentNameAndDate");
-        timerBLoadResultsExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadResultsByExperimentName");
-        timerBLoadExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadExperimentByName");
-        timerBLoadAllRec = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllRecommendations");
-        timerBLoadAllExp = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllExperiments");
-        timerBLoadAllResults = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllResults");
-        timerBAddRecDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addRecommendationToDB");
-        timerBAddResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addResultToDB");
-        timerBAddBulkResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addBulkResultsToDBAndFetchFailedResults");
-        timerBAddExpDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addExperimentToDB");
-        timerBAddPerfProfileDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addPerformanceProfileToDB");
-        timerBLoadPerfProfileName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadPerformanceProfileByName");
-        timerBLoadAllPerfProfiles = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllPerformanceProfiles");
+        timerBLoadRecExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadRecommendationsByExperimentName");
+        timerBLoadRecExpNameDate = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadRecommendationsByExperimentNameAndDate");
+        timerBLoadResultsExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadResultsByExperimentName");
+        timerBLoadExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadExperimentByName");
+        timerBLoadAllRec = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllRecommendations");
+        timerBLoadAllExp = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllExperiments");
+        timerBLoadAllResults =
Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllResults"); + timerBAddRecDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addRecommendationToDB"); + timerBAddResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addResultToDB"); + timerBAddBulkResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addBulkResultsToDBAndFetchFailedResults"); + timerBAddExpDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addExperimentToDB"); + timerBAddPerfProfileDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addPerformanceProfileToDB"); + timerBLoadPerfProfileName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadPerformanceProfileByName"); + timerBLoadAllPerfProfiles = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllPerformanceProfiles"); + timerBBoxPlots = Timer.builder("KruizeMethod").description(METHOD_METRIC_DESC).tag("method", "generatePlots"); - timerBListDS = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","listDataSources").tag("method","GET"); - timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","importDataSourceMetadata").tag("method","POST"); - timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","importDataSourceMetadata").tag("method","GET"); + timerBListDS = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listDataSources").tag("method", "GET"); + timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "importDataSourceMetadata").tag("method", "POST"); + timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "importDataSourceMetadata").tag("method", "GET"); new ClassLoaderMetrics().bindTo(meterRegistry); new ProcessorMetrics().bindTo(meterRegistry); new 
JvmGcMetrics().bindTo(meterRegistry); diff --git a/tests/README.md b/tests/README.md index 99a47fe2d..c5fc345dd 100644 --- a/tests/README.md +++ b/tests/README.md @@ -143,8 +143,20 @@ To run the stress test refer the Stress test [README](/tests/scripts/remote_moni To run the fault tolerant test refer the [README](/tests/scripts/remote_monitoring_tests/fault_tolerant_tests.md) +### Local monitoring tests + +Here we test the Kruize [Local monitoring APIs](/design/KruizeLocalAPI.md). + +#### API tests + + The tests do the following: + - Deploy Kruize in non-CRD mode using the deploy script from the autotune repo + - Validate the behaviour of the list datasources, import metadata and list metadata APIs in various scenarios covering both positive and negative use cases. + + For details, refer to this [doc](/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md) + ## Supported Clusters -- Minikube +- Minikube, OpenShift ## Prerequisites for running the tests: @@ -204,6 +216,12 @@ To run remote monitoring tests, ``` /tests/test_autotune.sh -c minikube -i kruize/autotune_operator:0.0.11_mvp --testsuite=remote_monitoring_tests --resultsdir=/home/results ``` +To run local monitoring tests, + +``` +/tests/test_autotune.sh -c minikube -i kruize/autotune_operator:0.0.21_mvp --testsuite=local_monitoring_tests --resultsdir=/home/results +``` + ## How to test a specific autotune module? To run the tests specific to a autotune module use the "testmodule" option. For example, to run all the tests for dependency analyzer module execute the below command: diff --git a/tests/scripts/common/common_functions.sh b/tests/scripts/common/common_functions.sh index 37f736475..36fde16e6 100755 --- a/tests/scripts/common/common_functions.sh +++ b/tests/scripts/common/common_functions.sh @@ -1,6 +1,6 @@ #!/bin/bash # -# Copyright (c) 2020, 2021 Red Hat, IBM Corporation and others. +# Copyright (c) 2020, 2024 Red Hat, IBM Corporation and others.
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -45,7 +45,8 @@ TEST_SUITE_ARRAY=("app_autotune_yaml_tests" "autotune_id_tests" "kruize_layer_id_tests" "em_standalone_tests" -"remote_monitoring_tests") +"remote_monitoring_tests" +"local_monitoring_tests") modify_kruize_layer_tests=("add_new_tunable" "apply_null_tunable" @@ -1822,3 +1823,19 @@ function create_performance_profile() { exit 1 fi } + +# +# The "local" flag is turned off by default for now; this function patches the deploy manifest to set it to "true". +# +function kruize_local_patch() { + CRC_DIR="./manifests/crc/default-db-included-installation" + KRUIZE_CRC_DEPLOY_MANIFEST_OPENSHIFT="${CRC_DIR}/openshift/kruize-crc-openshift.yaml" + KRUIZE_CRC_DEPLOY_MANIFEST_MINIKUBE="${CRC_DIR}/minikube/kruize-crc-minikube.yaml" + + + if [ "${cluster_type}" == "minikube" ]; then + sed -i 's/"local": "false"/"local": "true"/' "${KRUIZE_CRC_DEPLOY_MANIFEST_MINIKUBE}" + elif [ "${cluster_type}" == "openshift" ]; then + sed -i 's/"local": "false"/"local": "true"/' "${KRUIZE_CRC_DEPLOY_MANIFEST_OPENSHIFT}" + fi +} diff --git a/tests/scripts/functional_tests.sh b/tests/scripts/functional_tests.sh index b2b086801..13a814764 100755 --- a/tests/scripts/functional_tests.sh +++ b/tests/scripts/functional_tests.sh @@ -32,6 +32,7 @@ SCRIPTS_DIR="${CURRENT_DIR}" . ${SCRIPTS_DIR}/da/kruize_layer_id_tests.sh . ${SCRIPTS_DIR}/em/em_standalone_tests.sh . ${SCRIPTS_DIR}/remote_monitoring_tests/remote_monitoring_tests.sh +.
${SCRIPTS_DIR}/local_monitoring_tests/local_monitoring_tests.sh # Iterate through the commandline options while getopts i:o:r:-: gopts diff --git a/tests/scripts/remote_monitoring_tests/helpers/__init__.py b/tests/scripts/helpers/__init__.py similarity index 100% rename from tests/scripts/remote_monitoring_tests/helpers/__init__.py rename to tests/scripts/helpers/__init__.py diff --git a/tests/scripts/remote_monitoring_tests/helpers/all_terms_list_reco_json_schema.py b/tests/scripts/helpers/all_terms_list_reco_json_schema.py similarity index 87% rename from tests/scripts/remote_monitoring_tests/helpers/all_terms_list_reco_json_schema.py rename to tests/scripts/helpers/all_terms_list_reco_json_schema.py index c55559c60..8688c9db9 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/all_terms_list_reco_json_schema.py +++ b/tests/scripts/helpers/all_terms_list_reco_json_schema.py @@ -371,6 +371,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] @@ -638,6 +681,49 @@ } }, "required": [] + }, + "plots": 
{ + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] @@ -905,6 +991,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": 
["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/fixtures.py b/tests/scripts/helpers/fixtures.py similarity index 100% rename from tests/scripts/remote_monitoring_tests/helpers/fixtures.py rename to tests/scripts/helpers/fixtures.py diff --git a/tests/scripts/local_monitoring_tests/helpers/generate_datasource_json.py b/tests/scripts/helpers/generate_datasource_json.py similarity index 85% rename from tests/scripts/local_monitoring_tests/helpers/generate_datasource_json.py rename to tests/scripts/helpers/generate_datasource_json.py index ec1c773a7..67537e601 100644 --- a/tests/scripts/local_monitoring_tests/helpers/generate_datasource_json.py +++ b/tests/scripts/helpers/generate_datasource_json.py @@ -18,7 +18,7 @@ def generate_datasource_json(csv_file, json_file): with open(json_file, 'w') as jsonfile: json.dump(datasources, jsonfile, indent=4) -csv_file_path = '../csv_data/datasources.csv' -json_file_path = '../json_files/datasources.json' +csv_file_path = '../local_monitoring_tests/csv_data/datasources.csv' +json_file_path = '../local_monitoring_tests/json_files/datasources.json' generate_datasource_json(csv_file_path, json_file_path) diff --git a/tests/scripts/remote_monitoring_tests/helpers/generate_rm_jsons.py b/tests/scripts/helpers/generate_rm_jsons.py similarity index 99% rename from tests/scripts/remote_monitoring_tests/helpers/generate_rm_jsons.py rename to tests/scripts/helpers/generate_rm_jsons.py index 833773905..1cc481a99 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/generate_rm_jsons.py +++ b/tests/scripts/helpers/generate_rm_jsons.py @@ -31,7 +31,7 @@ def convert_date_format(input_date_str): output_date_str = input_date.strftime("%Y-%m-%dT%H:%M:%S.000Z") return output_date_str -def create_exp_jsons(split = False, split_count = 1, exp_json_dir = "/tmp/exp_jsons", total_exps = 10): +def create_exp_jsons(split = False, split_count = 1, exp_json_dir = "/tmp/exp_jsons", total_exps = 
10, target_cluster="remote"): complete_json_data = [] single_json_data = [] multi_json_data = [] diff --git a/tests/scripts/helpers/import_metadata_json_schema.py b/tests/scripts/helpers/import_metadata_json_schema.py new file mode 100644 index 000000000..a81961494 --- /dev/null +++ b/tests/scripts/helpers/import_metadata_json_schema.py @@ -0,0 +1,36 @@ +import_metadata_json_schema = { + "type": "object", + "properties": { + "datasources": { + "type": "object", + "patternProperties": { + "^[a-zA-Z0-9_-]+$": { + "type": "object", + "properties": { + "datasource_name": { + "type": "string", + "pattern": "^[a-zA-Z0-9_-]+$" + }, + "clusters": { + "type": "object", + "patternProperties": { + "^[a-zA-Z0-9_-]+$": { + "type": "object", + "properties": { + "cluster_name": { + "type": "string", + "pattern": "^[a-zA-Z0-9_-]+$" + } + }, + "required": ["cluster_name"] + } + } + } + }, + "required": ["datasource_name", "clusters"] + } + } + } + }, + "required": ["datasources"] +} diff --git a/tests/scripts/helpers/import_metadata_json_validate.py b/tests/scripts/helpers/import_metadata_json_validate.py new file mode 100644 index 000000000..3772228ff --- /dev/null +++ b/tests/scripts/helpers/import_metadata_json_validate.py @@ -0,0 +1,68 @@ +""" +Copyright (c) 2023, 2023 Red Hat, IBM Corporation and others. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+""" +import json +import jsonschema +from jsonschema import FormatChecker +from jsonschema.exceptions import ValidationError +from helpers.import_metadata_json_schema import import_metadata_json_schema + +JSON_NULL_VALUES = ("is not of type 'string'", "is not of type 'integer'", "is not of type 'number'") +VALUE_MISSING = " cannot be empty or null!" + +def validate_import_metadata_json(import_metadata_json, json_schema): + errorMsg = "" + try: + # create a validator with the format checker + print("Validating json against the json schema...") + validator = jsonschema.Draft7Validator(json_schema, format_checker=FormatChecker()) + + # validate the JSON data against the schema + errors = "" + errors = list(validator.iter_errors(import_metadata_json)) + print("Validating json against the json schema...done") + errorMsg = validate_import_metadata_json_values(import_metadata_json) + + if errors: + custom_err = ValidationError(errorMsg) + errors.append(custom_err) + return errors + else: + return errorMsg + except ValidationError as err: + print("Received a VaidationError") + + # Check if the exception is due to empty or null required parameters and prepare the response accordingly + if any(word in err.message for word in JSON_NULL_VALUES): + errorMsg = "Parameters" + VALUE_MISSING + return errorMsg + # Modify the error response in case of additional properties error + elif str(err.message).__contains__('('): + errorMsg = str(err.message).split('(') + return errorMsg[0] + else: + return err.message + +def validate_import_metadata_json_values(metadata): + validationErrorMsg = "" + + for key in metadata.keys(): + + # Check if any of the key is empty or null + if not (str(metadata[key]) and str(metadata[key]).strip()): + validationErrorMsg = ",".join([validationErrorMsg, "Parameters" + VALUE_MISSING]) + + return validationErrorMsg.lstrip(',') + diff --git a/tests/scripts/remote_monitoring_tests/helpers/kruize.py b/tests/scripts/helpers/kruize.py similarity index 73% rename 
from tests/scripts/remote_monitoring_tests/helpers/kruize.py rename to tests/scripts/helpers/kruize.py index 74fc89a3b..029e6eaba 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/kruize.py +++ b/tests/scripts/helpers/kruize.py @@ -239,3 +239,95 @@ def list_experiments(results=None, recommendations=None, latest=None, experiment response = requests.get(url) print("Response status code = ", response.status_code) return response + + +# Description: This function obtains the list of datasources from Kruize Autotune using the datasources API +# Input Parameters: None +def list_datasources(name=None): + print("\nListing the datasources...") + query_params = {} + + if name is not None: + query_params['name'] = name + + query_string = "&".join(f"{key}={value}" for key, value in query_params.items()) + + url = URL + "/datasources" + if query_string: + url += "?" + query_string + print("URL = ", url) + response = requests.get(url) + + print("PARAMS = ", query_params) + print("Response status code = ", response.status_code) + print("\n************************************************************") + print(response.text) + print("\n************************************************************") + return response + + +# Description: This function validates the input json and imports the metadata into Kruize Autotune using the POST dsmetadata API +# Input Parameters: datasource input json +def import_metadata(input_json_file, invalid_header=False): + json_file = open(input_json_file, "r") + input_json = json.loads(json_file.read()) + print("\n************************************************************") + pretty_json_str = json.dumps(input_json, indent=4) + print(pretty_json_str) + print("\n************************************************************") + + # read the json + print("\nImporting the metadata...") + + url = URL + "/dsmetadata" + print("URL = ", url) + + headers = {'content-type': 'application/xml'} + if invalid_header: + print("Invalid header") + response =
requests.post(url, json=input_json, headers=headers) + else: + response = requests.post(url, json=input_json) + + print("Response status code = ", response.status_code) + try: + # Parse the response content as JSON into a Python dictionary + response_json = response.json() + + # Check if the response_json is a valid JSON object or array + if isinstance(response_json, (dict, list)): + # Convert the response_json back to a JSON-formatted string with double quotes and pretty print it + pretty_response_json_str = json.dumps(response_json, indent=4) + + # Print the JSON string + print(pretty_response_json_str) + else: + print("Invalid JSON format in the response.") + print(response.text) # Print the response text as-is + except json.JSONDecodeError: + print("Response content is not valid JSON.") + print(response.text) # Print the response text as-is + return response + + +# Description: This function deletes the imported metadata using the DELETE dsmetadata API of Kruize Autotune +# Input Parameters: datasource input json +def delete_metadata(input_json_file, invalid_header=False): + json_file = open(input_json_file, "r") + input_json = json.loads(json_file.read()) + + print("\nDeleting the metadata...") + + url = URL + "/dsmetadata" + print("URL = ", url) + + headers = {'content-type': 'application/xml'} + if invalid_header: + print("Invalid header") + response = requests.delete(url, json=input_json, headers=headers) + else: + response = requests.delete(url, json=input_json) + + print(response) + print("Response status code = ", response.status_code) + return response \ No newline at end of file diff --git a/tests/scripts/helpers/list_datasources_json_schema.py b/tests/scripts/helpers/list_datasources_json_schema.py new file mode 100644 index 000000000..3b14c4069 --- /dev/null +++ b/tests/scripts/helpers/list_datasources_json_schema.py @@ -0,0 +1,34 @@ +list_datasources_json_schema = { + "type": "object", + "properties": { + "version": { + "type": "string" + }, +
"datasources": { + "type": "array", + "items": { + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "provider": { + "type": "string" + }, + "serviceName": { + "type": "string" + }, + "namespace": { + "type": "string" + }, + "url": { + "type": "string", + "format": "uri" + } + }, + "required": ["name", "provider", "serviceName", "namespace", "url"] + } + } + }, + "required": ["version", "datasources"] +} diff --git a/tests/scripts/helpers/list_datasources_json_validate.py b/tests/scripts/helpers/list_datasources_json_validate.py new file mode 100644 index 000000000..d5e538625 --- /dev/null +++ b/tests/scripts/helpers/list_datasources_json_validate.py @@ -0,0 +1,81 @@ +""" +Copyright (c) 2023, 2023 Red Hat, IBM Corporation and others. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +""" +import json +import jsonschema +from jsonschema import FormatChecker +from jsonschema.exceptions import ValidationError +from helpers.list_datasources_json_schema import list_datasources_json_schema + +#TODO - currently only prometheus datasurce provider is supported +DATASOURCE_TYPE_SUPPORTED = "prometheus" + +JSON_NULL_VALUES = ("is not of type 'string'", "is not of type 'integer'", "is not of type 'number'") +VALUE_MISSING = " cannot be empty or null!" 
+ +def validate_list_datasources_json(list_datasources_json, json_schema): + errorMsg = "" + try: + # create a validator with the format checker + print("Validating json against the json schema...") + validator = jsonschema.Draft7Validator(json_schema, format_checker=FormatChecker()) + + # validate the JSON data against the schema + errors = "" + errors = list(validator.iter_errors(list_datasources_json)) + print("Validating json against the json schema...done") + errorMsg = validate_list_datasources_json_values(list_datasources_json) + + if errors: + custom_err = ValidationError(errorMsg) + errors.append(custom_err) + return errors + else: + return errorMsg + except ValidationError as err: + print("Received a ValidationError") + + # Check if the exception is due to empty or null required parameters and prepare the response accordingly + if any(word in err.message for word in JSON_NULL_VALUES): + errorMsg = "Parameters" + VALUE_MISSING + return errorMsg + # Modify the error response in case of additional properties error + elif str(err.message).__contains__('('): + errorMsg = str(err.message).split('(') + return errorMsg[0] + else: + return err.message + +def validate_list_datasources_json_values(list_datasources_json): + validationErrorMsg = "" + obj_arr = ["datasources"] + + for key in list_datasources_json.keys(): + + # Check if any of the keys is empty or null + if not (str(list_datasources_json[key]) and str(list_datasources_json[key]).strip()): + validationErrorMsg = ",".join([validationErrorMsg, "Parameters" + VALUE_MISSING]) + + for obj in obj_arr: + if obj == key: + for subkey in list_datasources_json[key][0].keys(): + # Check if any of the keys is empty or null + if not (str(list_datasources_json[key][0][subkey]) and str(list_datasources_json[key][0][subkey]).strip()): + print(f"FAILED - {str(list_datasources_json[key][0][subkey])} is empty or null") + validationErrorMsg = ",".join([validationErrorMsg, "Parameters" + VALUE_MISSING]) + elif str(subkey) ==
"provider" and str(list_datasources_json[key][0][subkey]) not in DATASOURCE_TYPE_SUPPORTED: + validationErrorMsg = ",".join([validationErrorMsg, DATASOURCE_TYPE_SUPPORTED]) + + return validationErrorMsg.lstrip(',') diff --git a/tests/scripts/remote_monitoring_tests/helpers/list_reco_json_schema.py b/tests/scripts/helpers/list_reco_json_schema.py similarity index 89% rename from tests/scripts/remote_monitoring_tests/helpers/list_reco_json_schema.py rename to tests/scripts/helpers/list_reco_json_schema.py index cc535da27..24d7ff61c 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/list_reco_json_schema.py +++ b/tests/scripts/helpers/list_reco_json_schema.py @@ -371,6 +371,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/list_reco_json_validate.py b/tests/scripts/helpers/list_reco_json_validate.py similarity index 100% rename from tests/scripts/remote_monitoring_tests/helpers/list_reco_json_validate.py rename to 
tests/scripts/helpers/list_reco_json_validate.py diff --git a/tests/scripts/remote_monitoring_tests/helpers/long_term_list_reco_json_schema.py b/tests/scripts/helpers/long_term_list_reco_json_schema.py similarity index 89% rename from tests/scripts/remote_monitoring_tests/helpers/long_term_list_reco_json_schema.py rename to tests/scripts/helpers/long_term_list_reco_json_schema.py index 6f6ce48bf..abe4d2a9a 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/long_term_list_reco_json_schema.py +++ b/tests/scripts/helpers/long_term_list_reco_json_schema.py @@ -409,6 +409,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/medium_and_long_term_list_reco_json_schema.py b/tests/scripts/helpers/medium_and_long_term_list_reco_json_schema.py similarity index 88% rename from tests/scripts/remote_monitoring_tests/helpers/medium_and_long_term_list_reco_json_schema.py rename to tests/scripts/helpers/medium_and_long_term_list_reco_json_schema.py index 
3aeee3123..9f6ae28df 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/medium_and_long_term_list_reco_json_schema.py +++ b/tests/scripts/helpers/medium_and_long_term_list_reco_json_schema.py @@ -390,6 +390,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] @@ -657,6 +700,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": 
"number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/medium_term_list_reco_json_schema.py b/tests/scripts/helpers/medium_term_list_reco_json_schema.py similarity index 89% rename from tests/scripts/remote_monitoring_tests/helpers/medium_term_list_reco_json_schema.py rename to tests/scripts/helpers/medium_term_list_reco_json_schema.py index 0552d8d94..e04cb4977 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/medium_term_list_reco_json_schema.py +++ b/tests/scripts/helpers/medium_term_list_reco_json_schema.py @@ -390,6 +390,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/short_and_long_term_list_reco_json_schema.py 
b/tests/scripts/helpers/short_and_long_term_list_reco_json_schema.py similarity index 88% rename from tests/scripts/remote_monitoring_tests/helpers/short_and_long_term_list_reco_json_schema.py rename to tests/scripts/helpers/short_and_long_term_list_reco_json_schema.py index 1fa01cd36..264e24e48 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/short_and_long_term_list_reco_json_schema.py +++ b/tests/scripts/helpers/short_and_long_term_list_reco_json_schema.py @@ -371,6 +371,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] @@ -657,6 +700,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + 
"max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/short_and_medium_term_list_reco_json_schema.py b/tests/scripts/helpers/short_and_medium_term_list_reco_json_schema.py similarity index 88% rename from tests/scripts/remote_monitoring_tests/helpers/short_and_medium_term_list_reco_json_schema.py rename to tests/scripts/helpers/short_and_medium_term_list_reco_json_schema.py index d7a7b37aa..d85f8e06d 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/short_and_medium_term_list_reco_json_schema.py +++ b/tests/scripts/helpers/short_and_medium_term_list_reco_json_schema.py @@ -371,6 +371,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { 
"type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] @@ -638,6 +681,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/short_term_list_reco_json_schema.py b/tests/scripts/helpers/short_term_list_reco_json_schema.py similarity index 89% rename from tests/scripts/remote_monitoring_tests/helpers/short_term_list_reco_json_schema.py rename to tests/scripts/helpers/short_term_list_reco_json_schema.py index be87cc882..7867ff064 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/short_term_list_reco_json_schema.py +++ b/tests/scripts/helpers/short_term_list_reco_json_schema.py @@ -371,6 +371,49 @@ } }, "required": [] + }, + "plots": { + "type": "object", + "properties": { + "datapoints": { "type": "number" }, + "plots_data": { + "type": "object", + "patternProperties": { + 
"^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}.\\d{3}Z$": { + "type": "object", + "properties": { + "cpuUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + "memoryUsage": { + "type": "object", + "properties": { + "min": { "type": "number" }, + "q1": { "type": "number" }, + "median": { "type": "number" }, + "q3": { "type": "number" }, + "max": { "type": "number" }, + "format": { "type": "string" } + }, + "required": ["min", "q1", "median", "q3", "max", "format"] + }, + }, + "required": [] + } + }, + "required": [] + } + }, + "required": ["datapoints", "plots_data"] } }, "required": [] diff --git a/tests/scripts/remote_monitoring_tests/helpers/utils.py b/tests/scripts/helpers/utils.py similarity index 94% rename from tests/scripts/remote_monitoring_tests/helpers/utils.py rename to tests/scripts/helpers/utils.py index 4c00da192..53b0899c7 100644 --- a/tests/scripts/remote_monitoring_tests/helpers/utils.py +++ b/tests/scripts/helpers/utils.py @@ -1,5 +1,5 @@ """ -Copyright (c) 2022, 2022 Red Hat, IBM Corporation and others. +Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
@@ -49,6 +49,7 @@ COST_RECOMMENDATIONS_AVAILABLE = "Cost Recommendations Available" PERFORMANCE_RECOMMENDATIONS_AVAILABLE = "Performance Recommendations Available" CONTAINER_AND_EXPERIMENT_NAME = " for container : %s for experiment: %s.]" +LIST_DATASOURCES_ERROR_MSG = "Given datasource name - \" %s \" either does not exist or is not valid" # Kruize Recommendations Notification codes NOTIFICATION_CODE_FOR_RECOMMENDATIONS_AVAILABLE = "111000" @@ -137,6 +138,10 @@ MEDIUM_TERM_TEST = "medium_term_test" LONG_TERM_TEST = "long_term_test" +PLOTS = "plots" +DATA_POINTS = "datapoints" +PLOTS_DATA = "plots_data" + TERMS_NOTIFICATION_CODES = { SHORT_TERM: NOTIFICATION_CODE_FOR_SHORT_TERM_RECOMMENDATIONS_AVAILABLE, MEDIUM_TERM: NOTIFICATION_CODE_FOR_MEDIUM_TERM_RECOMMENDATIONS_AVAILABLE, @@ -213,6 +218,12 @@ "memoryRSS_format": "MiB" } +# version, datasource_name +import_metadata_test_data = { + "version": "v1.0", + "datasource_name": "prometheus-1", +} + test_type = {"blank": "", "null": "null", "invalid": "xyz"} aggr_info_keys_to_skip = ["cpuRequest_sum", "cpuRequest_avg", "cpuLimit_sum", "cpuLimit_avg", "cpuUsage_sum", "cpuUsage_max", @@ -410,13 +421,13 @@ def validate_reco_json(create_exp_json, update_results_json, list_reco_json, exp update_results_kubernetes_obj = update_results_json[0]["kubernetes_objects"][i] create_exp_kubernetes_obj = create_exp_json["kubernetes_objects"][i] list_reco_kubernetes_obj = list_reco_json["kubernetes_objects"][i] - validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json, \ + validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json, list_reco_kubernetes_obj, expected_duration_in_hours, test_name) else: update_results_kubernetes_obj = None create_exp_kubernetes_obj = create_exp_json["kubernetes_objects"][0] list_reco_kubernetes_obj = list_reco_json["kubernetes_objects"][0] - validate_kubernetes_obj(create_exp_kubernetes_obj, 
update_results_kubernetes_obj, update_results_json, \ + validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json, list_reco_kubernetes_obj, expected_duration_in_hours, test_name) @@ -480,7 +491,8 @@ def validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes expected_duration_in_hours, test_name) -def validate_container(update_results_container, update_results_json, list_reco_container, expected_duration_in_hours, test_name): +def validate_container(update_results_container, update_results_json, list_reco_container, expected_duration_in_hours, + test_name): # Validate container image name and container name if update_results_container != None and list_reco_container != None: assert list_reco_container["container_image_name"] == update_results_container["container_image_name"], \ @@ -514,8 +526,8 @@ def validate_container(update_results_container, update_results_json, list_reco_ terms_obj = list_reco_container["recommendations"]["data"][interval_end_time]["recommendation_terms"] current_config = list_reco_container["recommendations"]["data"][interval_end_time]["current"] - duration_terms = ["short_term", "medium_term", "long_term"] - for term in duration_terms: + duration_terms = {'short_term': 4, 'medium_term': 7, 'long_term': 15} + for term in duration_terms.keys(): if check_if_recommendations_are_present(terms_obj[term]): print(f"reco present for term {term}") # Validate timestamps [deprecated as monitoring end time is moved to higher level] @@ -557,13 +569,17 @@ def validate_container(update_results_container, update_results_json, list_reco_ recommendation_engines_object = None if "recommendation_engines" in terms_obj[term]: recommendation_engines_object = terms_obj[term]["recommendation_engines"] - if None != recommendation_engines_object: + if recommendation_engines_object is not None: for engine_entry in engines_list: if engine_entry in terms_obj[term]["recommendation_engines"]: engine_obj = 
terms_obj[term]["recommendation_engines"][engine_entry] validate_config(engine_obj["config"], metrics) validate_variation(current_config, engine_obj["config"], engine_obj["variation"]) - + # validate Plots data + validate_plots(terms_obj, duration_terms, term) + # verify that plots isn't generated in case of no recommendations + else: + assert PLOTS not in terms_obj[term], f"Expected plots to be absent in case of no recommendations" else: data = list_reco_container["recommendations"]["data"] assert len(data) == 0, f"Data is not empty! Length of data - Actual = {len(data)} expected = 0" @@ -574,6 +590,21 @@ def validate_container(update_results_container, update_results_json, list_reco_ assert result == False, f"Recommendations notifications does not contain the expected message - {NOT_ENOUGH_DATA_MSG}" +def validate_plots(terms_obj, duration_terms, term): + plots = terms_obj[term][PLOTS] + datapoint = plots[DATA_POINTS] + plots_data = plots[PLOTS_DATA] + + assert plots is not None, f"Expected plots to be available" + assert datapoint is not None, f"Expected datapoint to be available" + # validate the count of data points for the specific term + assert datapoint == duration_terms[term], f"datapoint Expected: {duration_terms[term]}, Obtained: {datapoint}" + assert len(plots_data) == duration_terms[term], f"plots_data size Expected: {duration_terms[term]}, Obtained: {len(plots_data)}" + # TODO: validate the datapoint JSON objects + # TODO: validate the actual JSONs present, how many are empty for each term, this should be passed as an input + # TODO: validate the format value against the results metrics + + def set_duration_based_on_terms(duration_in_hours, term, interval_start_time, interval_end_time): diff = time_diff_in_hours(interval_start_time, interval_end_time) duration_in_hours += diff diff --git a/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md b/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md new file mode 100644 index 
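The `validate_plots` helper added in the utils.py hunk above asserts that the reported `datapoints` count and the number of `plots_data` entries match the expected count for each term. A minimal self-contained sketch of that check, using an invented sample payload (the 4/7/15 counts mirror the `duration_terms` mapping introduced in the patch):

```python
# Expected datapoints per term, as introduced in the patch above.
duration_terms = {"short_term": 4, "medium_term": 7, "long_term": 15}

# Invented sample: four box-plot entries for the short term.
terms_obj = {
    "short_term": {
        "plots": {
            "datapoints": 4,
            "plots_data": {
                f"2023-04-01T0{h}:00:00.000Z": {
                    "cpuUsage": {"min": 0.1, "q1": 0.2, "median": 0.3,
                                 "q3": 0.4, "max": 0.5, "format": "cores"}
                }
                for h in range(4)
            },
        }
    }
}

def validate_plots(terms_obj, duration_terms, term):
    plots = terms_obj[term]["plots"]
    # Count of data points for the specific term must match the term duration.
    assert plots["datapoints"] == duration_terms[term], \
        f"datapoint Expected: {duration_terms[term]}, Obtained: {plots['datapoints']}"
    assert len(plots["plots_data"]) == duration_terms[term], \
        f"plots_data size Expected: {duration_terms[term]}, Obtained: {len(plots['plots_data'])}"

validate_plots(terms_obj, duration_terms, "short_term")
```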
000000000..7d23513bf --- /dev/null +++ b/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md @@ -0,0 +1,95 @@ +# **Kruize Local monitoring tests** + +Kruize Local monitoring tests validate the behaviour of [Kruize local monitoring APIs](/design/KruizeLocalAPI.md) +using various positive and negative scenarios. These tests are developed using the pytest framework. + +## Tests description +### **List Datasources API tests** + +Here are the test scenarios: +- List all datasources +- List datasources with name query parameter: + - /datasources?name= +- List datasources with an invalid parameter value for the datasource name, tested with empty, NULL and invalid values. + +### **Import Metadata API tests** + +Here are the test scenarios: + +- Importing metadata for a valid datasource to the API. +- Posting the same datasource again +- Testing with invalid values such as blank, null or an invalid value for various keys in the dsmetadata input request JSON +- Validating error messages when the mandatory fields are missing + +The above tests are developed using the pytest framework and are run using a shell script wrapper that does the following: +- Deploys kruize in non-CRD mode using the [deploy script](https://github.com/kruize/autotune/blob/master/deploy.sh) from the autotune repo +- Creates a resource optimization performance profile using the [createPerformanceProfile API](/design/PerformanceProfileAPI.md) +- Runs the above tests using pytest + +## Prerequisites for running the tests: +- Minikube setup or access to an OpenShift cluster +- Tools such as kubectl, oc, curl, jq and python +- Python modules: pytest, json, pytest-html, requests, jinja2 + (these modules are installed automatically when the tests are run) + +## How to run the tests?
+ +Use the command below to run the tests: + +``` +/tests/test_autotune.sh -c minikube -r [location of benchmarks] [-i kruize image] [--tctype=functional] [--testmodule=Autotune module to be tested] [--testsuite=Group of tests that you want to perform] [--testcase=Particular test case that you want to test] [-n namespace] [--resultsdir=results directory] [--skipsetup] +``` + +where the test_autotune.sh options are: + +``` +usage: test_autotune.sh [ -c ] : cluster type. Supported types - minikube, openshift. Default - minikube + [ -i ] : optional. Kruize docker image to be used for testing, default - kruize/autotune_operator:test + [ -r ] : Location of benchmarks. Not required for local_monitoring_tests + [ --tctype ] : optional. Testcase type to run, default is functional (runs all functional tests) + [ --testmodule ]: Module to be tested. Use testmodule=help to list the modules that can be tested + [ --testsuite ] : Testsuite to run. Use testsuite=help to list the supported testsuites + [ --testcase ] : Testcase to run. Use testcase=help along with the testsuite name to list the supported testcases in that testsuite + [ -n ] : optional. Namespace to deploy autotune + [ --resultsdir ] : optional. Results directory location; by default the results directory is created in the current working directory + [ --skipsetup ] : optional.
Specifying this option skips the Kruize setup and performance profile creation in the case of local_monitoring_tests + +Note: If you want to run a particular testcase, then it is mandatory to specify the testsuite +Test cases supported are sanity, negative, extended and test_e2e + +``` + +To run all the local monitoring tests: + +``` +/tests/test_autotune.sh -c minikube --testsuite=local_monitoring_tests --resultsdir=/home/results +``` + +To run only the sanity local monitoring tests: + +``` +/tests/test_autotune.sh -c minikube --testsuite=local_monitoring_tests --testcase=sanity --resultsdir=/home/results +``` + +Local monitoring tests can also be run without using test_autotune.sh. To do this, follow the steps below: + +- Deploy Kruize using the deploy.sh from the kruize autotune repo +- Create the performance profile by using the [createPerformanceProfile API](/design/PerformanceProfileAPI.md) +- cd /tests/scripts/local_monitoring_tests +- python3 -m pip install --user -r requirements.txt +- cd rest_apis +- To run all the sanity tests +``` + pytest -m sanity --html=/report.html --cluster_type <cluster_type> +``` +- To run only the sanity tests for the List datasources API +``` + pytest -m sanity --html=/report.html test_list_datasources.py --cluster_type <cluster_type> +``` +- To run only a specific test within the List datasources API +``` + pytest -s test_list_datasources.py::test_list_datasources_with_name --cluster_type <cluster_type> +``` + +Note: Check report.html for the results, as it provides better readability + diff --git a/tests/scripts/local_monitoring_tests/conftest.py b/tests/scripts/local_monitoring_tests/conftest.py new file mode 100644 index 000000000..b03f52085 --- /dev/null +++ b/tests/scripts/local_monitoring_tests/conftest.py @@ -0,0 +1,5 @@ +def pytest_addoption(parser): + parser.addoption( + '--cluster_type', action='store', default='minikube', help='Cluster type' + ) + diff --git a/tests/scripts/local_monitoring_tests/json_files/import_metadata.json
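The `conftest.py` shown above registers the `--cluster_type` option that the pytest commands pass in. Its behaviour can be illustrated outside a pytest run with an argparse stand-in for pytest's parser; the `FakeParser` shim below is purely illustrative and not part of the patch:

```python
import argparse

# pytest passes its own Parser object to pytest_addoption; an argparse
# subclass with a compatible addoption() alias lets us exercise the hook
# standalone for illustration.
class FakeParser(argparse.ArgumentParser):
    def addoption(self, *args, **kwargs):
        self.add_argument(*args, **kwargs)

# Hook body mirrors the conftest.py added in this patch.
def pytest_addoption(parser):
    parser.addoption('--cluster_type', action='store', default='minikube', help='Cluster type')

parser = FakeParser()
pytest_addoption(parser)

assert parser.parse_args([]).cluster_type == 'minikube'           # default applies
assert parser.parse_args(['--cluster_type', 'openshift']).cluster_type == 'openshift'
```

Inside a test, the value is typically read back with `request.config.getoption('--cluster_type')`.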
b/tests/scripts/local_monitoring_tests/json_files/import_metadata.json new file mode 100644 index 000000000..8a6ff0d0d --- /dev/null +++ b/tests/scripts/local_monitoring_tests/json_files/import_metadata.json @@ -0,0 +1,5 @@ +{ + "version": "v1.0", + "datasource_name": "prometheus-1" +} + diff --git a/tests/scripts/local_monitoring_tests/json_files/import_metadata_mandatory.json b/tests/scripts/local_monitoring_tests/json_files/import_metadata_mandatory.json new file mode 100644 index 000000000..b1b5d9ef5 --- /dev/null +++ b/tests/scripts/local_monitoring_tests/json_files/import_metadata_mandatory.json @@ -0,0 +1,4 @@ +{ + "version": "v1.0", + "datasource_name": "prometheus-1" +} diff --git a/tests/scripts/local_monitoring_tests/json_files/import_metadata_template.json b/tests/scripts/local_monitoring_tests/json_files/import_metadata_template.json new file mode 100644 index 000000000..f36ad6cf1 --- /dev/null +++ b/tests/scripts/local_monitoring_tests/json_files/import_metadata_template.json @@ -0,0 +1,4 @@ +{ + "version": "{{version}}", + "datasource_name": "{{datasource_name}}" +} diff --git a/tests/scripts/local_monitoring_tests/json_files/resource_optimization_openshift.json b/tests/scripts/local_monitoring_tests/json_files/resource_optimization_openshift.json new file mode 100644 index 000000000..6949385be --- /dev/null +++ b/tests/scripts/local_monitoring_tests/json_files/resource_optimization_openshift.json @@ -0,0 +1,194 @@ +{ + "name": "resource-optimization-openshift", + "profile_version": 1, + "k8s_type": "openshift", + "slo": { + "slo_class": "resource_usage", + "direction": "minimize", + "objective_function": { + "function_type": "source" + }, + "function_variables": [ + { + "name": "cpuRequest", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", 
container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"cpu\", unit=\"core\"})" + }, + { + "function": "sum", + "query": "sum(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"cpu\", unit=\"core\"})" + } + ] + }, + { + "name": "cpuLimit", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"cpu\", unit=\"core\"})" + }, + { + "function": "sum", + "query": "sum(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"cpu\", unit=\"core\"})" + } + ] + }, + { + "name": "cpuUsage", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=”$CONTAINER_NAME$”}[15m]))", + "versions": "<=4.8" + }, + { + "function": "avg", + "query": "avg(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=”$CONTAINER_NAME$”}[15m]))", + "versions": ">4.9" + }, + { + "function": "min", + "query": "min(min_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))", + "versions": "<=4.8" + }, + { + "function": "min", + "query": 
"min(min_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))", + "versions": ">4.9" + }, + { + "function": "max", + "query": "max(max_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))", + "versions": "<=4.8" + }, + { + "function": "max", + "query": "max(max_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))", + "versions": ">4.9" + }, + { + "function": "sum", + "query": "sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))", + "versions": "<=4.8" + }, + { + "function": "sum", + "query": "sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))", + "versions": ">4.9" + } + ] + }, + { + "name": "cpuThrottle", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(rate(container_cpu_cfs_throttled_seconds_total{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=”$CONTAINER_NAME$”}[15m]))" + }, + { + "function": "max", + "query": "max(rate(container_cpu_cfs_throttled_seconds_total{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=”$CONTAINER_NAME$”}[15m]))" + }, + { + "function": "sum", + "query": "sum(rate(container_cpu_cfs_throttled_seconds_total{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", 
container=\"$CONTAINER_NAME$\"}[15m]))" + } + ] + }, + { + "name": "memoryRequest", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"memory\", unit=\"byte\"})" + }, + { + "function": "sum", + "query": "sum(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"memory\", unit=\"byte\"})" + } + ] + }, + { + "name": "memoryLimit", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"memory\", unit=\"byte\"})" + }, + { + "function": "sum", + "query": "sum(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE\", resource=\"memory\", unit=\"byte\"})" + } + ] + }, + { + "name": "memoryUsage", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(avg_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=\"$CONTAINER_NAME$\"}[15m]))" + }, + { + "function": "min", + "query": "min(min_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=\"$CONTAINER_NAME$\"}[15m]))" + }, + { + "function": "max", + "query": "max(max_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=\"$CONTAINER_NAME$\"}[15m]))" + }, + { + "function": 
"sum", + "query": "sum(avg_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=\"$CONTAINER_NAME$\"}[15m]))" + } + ] + }, + { + "name": "memoryRSS", + "datasource": "prometheus", + "value_type": "double", + "kubernetes_object": "container", + "aggregation_functions": [ + { + "function": "avg", + "query": "avg(avg_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=”$CONTAINER_NAME$”}[15m]))" + }, + { + "function": "min", + "query": "min(min_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=\"$CONTAINER_NAME$\"}[15m]))" + }, + { + "function": "max", + "query": "max(max_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=\"$CONTAINER_NAME$\"}[15m]))" + }, + { + "function": "sum", + "query": "sum(avg_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=$NAMESPACE$, container=”$CONTAINER_NAME$”}[15m]))" + } + ] + } + ] + } +} diff --git a/tests/scripts/local_monitoring_tests/local_monitoring_tests.sh b/tests/scripts/local_monitoring_tests/local_monitoring_tests.sh new file mode 100644 index 000000000..a76a2dd3d --- /dev/null +++ b/tests/scripts/local_monitoring_tests/local_monitoring_tests.sh @@ -0,0 +1,159 @@ +#!/bin/bash +# +# Copyright (c) 2024, 2024 Red Hat, IBM Corporation and others. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#
+##### Script to perform basic tests for Kruize local monitoring #####
+
+
+# Get the absolute path of current directory
+CURRENT_DIR="$(dirname "$(realpath "$0")")"
+LOCAL_MONITORING_TEST_DIR="${CURRENT_DIR}/local_monitoring_tests"
+
+# Source the common functions scripts
+. ${LOCAL_MONITORING_TEST_DIR}/../common/common_functions.sh
+
+# Tests to validate Local monitoring mode in Kruize
+function local_monitoring_tests() {
+    start_time=$(get_date)
+    FAILED_CASES=()
+    TESTS_FAILED=0
+    TESTS_PASSED=0
+    TESTS=0
+    failed=0
+    marker_options=""
+    ((TOTAL_TEST_SUITES++))
+
+    python3 --version >/dev/null 2>/dev/null
+    err_exit "ERROR: python3 not installed"
+
+    target="crc"
+    perf_profile_json="${LOCAL_MONITORING_TEST_DIR}/json_files/resource_optimization_openshift.json"
+
+    local_monitoring_tests=("sanity" "extended" "negative")
+
+    # check if the test case is supported
+    if [ ! -z "${testcase}" ]; then
+        check_test_case "local_monitoring"
+    fi
+
+    # create the result directory for the given testsuite
+    echo ""
+    TEST_SUITE_DIR="${RESULTS}/local_monitoring_tests"
+    KRUIZE_SETUP_LOG="${TEST_SUITE_DIR}/kruize_setup.log"
+    KRUIZE_POD_LOG="${TEST_SUITE_DIR}/kruize_pod.log"
+
+    mkdir -p ${TEST_SUITE_DIR}
+
+    # check for 'local' flag
+    kruize_local_patch
+
+    # Setup kruize
+    if [ ${skip_setup} -eq 0 ]; then
+        echo "Setting up kruize..." | tee -a ${LOG}
+        echo "${KRUIZE_SETUP_LOG}"
+        setup "${KRUIZE_POD_LOG}" >> ${KRUIZE_SETUP_LOG} 2>&1
+        echo "Setting up kruize...Done" | tee -a ${LOG}
+
+        sleep 60
+
+        # create performance profile
+        create_performance_profile ${perf_profile_json}
+    else
+        echo "Skipping kruize setup..." | tee -a ${LOG}
+    fi
+
+    # If testcase is not specified, run all tests
+    if [ -z "${testcase}" ]; then
+        testtorun=("${local_monitoring_tests[@]}")
+    else
+        testtorun=("${testcase}")
+    fi
+
+    # create the result directory for the given testsuite
+    echo ""
+    mkdir -p ${TEST_SUITE_DIR}
+
+    PIP_INSTALL_LOG="${TEST_SUITE_DIR}/pip_install.log"
+
+    echo ""
+    echo "Installing the required python modules..."
+    # removed the --user flag as it fails with: "Can not perform a '--user' install. User site-packages are not visible in this virtualenv."
+    echo "python3 -m pip install -r ${LOCAL_MONITORING_TEST_DIR}/requirements.txt > ${PIP_INSTALL_LOG}"
+    python3 -m pip install -r "${LOCAL_MONITORING_TEST_DIR}/requirements.txt" > ${PIP_INSTALL_LOG} 2>&1
+    err_exit "ERROR: Installing python modules for the test run failed!"
+
+    echo ""
+    echo "******************* Executing test suite ${FUNCNAME} ****************"
+    echo ""
+
+    for test in "${testtorun[@]}"
+    do
+        TEST_DIR="${TEST_SUITE_DIR}/${test}"
+        mkdir ${TEST_DIR}
+        LOG="${TEST_DIR}/${test}.log"
+
+        echo ""
+        echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" | tee -a ${LOG}
+        echo "                         Running Test ${test}" | tee -a ${LOG}
+        echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" | tee -a ${LOG}
+
+        echo " " | tee -a ${LOG}
+        echo "Test description: ${local_monitoring_test_description[$test]}" | tee -a ${LOG}
+        echo " " | tee -a ${LOG}
+
+        pushd ${LOCAL_MONITORING_TEST_DIR}/rest_apis > /dev/null
+        echo "pytest -m ${test} --junitxml=${TEST_DIR}/report-${test}.xml --html=${TEST_DIR}/report-${test}.html --cluster_type ${cluster_type}"
+        pytest -m ${test} --junitxml=${TEST_DIR}/report-${test}.xml --html=${TEST_DIR}/report-${test}.html --cluster_type ${cluster_type} | tee -a ${LOG}
+        err_exit "ERROR: Running the test using pytest failed, check ${LOG} for details!"
+
+        popd > /dev/null
+
+        passed=$(grep -o -E '[0-9]+ passed' ${TEST_DIR}/report-${test}.html | cut -d' ' -f1)
+        failed=$(grep -o -E 'check the boxes to filter the results.*' ${TEST_DIR}/report-${test}.html | grep -o -E '[0-9]+ failed' | cut -d' ' -f1)
+        errors=$(grep -o -E '[0-9]+ errors' ${TEST_DIR}/report-${test}.html | cut -d' ' -f1)
+
+        # Default the counts to 0 when grep finds no match so that the checks below do not fail
+        passed="${passed:-0}"
+        failed="${failed:-0}"
+        errors="${errors:-0}"
+
+        TESTS_PASSED=$(($TESTS_PASSED + $passed))
+        TESTS_FAILED=$(($TESTS_FAILED + $failed))
+
+        if [ "${errors}" -ne "0" ]; then
+            echo "Tests did not execute, there were errors - check the logs"
+            exit 1
+        fi
+
+        if [ "${TESTS_FAILED}" -ne "0" ]; then
+            FAILED_CASES+=(${test})
+        fi
+
+    done
+
+    TESTS=$(($TESTS_PASSED + $TESTS_FAILED))
+    TOTAL_TESTS_FAILED=${TESTS_FAILED}
+    TOTAL_TESTS_PASSED=${TESTS_PASSED}
+    TOTAL_TESTS=${TESTS}
+
+    if [ "${TESTS_FAILED}" -ne "0" ]; then
+        FAILED_TEST_SUITE+=(${FUNCNAME})
+    fi
+
+    end_time=$(get_date)
+    elapsed_time=$(time_diff "${start_time}" "${end_time}")
+
+    # Remove the duplicates
+    FAILED_CASES=( $(printf '%s\n' "${FAILED_CASES[@]}" | uniq ) )
+
+    # print the testsuite summary
+    testsuitesummary ${FUNCNAME} ${elapsed_time} ${FAILED_CASES}
+}
diff --git a/tests/scripts/local_monitoring_tests/pytest.ini b/tests/scripts/local_monitoring_tests/pytest.ini
new file mode 100644
index 000000000..48bdd36e6
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/pytest.ini
@@ -0,0 +1,7 @@
+# content of pytest.ini
+[pytest]
+markers =
+    sanity: mark a test as a sanity test
+    test_e2e: mark a test as an end-to-end test
+    negative: mark a test as a negative test
+    extended: mark a test as an extended test
diff --git a/tests/scripts/local_monitoring_tests/requirements.txt b/tests/scripts/local_monitoring_tests/requirements.txt
new file mode 100644
index 000000000..b14263e72
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/requirements.txt
@@ -0,0 +1,4 @@
+pytest
+requests
+jinja2
+pytest-html==3.2.0
\ No newline at end of file
diff --git a/tests/scripts/local_monitoring_tests/rest_apis/test_import_metadata.py b/tests/scripts/local_monitoring_tests/rest_apis/test_import_metadata.py
new file mode 100644
index 000000000..b68627683
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/rest_apis/test_import_metadata.py
@@ -0,0 +1,185 @@
+"""
+Copyright (c) 2024, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import pytest
+import json
+import sys
+
+sys.path.append("../../")
+
+from helpers.fixtures import *
+from helpers.kruize import *
+from helpers.utils import *
+from helpers.import_metadata_json_validate import *
+from jinja2 import Environment, FileSystemLoader
+
+mandatory_fields = [
+    ("version", ERROR_STATUS_CODE, ERROR_STATUS),
+    ("datasource_name", ERROR_STATUS_CODE, ERROR_STATUS)
+]
+
+csvfile = "/tmp/import_metadata_test_data.csv"
+
+
+@pytest.mark.sanity
+def test_import_metadata(cluster_type):
+    """
+    Test Description: This test validates the response status code of the dsmetadata API by passing a
+    valid input json
+    """
+    input_json_file = "../json_files/import_metadata.json"
+
+    form_kruize_url(cluster_type)
+
+    response = delete_metadata(input_json_file)
+    print("delete metadata = ", response.status_code)
+
+    # Import metadata using the specified json
+    response = import_metadata(input_json_file)
+    metadata_json = response.json()
+
+    # Validate the json against the json schema
+    errorMsg = validate_import_metadata_json(metadata_json, import_metadata_json_schema)
+    assert errorMsg == ""
+
+    response = delete_metadata(input_json_file)
+    print("delete metadata = ", response.status_code)
+
+
+@pytest.mark.negative
+@pytest.mark.parametrize(
+    "test_name, expected_status_code, version, datasource_name",
+    generate_test_data(csvfile, import_metadata_test_data, "import_metadata"))
+def test_import_metadata_invalid_test(test_name, expected_status_code, version, datasource_name, cluster_type):
+    """
+    Test Description: This test validates the response status code of the POST dsmetadata API against
+    invalid input (blank, null, empty) for the json parameters.
+    """
+    print("\n****************************************************")
+    print("Test - ", test_name)
+    print("****************************************************\n")
+    tmp_json_file = "/tmp/import_metadata_" + test_name + ".json"
+
+    print("tmp_json_file = ", tmp_json_file)
+
+    form_kruize_url(cluster_type)
+
+    environment = Environment(loader=FileSystemLoader("../json_files/"))
+    template = environment.get_template("import_metadata_template.json")
+    if "null" in test_name:
+        field = test_name.replace("null_", "")
+        json_file = "../json_files/import_metadata_template.json"
+        filename = "/tmp/import_metadata_template.json"
+
+        strip_double_quotes_for_field(json_file, field, filename)
+        environment = Environment(loader=FileSystemLoader("/tmp/"))
+        template = environment.get_template("import_metadata_template.json")
+
+    content = template.render(
+        version=version,
+        datasource_name=datasource_name,
+    )
+    with open(tmp_json_file, mode="w", encoding="utf-8") as message:
+        message.write(content)
+
+    response = delete_metadata(tmp_json_file)
+    print("delete metadata = ", response.status_code)
+
+    # Import metadata using the specified json
+    response = import_metadata(tmp_json_file)
+    metadata_json = response.json()
+
+    # temporarily moved this up to avoid failures in the subsequent tests
+    response_delete_metadata = delete_metadata(tmp_json_file)
+    print("delete metadata = ", response_delete_metadata.status_code)
+
+    assert response.status_code == int(expected_status_code)
+
+
+@pytest.mark.extended
+@pytest.mark.parametrize("field, expected_status_code, expected_status", mandatory_fields)
+def test_import_metadata_mandatory_fields(cluster_type, field, expected_status_code, expected_status):
+    form_kruize_url(cluster_type)
+
+    # Import metadata using the specified json
+    json_file = "/tmp/import_metadata.json"
+    input_json_file = "../json_files/import_metadata_mandatory.json"
+    json_data = json.load(open(input_json_file))
+
+    if field == 'version':
+        json_data.pop("version", None)
+    else:
+        json_data.pop("datasource_name", None)
+
+    print("\n*****************************************")
+    print(json_data)
+    print("*****************************************\n")
+    data = json.dumps(json_data)
+    with open(json_file, 'w') as file:
+        file.write(data)
+
+    response = delete_metadata(json_file)
+    print("delete metadata = ", response.status_code)
+
+    # Import metadata using the specified json
+    response = import_metadata(json_file)
+    metadata_json = response.json()
+
+    assert response.status_code == expected_status_code, \
+        f"Mandatory field check failed for {field} actual - {response.status_code} expected - {expected_status_code}"
+    assert metadata_json['status'] == expected_status
+
+    response = delete_metadata(json_file)
+    print("delete metadata = ", response.status_code)
+
+
+@pytest.mark.sanity
+def test_repeated_metadata_import(cluster_type):
+    """
+    Test Description: This test validates the response status code of the /dsmetadata API by specifying the
+    same datasource name
+    """
+    input_json_file = "../json_files/import_metadata.json"
+    json_data = json.load(open(input_json_file))
+
+    datasource_name = json_data['datasource_name']
+    print("datasource_name = ", datasource_name)
+
+    form_kruize_url(cluster_type)
+
+    response = delete_metadata(input_json_file)
+    print("delete metadata = ", response.status_code)
+
+    # Import metadata using the specified json
+    response = import_metadata(input_json_file)
+    metadata_json = response.json()
+
+    assert response.status_code == SUCCESS_STATUS_CODE
+
+    # Validate the json against the json schema
+    errorMsg = validate_import_metadata_json(metadata_json, import_metadata_json_schema)
+    assert errorMsg == ""
+
+    # Import metadata using the same json to check the repeated import
+    response = import_metadata(input_json_file)
+    metadata_json = response.json()
+
+    assert response.status_code == SUCCESS_STATUS_CODE
+
+    # Validate the json against the json schema
+    errorMsg = validate_import_metadata_json(metadata_json, import_metadata_json_schema)
+    assert errorMsg == ""
+
+    response = delete_metadata(input_json_file)
+    print("delete metadata = ", response.status_code)
\ No newline at end of file
diff --git a/tests/scripts/local_monitoring_tests/rest_apis/test_list_datasources.py b/tests/scripts/local_monitoring_tests/rest_apis/test_list_datasources.py
new file mode 100644
index 000000000..95ed0710a
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/rest_apis/test_list_datasources.py
@@ -0,0 +1,95 @@
+"""
+Copyright (c) 2024, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import pytest
+import json
+import sys
+
+sys.path.append("../../")
+
+from helpers.fixtures import *
+from helpers.kruize import *
+from helpers.utils import *
+from helpers.list_datasources_json_validate import *
+
+
+@pytest.mark.sanity
+def test_list_datasources_without_parameters(cluster_type):
+    """
+    Test Description: This test validates the datasources API without parameters
+    """
+    form_kruize_url(cluster_type)
+
+    # Get the datasources name
+    datasource_name = None
+    response = list_datasources(datasource_name)
+
+    list_datasources_json = response.json()
+
+    assert response.status_code == SUCCESS_200_STATUS_CODE
+
+    # Validate the json against the json schema
+    errorMsg = validate_list_datasources_json(list_datasources_json, list_datasources_json_schema)
+    assert errorMsg == ""
+
+
+@pytest.mark.sanity
+def test_list_datasources_with_name(cluster_type):
+    """
+    Test Description: This test validates the datasources API with the 'name' parameter
+    """
+    form_kruize_url(cluster_type)
+
+    # Get the datasources name
+    datasource_name = "prometheus-1"
+    response = list_datasources(datasource_name)
+
+    list_datasources_json = response.json()
+
+    assert response.status_code == SUCCESS_200_STATUS_CODE
+
+    # Validate the json against the json schema
+    errorMsg = validate_list_datasources_json(list_datasources_json, list_datasources_json_schema)
+    assert errorMsg == ""
+
+
+@pytest.mark.negative
+@pytest.mark.parametrize("test_name, expected_status_code, datasource_name",
+                         [
+                             ("blank_name", 400, ""),
+                             ("null_name", 400, "null"),
+                             ("invalid_name", 400, "xyz")
+                         ]
+                         )
+def test_list_datasources_invalid_datasource_name(test_name, expected_status_code, datasource_name, cluster_type):
+    """
+    Test Description: This test validates the response status code of the list datasources API against
+    invalid input (blank, null, empty) for the json parameters.
+    """
+    print("\n****************************************************")
+    print("Test datasource_name = ", datasource_name)
+    print("****************************************************\n")
+
+    form_kruize_url(cluster_type)
+
+    # Get the datasource name
+    name = datasource_name
+    response = list_datasources(name)
+
+    list_datasources_json = response.json()
+    assert response.status_code == expected_status_code
+    assert list_datasources_json['message'] == LIST_DATASOURCES_ERROR_MSG % name
+
+
diff --git a/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md b/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md
index 0f2b27b11..d401f9006 100644
--- a/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md
+++ b/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md
@@ -75,6 +75,22 @@ Here are the test scenarios:
   - for non-contiguous data:
     - similar tests as mentioned above for contiguous
+
+### **Update Recommendation API tests**
+
+
+Here are the test scenarios:
+
+- Update recommendations with valid results and plots available
+- Update recommendations with no plots available when no recommendations are available for the medium and long term
+- Update recommendations with just interval_end_time in the input
+- Update recommendations without experiment name or end_time
+- Update recommendations without end_time
+- Update recommendations with invalid end_time format
+- Update recommendations with unknown experiment_name
+- Update recommendations with unknown end_time
+- Update recommendations with end_time preceding start_time
+
 The above tests are developed using pytest framework and the tests are run using shell script wrapper that does the following:
 - Deploys kruize in non-CRD mode using the [deploy script](https://github.com/kruize/autotune/blob/master/deploy.sh) from the autotune repo
 - Creates a resource optimization performance profile using the [createPerformanceProfile API](/design/PerformanceProfileAPI.md)
@@ -97,7 +113,7 @@ Use the below command to test :
 
 Where values for test_autotune.sh are:
 
 ```
-usage: test_autotune.sh [ -c ] : cluster type. Supported type - minikube
+usage: test_autotune.sh [ -c ] : cluster type. Supported type - minikube, openshift. Default - minikube
 [ -i ] : optional. Kruize docker image to be used for testing, default - kruize/autotune_operator:test
 [ -r ] : Location of benchmarks. Not required for remote_monitoring_tests
 [ --tctype ] : optional. Testcases type to run, default is functional (runs all functional tests)
diff --git a/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py b/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py
index 41bf659fd..50118a982 100644
--- a/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py
+++ b/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py
@@ -18,7 +18,7 @@ import json
 import os
 import time
-sys.path.append("..")
+sys.path.append("../../")
 from helpers.kruize import *
 from helpers.utils import *
 from helpers.generate_rm_jsons import *
diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py
index b72032891..a0825f02c 100644
--- a/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py
+++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py
@@ -1,4 +1,22 @@
+"""
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +""" import pytest +import sys +sys.path.append("../../") + from helpers.fixtures import * from helpers.kruize import * from helpers.utils import * diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py index c501f9b27..a956ac732 100644 --- a/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py +++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py @@ -1,7 +1,25 @@ +""" +Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+""" import copy import json import pytest +import sys +sys.path.append("../../") + from helpers.fixtures import * from helpers.generate_rm_jsons import * from helpers.kruize import * @@ -104,6 +122,7 @@ def test_list_recommendations_multiple_exps_from_diff_json_files(cluster_type): assert data[0]['experiment_name'] == experiment_name assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications'][NOTIFICATION_CODE_FOR_RECOMMENDATIONS_AVAILABLE][ 'message'] == RECOMMENDATIONS_AVAILABLE + response = list_recommendations(experiment_name) if response.status_code == SUCCESS_200_STATUS_CODE: recommendation_json = response.json() diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py index 1ce577fce..ba767c9af 100644 --- a/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py +++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py @@ -1,7 +1,24 @@ +""" +Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+""" import datetime import json import pytest +import sys +sys.path.append("../../") from helpers.all_terms_list_reco_json_schema import all_terms_list_reco_json_schema from helpers.fixtures import * @@ -386,7 +403,6 @@ def test_list_recommendations_single_exp_multiple_results(cluster_type): assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications'][ NOTIFICATION_CODE_FOR_RECOMMENDATIONS_AVAILABLE]['message'] == RECOMMENDATIONS_AVAILABLE - response = list_recommendations(experiment_name) list_reco_json = response.json() @@ -1054,7 +1070,8 @@ def test_list_recommendations_for_diff_reco_terms_with_only_latest(test_name, nu exp_found = False for list_reco in list_reco_json: if create_exp_json[0]['experiment_name'] == list_reco['experiment_name']: - validate_reco_json(create_exp_json[0], update_results_json, list_reco, expected_duration_in_hours, test_name) + validate_reco_json(create_exp_json[0], update_results_json, list_reco, expected_duration_in_hours, + test_name) exp_found = True continue diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py index 9273c271d..da643a78a 100644 --- a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py +++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py @@ -1,4 +1,21 @@ +""" +Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +""" import pytest +import sys +sys.path.append("../../") from helpers.fixtures import * from helpers.kruize import * from helpers.list_reco_json_validate import * @@ -12,6 +29,7 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ update results for 24 hrs + update recommendation using start and end time as a parameter Expected : recommendation should be available for the timestamp provided + Expected : plots data should be available ''' input_json_file = "../json_files/create_exp.json" result_json_file = "../json_files/update_results.json" @@ -27,7 +45,7 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ # Create experiment using the specified json num_exps = 1 - num_res = 100 + num_res = 2 for i in range(num_exps): create_exp_json_file = "/tmp/create_exp_" + str(i) + ".json" generate_json(find, input_json_file, create_exp_json_file, i) @@ -77,7 +95,7 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ assert data['message'] == UPDATE_RESULTS_SUCCESS_MSG # Expecting that we have recommendations - if j > 96: + if j > 1: response = update_recommendations(experiment_name, None, end_time) data = response.json() assert response.status_code == SUCCESS_STATUS_CODE @@ -119,7 +137,133 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ update_results_json = [] update_results_json.append(result_json_arr[len(result_json_arr) - 1]) - expected_duration_in_hours = SHORT_TERM_DURATION_IN_HRS_MAX + expected_duration_in_hours = SHORT_TERM_DURATION_IN_HRS_MIN + validate_reco_json(create_exp_json[0], update_results_json, list_reco_json[0], expected_duration_in_hours) + + # Delete all the experiments + for i in range(num_exps): + json_file = "/tmp/create_exp_" + str(i) + ".json" + response = delete_experiment(json_file) + print("delete exp = ", 
response.status_code) + assert response.status_code == SUCCESS_STATUS_CODE + + +@pytest.mark.sanity +def test_plots_with_no_recommendations_in_some_terms(cluster_type): + ''' + Creates Experiment + + update results for 30 mins + + update recommendation using start and end time as a parameter + Expected : recommendation should be available for the timestamp provided + Expected : plots data should not be available for medium and long term + ''' + input_json_file = "../json_files/create_exp.json" + result_json_file = "../json_files/update_results.json" + + find = [] + json_data = json.load(open(input_json_file)) + + find.append(json_data[0]['experiment_name']) + find.append(json_data[0]['kubernetes_objects'][0]['name']) + find.append(json_data[0]['kubernetes_objects'][0]['namespace']) + + form_kruize_url(cluster_type) + + # Create experiment using the specified json + num_exps = 1 + num_res = 2 + for i in range(num_exps): + create_exp_json_file = "/tmp/create_exp_" + str(i) + ".json" + generate_json(find, input_json_file, create_exp_json_file, i) + + # Delete the experiment + response = delete_experiment(create_exp_json_file) + print("delete exp = ", response.status_code) + + # Create the experiment + response = create_experiment(create_exp_json_file) + + data = response.json() + print("message = ", data['message']) + assert response.status_code == SUCCESS_STATUS_CODE + assert data['status'] == SUCCESS_STATUS + assert data['message'] == CREATE_EXP_SUCCESS_MSG + + # Update results for the experiment + update_results_json_file = "/tmp/update_results_" + str(i) + ".json" + + result_json_arr = [] + # Get the experiment name + json_data = json.load(open(create_exp_json_file)) + experiment_name = json_data[0]['experiment_name'] + interval_start_time = get_datetime() + for j in range(num_res): + update_timestamps = True + generate_json(find, result_json_file, update_results_json_file, i, update_timestamps) + result_json = read_json_data_from_file(update_results_json_file) + 
if j == 0: + start_time = interval_start_time + else: + start_time = end_time + + result_json[0]['interval_start_time'] = start_time + end_time = increment_timestamp_by_given_mins(start_time, 15) + result_json[0]['interval_end_time'] = end_time + + write_json_data_to_file(update_results_json_file, result_json) + result_json_arr.append(result_json[0]) + response = update_results(update_results_json_file) + + data = response.json() + print("message = ", data['message']) + assert response.status_code == SUCCESS_STATUS_CODE + assert data['status'] == SUCCESS_STATUS + assert data['message'] == UPDATE_RESULTS_SUCCESS_MSG + + # Expecting that we have recommendations after minimum of two datapoints + if j > 1: + response = update_recommendations(experiment_name, None, end_time) + data = response.json() + assert response.status_code == SUCCESS_STATUS_CODE + assert data[0]['experiment_name'] == experiment_name + assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications']['111000'][ + 'message'] == 'Recommendations Are Available' + response = list_recommendations(experiment_name) + if response.status_code == SUCCESS_200_STATUS_CODE: + recommendation_json = response.json() + recommendation_section = recommendation_json[0]["kubernetes_objects"][0]["containers"][0][ + "recommendations"] + high_level_notifications = recommendation_section["notifications"] + # Check if duration + assert INFO_RECOMMENDATIONS_AVAILABLE_CODE in high_level_notifications + data_section = recommendation_section["data"] + short_term_recommendation = data_section[str(end_time)]["recommendation_terms"]["short_term"] + short_term_notifications = short_term_recommendation["notifications"] + for notification in short_term_notifications.values(): + assert notification["type"] != "error" + + response = update_recommendations(experiment_name, None, end_time) + data = response.json() + assert response.status_code == SUCCESS_STATUS_CODE + assert data[0]['experiment_name'] == 
experiment_name + assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications']['111000'][ + 'message'] == 'Recommendations Are Available' + + # Invoke list recommendations for the specified experiment + response = list_recommendations(experiment_name) + assert response.status_code == SUCCESS_200_STATUS_CODE + list_reco_json = response.json() + + # Validate the json against the json schema + errorMsg = validate_list_reco_json(list_reco_json, list_reco_json_schema) + assert errorMsg == "" + + # Validate the json values + create_exp_json = read_json_data_from_file(create_exp_json_file) + update_results_json = [] + update_results_json.append(result_json_arr[len(result_json_arr) - 1]) + + expected_duration_in_hours = SHORT_TERM_DURATION_IN_HRS_MIN validate_reco_json(create_exp_json[0], update_results_json, list_reco_json[0], expected_duration_in_hours) # Delete all the experiments diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py index 65ef4d5f7..44a77a98b 100644 --- a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py +++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py @@ -1,4 +1,21 @@ +""" +Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+""" import pytest +import sys +sys.path.append("../../") from helpers.fixtures import * from helpers.kruize import * from helpers.utils import * diff --git a/tests/test_autotune.sh b/tests/test_autotune.sh index 5bf18e2e4..c99bece9f 100755 --- a/tests/test_autotune.sh +++ b/tests/test_autotune.sh @@ -214,7 +214,7 @@ if [ ! -z "${testcase}" ]; then fi # check for benchmarks directory path -if [ ! "${testsuite}" == "remote_monitoring_tests" ]; then +if [[ "${testsuite}" != "remote_monitoring_tests" && "${testsuite}" != "local_monitoring_tests" ]]; then if [ -z "${APP_REPO}" ]; then echo "Error: Do specify the benchmarks directory path" usage @@ -256,7 +256,8 @@ if [ "${setup}" -ne "0" ]; then exit 0 fi else - if [ ${testsuite} == "remote_monitoring_tests" ]; then + #TODO: the target for local monitoring is temporarily set to "crc" for the demo + if [ ${testsuite} == "remote_monitoring_tests" ] || [ ${testsuite} == "local_monitoring_tests" ] ; then target="crc" else target="autotune" diff --git a/tests/test_plans/test_plan_rel_0.0.21.md b/tests/test_plans/test_plan_rel_0.0.21.md new file mode 100644 index 000000000..202fb470b --- /dev/null +++ b/tests/test_plans/test_plan_rel_0.0.21.md @@ -0,0 +1,146 @@ +# KRUIZE TEST PLAN RELEASE 0.0.21 + +- [INTRODUCTION](#introduction) +- [FEATURES TO BE TESTED](#features-to-be-tested) +- [BUG FIXES TO BE TESTED](#bug-fixes-to-be-tested) +- [TEST ENVIRONMENT](#test-environment) +- [TEST DELIVERABLES](#test-deliverables) + - [New Test Cases Developed](#new-test-cases-developed) + - [Regression Testing](#regresion-testing) +- [SCALABILITY TESTING](#scalability-testing) +- [RELEASE TESTING](#release-testing) +- [TEST METRICS](#test-metrics) +- [RISKS AND CONTINGENCIES](#risks-and-contingencies) +- [APPROVALS](#approvals) + +----- + +## INTRODUCTION + +This document describes the test plan for Kruize remote monitoring release 0.0.21 + +---- + +## FEATURES TO BE TESTED + +* Kruize local changes + +Kruize local changes have been 
included in this release which allows a user to add datasources, import datasource metadata, create an experiment and generate recommendations + using the metric results from the specified datasource. Refer [doc](https://github.com/kruize/autotune/pull/1174/files#diff-a23fa581de2556a8ab7cec3efa3b03833fdfa86d42d96209cf691b8f288210f8) for further details. + +* Kruize Security vulnerability issues + + Security Vulnerabilities in the Kruize dependencies have been fixed through the below issues: + + + * [1150](https://github.com/kruize/autotune/pull/1150) + * [1153](https://github.com/kruize/autotune/pull/1153) + + +* Kruize logging using CloudWatch + +Send kruize logs to CloudWatch so that these logs can be viewed using tools like kibana to debug issues + + +------ + +## BUG FIXES TO BE TESTED + +* [1156](https://github.com/kruize/autotune/pull/1156) - Notification is not displayed when the CPU usage is less than a millicore +* [1165](https://github.com/kruize/autotune/pull/1165) - Fix the missing validation for Update recommendation API + +--- + +## TEST ENVIRONMENT + +* Minikube Cluster +* Openshift Cluster + +--- + +## TEST DELIVERABLES + +### New Test Cases Developed + +| # | ISSUE (NEW FEATURE) | TEST DESCRIPTION | TEST DELIVERABLES | RESULTS | COMMENTS | +| --- |--------------------------------------------------------------------------------------------------------------------------------------| ---------------- | ----------------- | ----- | --- | +| 1 | [Kruize local changes](https://github.com/kruize/autotune/issues/) | Test scenarios identified - [1134](https://github.com/kruize/autotune/issues/1134), [1129](https://github.com/kruize/autotune/issues/1129), [1160](https://github.com/kruize/autotune/issues/1160) |Kruize local is PoC, tests will be implemented while productizing | Kruize local workflow tested manually | PASSED on Openshift | Debugging generate recommendations issue on minikube +| 2 | [Kruize CloudWatch 
logging](https://github.com/kruize/autotune/pull/1173) | Kruize logging to CloudWatch is tested manually using CloudWatch on an AWS cluster | Manual test | PASSED | |
+| 3 | [Notifications are not displayed when the CPU usage is less than a millicore or zero](https://github.com/kruize/autotune/pull/1156) | The Kruize functional testsuite will be updated to post results with CPU usage of less than a millicore or zero to validate these notifications | Functional tests included in the same PR | PASSED | |
+
+### Regression Testing
+
+| # | ISSUE (BUG/NEW FEATURE) | TEST CASE | RESULTS | COMMENTS |
+| --- | ----------------------- | --------- | ------- | -------- |
+| 1 | Kruize remote monitoring tests | Functional test suite | PASSED | |
+| 2 | Kruize fault tolerant tests | Functional test suite | PASSED | |
+| 3 | Kruize stress tests | Functional test suite | PASSED | |
+| 4 | Kruize local monitoring demo | kruize demo | Tested manually | Authentication failure on Openshift has been fixed; a recommendations issue on minikube is being debugged |
+| 5 | Short Scalability test | 5k exps / 15 days | PASSED | |
+
+---
+
+## SCALABILITY TESTING
+
+Evaluate Kruize scalability on OCP with 5k experiments by uploading resource usage data for 15 days and updating recommendations.
+The changes in this release do not have scalability implications.
A short scalability test will be run as part of the release testing.
+
+Short scalability run:
+- 5K exps / 15 days of results / 2 containers per exp
+- Kruize replicas - 10
+- OCP - Scalelab cluster
+
+| Kruize Release | Exps / Results / Recos | Execution time | UpdateRecommendations latency (Max / Avg, s) | UpdateResults latency (Max / Avg, s) | LoadResultsByExpName latency (Max / Avg, s) | Postgres DB size (MB) | Kruize Max CPU | Kruize Max Memory (GB) |
+| -- | -- | -- | -- | -- | -- | -- | -- | -- |
+| 0.0.20.3_mvp | 5K / 72L / 3L | 3h 49 mins | 0.62 / 0.39 | 0.24 / 0.17 | 0.34 / 0.25 | 21302.32 | 4.8 | 40.6 |
+| 0.0.20.3_mvp (With Box plots) | 5K / 72L / 3L | 3h 50 mins | 0.61 / 0.39 | 0.25 / 0.18 | 0.34 / 0.25 | 21855.04 | 4.7 | 35.1 |
+| 0.0.21_mvp | 5K / 72L / 3L | 3h 50 mins | 0.62 / 0.39 | 0.25 / 0.17 | 0.34 / 0.25 | 21417.14 | 6.04 | 35.37 |
+| 0.0.21_mvp (With Box plots) | 5K / 72L / 3L | 3h 53 mins | 0.63 / 0.39 | 0.25 / 0.17 | 0.35 / 0.25 | 21868.5 | 4.4 | 40.71 |
+
+----
+
+## RELEASE TESTING
+
+As part of the release testing, the following tests will be executed:
+- [Kruize Remote monitoring Functional tests](/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md)
+- Kruize Local monitoring workflow - Tested manually
+- [Fault tolerant test](/tests/scripts/remote_monitoring_tests/fault_tolerant_tests.md)
+- [Stress test](/tests/scripts/remote_monitoring_tests/README.md)
+- [Scalability test (On openshift)](/tests/scripts/remote_monitoring_tests/scalability_test.md) - scalability test with 5000 exps / 15 days usage data
+- [Kruize remote monitoring demo (On minikube)](https://github.com/kruize/kruize-demos/blob/main/monitoring/remote_monitoring_demo/README.md)
+
+| # | TEST SUITE | EXPECTED RESULTS | ACTUAL RESULTS | COMMENTS |
+| --- | ---------- | ---------------- | -------------- | -------- |
+| 1 | Kruize Remote monitoring Functional testsuite | TOTAL - 356, PASSED - 313 / FAILED - 43 | TOTAL - 356, PASSED - 313 / FAILED - 43 | No new regressions seen; existing issues -
[559](https://github.com/kruize/autotune/issues/559), [610](https://github.com/kruize/autotune/issues/610) |
+| 2 | Kruize Local monitoring workflow | PASSED | PASSED on Openshift; recommendations issue on minikube | PoC code, tested manually |
+| 3 | Fault tolerant test | PASSED | PASSED | |
+| 4 | Stress test | PASSED | FAILED | [Intermittent failure](https://github.com/kruize/autotune/issues/1106) |
+| 5 | Scalability test (short run) | PASSED | PASSED | |
+| 6 | Kruize remote monitoring demo | PASSED | PASSED | |
+
+---
+
+## TEST METRICS
+
+### Test Completion Criteria
+
+* All must_fix defects identified for the release are fixed
+* New features work as expected and tests have been added to validate them
+* No new regressions in the functional tests
+* All non-functional tests work as expected without major issues
+* Documentation updates have been completed
+
+----
+
+## RISKS AND CONTINGENCIES
+
+* None
+
+----
+
+## APPROVALS
+
+Sign-off
+
+----