diff --git a/design/KruizeLocalAPI.md b/design/KruizeLocalAPI.md
new file mode 100644
index 000000000..1d5e22f39
--- /dev/null
+++ b/design/KruizeLocalAPI.md
@@ -0,0 +1,2247 @@
+# Local Monitoring Mode - Proof of Concept
+
+This article describes how to quickly get started with the Kruize Local Monitoring Mode use case REST APIs using the `curl` command.
+Documentation is still in progress; stay tuned.
+
+# Table of Contents
+
+1. [Resource Analysis Terms and Defaults](#resource-analysis-terms-and-defaults)
+
+- [Terms, Duration & Threshold Table](#terms-duration--threshold-table)
+
+2. [APIs](#apis)
+- [List Datasources API](#list-datasources-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+- [Import Metadata API](#import-metadata-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+- [List Metadata API](#list-metadata-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+- [Delete Metadata API](#delete-metadata-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+- [Create Experiment API](#create-experiment-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+- [List Experiments API](#list-experiments-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+- [Generate Recommendations API](#generate-recommendations-api)
+ - Introduction
+ - Example Request and Response
+ - Invalid Scenarios
+
+
+
+## Resource Analysis Terms and Defaults
+
+When analyzing resource utilization in Kubernetes, it's essential to define terms that specify the duration of past data
+considered for recommendations and the threshold for obtaining additional data. These terms help in categorizing and
+fine-tuning resource allocation.
+
+Below are the default terms used in resource analysis, along with their respective durations and thresholds:
+
+
+
+### Terms, Duration & Threshold Table
+
+| Term | Minimum Data Threshold | Duration |
+|--------|------------------------|----------|
+| Short  | 30 mins                | 1 day    |
+| Medium | 2 days                 | 7 days   |
+| Long   | 8 days                 | 15 days  |
+
+**Minimum Data Threshold**: The "minimum data threshold" represents the minimum amount of data needed for generating a
+recommendation associated with a given duration term.
+
+**Duration**: The "duration" in the term analysis refers to the amount of historical data taken into account when
+assessing resource utilization.
+
+Read more about the Term Threshold scenarios [here](TermThresholdDesign.md).
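+
+The term table above can be encoded as a small lookup, for example (a hypothetical sketch, not Kruize's actual
+implementation; all names are made up):
+
+```python
+# Hypothetical sketch: which terms have enough data for a recommendation,
+# using the thresholds and durations from the table above.
+TERMS = {
+    # term: (minimum data threshold in hours, duration window in days)
+    "short":  (0.5, 1),
+    "medium": (48.0, 7),
+    "long":   (192.0, 15),
+}
+
+def eligible_terms(available_data_hours):
+    """Return the terms for which a recommendation can be generated."""
+    return [term for term, (threshold_hours, _) in TERMS.items()
+            if available_data_hours >= threshold_hours]
+```
+
+For instance, with 30 hours of data only the short term qualifies, while with 200 hours (over 8 days) all three
+terms do.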
+
+### Profile Algorithms (How Kruize calculates the recommendations)
+
+**Profile:**
+
+This column represents different profiles or criteria that the recommendation algorithm takes into account when making
+recommendations.
+
+**CPU (Percentile):**
+
+It indicates the percentile value for the timeseries CPU usage data that the algorithm considers for each profile.
+
+**Memory (Percentile):**
+
+Similarly, this column denotes the percentile value for the timeseries memory usage data that is used by the algorithm
+for each profile.
+
+#### Profiles
+
+**Cost Profile:**
+For the "Cost" profile, Kruize's recommendation algorithm will consider the 60th percentile for CPU usage and the 100th
+percentile for memory usage when making recommendations. This means that cost-related recommendations will be based on
+CPU usage that falls at or above the 60th percentile and memory usage at the 100th percentile.
+
+**Performance Profile:**
+In the "Performance" profile, the algorithm takes into account the 98th percentile for CPU usage and the 100th
+percentile for memory usage. Consequently, recommendations related to performance will be generated when CPU usage is at
+or above the 98th percentile, and memory usage is at the 100th percentile.
+
+| Profile     | CPU (Percentile) | Memory (Percentile) |
+|-------------|------------------|---------------------|
+| Cost        | 60th             | 100th               |
+| Performance | 98th             | 100th               |
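+
+As a rough sketch of how such percentile-based profiles might be applied to usage data (illustrative only; the
+nearest-rank `percentile` helper below is an assumption, not Kruize's actual algorithm):
+
+```python
+# Illustrative only: summarize a usage timeseries with the percentile
+# configured for each profile in the table above.
+def percentile(samples, p):
+    """Nearest-rank percentile of a non-empty list of samples."""
+    ordered = sorted(samples)
+    rank = max(1, round(p / 100 * len(ordered)))
+    return ordered[rank - 1]
+
+PROFILES = {
+    "cost":        {"cpu": 60, "memory": 100},
+    "performance": {"cpu": 98, "memory": 100},
+}
+
+def profile_usage(profile, cpu_samples, memory_samples):
+    """CPU/memory usage values a profile would base its recommendation on."""
+    pcts = PROFILES[profile]
+    return {"cpu": percentile(cpu_samples, pcts["cpu"]),
+            "memory": percentile(memory_samples, pcts["memory"])}
+```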
+
+
+
+## APIs
+
+
+
+### List Datasources API
+
+This is a quick guide to listing the available datasources.
+
+**Request without Parameter**
+
+`GET /datasources`
+
+`curl -H 'Accept: application/json' http://<URL>:<PORT>/datasources`
+
+If no parameter is passed, the API returns all the available datasources.
+
+**Response**
+
+
+
+### Example Response
+
+```json
+{
+ "version": "v1.0",
+ "datasources": [
+ {
+ "name": "prometheus-1",
+ "provider": "prometheus",
+ "serviceName": "prometheus-k8s",
+ "namespace": "monitoring",
+ "url": "http://prometheus-k8s.monitoring.svc.cluster.local:9090"
+ }
+ ]
+}
+```
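+
+A response of this shape can be consumed with a few lines of Python (a hypothetical helper, not an official Kruize
+client):
+
+```python
+import json
+
+def datasource_names(response_text):
+    """Extract the datasource names from a /datasources response body."""
+    body = json.loads(response_text)
+    return [ds["name"] for ds in body.get("datasources", [])]
+```
+
+Applied to the example above, this returns `["prometheus-1"]`.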
+
+
+
+**Request with datasource name parameter**
+
+`GET /datasources`
+
+`curl -H 'Accept: application/json' http://<URL>:<PORT>/datasources?name=<datasource_name>`
+
+Returns the details of the specified datasource.
+
+**Response for datasource name - `prometheus-1`**
+
+
+
+### Example Response
+
+```json
+{
+ "version": "v1.0",
+ "datasources": [
+ {
+ "name": "prometheus-1",
+ "provider": "prometheus",
+ "serviceName": "prometheus-k8s",
+ "namespace": "monitoring",
+ "url": "http://prometheus-k8s.monitoring.svc.cluster.local:9090"
+ }
+ ]
+}
+```
+
+
+
+
+
+### Import Metadata API
+
+This is a quick guide to importing metadata from a datasource using the input JSON below.
+
+**Request**
+`POST /dsmetadata`
+
+`curl -H 'Accept: application/json' -X POST --data 'copy paste below JSON' http://<URL>:<PORT>/dsmetadata`
+
+
+
+
+### Example Request
+
+```json
+{
+ "version": "v1.0",
+ "datasource_name": "prometheus-1"
+}
+```
+
+
+
+
+**Response**
+
+
+
+### Example Response
+
+```json
+{
+ "datasources": {
+ "prometheus-1": {
+ "datasource_name": "prometheus-1",
+ "clusters": {
+ "default": {
+ "cluster_name": "default"
+ }
+ }
+ }
+ }
+}
+```
+
+
+
+
+
+### List Metadata API
+
+This is a quick guide to retrieving metadata for a specific datasource.
+
+**Request Parameters**
+
+| Parameter | Type | Required | Description |
+|--------------|--------|----------|-------------------------------------------|
+| datasource | string | Yes | The name of the datasource. |
+| cluster_name | string | Optional | The name of the cluster.                   |
+| namespace    | string | Optional | The namespace.                             |
+| verbose      | string | Optional | Flag to retrieve container-level metadata. |
+
+In the context of the `GET /dsmetadata` REST API, the `verbose` parameter controls the granularity of the metadata
+included in the API response. When `verbose` is set to `true`, the response includes granular container-level details,
+offering a comprehensive view of the clusters, namespaces, workloads and containers associated with the specified
+datasource. When `verbose` is not provided or is set to `false`, the response provides basic information, such as the
+list of clusters and namespaces associated with the specified datasource.
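+
+A query URL for these parameters can be assembled, for instance, like this (hypothetical helper; `base_url` stands for
+whatever host and port Kruize is reachable at):
+
+```python
+from urllib.parse import urlencode
+
+def dsmetadata_url(base_url, datasource, cluster_name=None,
+                   namespace=None, verbose=None):
+    """Build a GET /dsmetadata URL from the parameters in the table above."""
+    params = {"datasource": datasource}
+    if cluster_name is not None:
+        params["cluster_name"] = cluster_name
+    if namespace is not None:
+        params["namespace"] = namespace
+    if verbose is not None:
+        params["verbose"] = "true" if verbose else "false"
+    return f"{base_url}/dsmetadata?{urlencode(params)}"
+```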
+
+**Request with datasource name parameter**
+
+`GET /dsmetadata`
+
+`curl -H 'Accept: application/json' http://<URL>:<PORT>/dsmetadata?datasource=<datasource_name>`
+
+Returns the list of cluster details of the specified datasource
+
+**Response for datasource name - `prometheus-1`**
+
+***Note:***
+- Currently, only `default` cluster is supported for POC.
+- When the `verbose` parameter is not provided, it is set to `false` by default - the response provides basic
+information about the clusters of the specified datasource.
+
+
+
+### Example Response
+
+```json
+{
+ "datasources": {
+ "prometheus-1": {
+ "datasource_name": "prometheus-1",
+ "clusters": {
+ "default": {
+ "cluster_name": "default"
+ }
+ }
+ }
+ }
+}
+```
+
+
+
+
+**Request with verbose set to true and with datasource name parameter**
+
+`GET /dsmetadata`
+
+`curl -H 'Accept: application/json' "http://<URL>:<PORT>/dsmetadata?datasource=<datasource_name>&verbose=true"`
+
+Returns the metadata of all the containers present in the specified datasource
+
+***Note: When `verbose` is not passed in the query URL, it is set to `false` by default.***
+
+**Response for datasource name - `prometheus-1` and verbose - `true`**
+
+With the `verbose` parameter set to `true`, the response includes detailed metadata about all namespaces, workloads and
+containers, in addition to cluster information, for the specified datasource.
+
+
+
+### Example Response
+
+```json
+{
+ "datasources": {
+ "prometheus-1": {
+ "datasource_name": "prometheus-1",
+ "clusters": {
+ "default": {
+ "cluster_name": "default",
+ "namespaces": {
+ "default": {
+ "namespace": "default"
+ },
+ "cadvisor": {
+ "namespace": "cadvisor",
+ "workloads": {
+ "cadvisor": {
+ "workload_name": "cadvisor",
+ "workload_type": "daemonset",
+ "containers": {
+ "cadvisor": {
+ "container_name": "cadvisor",
+ "container_image_name": "gcr.io/cadvisor/cadvisor:v0.45.0"
+ }
+ }
+ }
+ }
+ },
+ "kube-node-lease": {
+ "namespace": "kube-node-lease"
+ },
+ "kube-system": {
+ "namespace": "kube-system",
+ "workloads": {
+ "coredns": {
+ "workload_name": "coredns",
+ "workload_type": "deployment",
+ "containers": {
+ "coredns": {
+ "container_name": "coredns",
+ "container_image_name": "k8s.gcr.io/coredns/coredns:v1.8.6"
+ }
+ }
+ },
+ "kube-proxy": {
+ "workload_name": "kube-proxy",
+ "workload_type": "daemonset",
+ "containers": {
+ "kube-proxy": {
+ "container_name": "kube-proxy",
+ "container_image_name": "k8s.gcr.io/kube-proxy:v1.24.3"
+ }
+ }
+ }
+ }
+ },
+ "monitoring": {
+ "namespace": "monitoring",
+ "workloads": {
+ "kube-state-metrics": {
+ "workload_name": "kube-state-metrics",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-state-metrics": {
+ "container_name": "kube-state-metrics",
+ "container_image_name": "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0"
+ },
+ "kube-rbac-proxy-self": {
+ "container_name": "kube-rbac-proxy-self",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "kube-rbac-proxy-main": {
+ "container_name": "kube-rbac-proxy-main",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ }
+ }
+ },
+ "node-exporter": {
+ "workload_name": "node-exporter",
+ "workload_type": "daemonset",
+ "containers": {
+ "node-exporter": {
+ "container_name": "node-exporter",
+ "container_image_name": "quay.io/prometheus/node-exporter:v1.1.2"
+ },
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ }
+ }
+ },
+ "postgres-deployment": {
+ "workload_name": "postgres-deployment",
+ "workload_type": "deployment",
+ "containers": {
+ "postgres": {
+ "container_name": "postgres",
+ "container_image_name": "quay.io/kruizehub/postgres:15.2"
+ }
+ }
+ },
+ "alertmanager-main": {
+ "workload_name": "alertmanager-main",
+ "workload_type": "statefulset",
+ "containers": {
+ "config-reloader": {
+ "container_name": "config-reloader",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0"
+ },
+ "alertmanager": {
+ "container_name": "alertmanager",
+ "container_image_name": "quay.io/prometheus/alertmanager:v0.21.0"
+ }
+ }
+ },
+ "prometheus-adapter": {
+ "workload_name": "prometheus-adapter",
+ "workload_type": "deployment",
+ "containers": {
+ "prometheus-adapter": {
+ "container_name": "prometheus-adapter",
+ "container_image_name": "directxman12/k8s-prometheus-adapter:v0.8.4"
+ }
+ }
+ },
+ "kruize": {
+ "workload_name": "kruize",
+ "workload_type": "deployment",
+ "containers": {
+ "kruize": {
+ "container_name": "kruize",
+ "container_image_name": "quay.io/kruize/autotune_operator:0.0.21_mvp"
+ }
+ }
+ },
+ "grafana": {
+ "workload_name": "grafana",
+ "workload_type": "deployment",
+ "containers": {
+ "grafana": {
+ "container_name": "grafana",
+ "container_image_name": "grafana/grafana:7.5.4"
+ }
+ }
+ },
+ "prometheus-k8s": {
+ "workload_name": "prometheus-k8s",
+ "workload_type": "statefulset",
+ "containers": {
+ "config-reloader": {
+ "container_name": "config-reloader",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0"
+ },
+ "prometheus": {
+ "container_name": "prometheus",
+ "container_image_name": "quay.io/prometheus/prometheus:v2.26.0"
+ }
+ }
+ },
+ "blackbox-exporter": {
+ "workload_name": "blackbox-exporter",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "module-configmap-reloader": {
+ "container_name": "module-configmap-reloader",
+ "container_image_name": "jimmidyson/configmap-reload:v0.5.0"
+ },
+ "blackbox-exporter": {
+ "container_name": "blackbox-exporter",
+ "container_image_name": "quay.io/prometheus/blackbox-exporter:v0.18.0"
+ }
+ }
+ },
+ "prometheus-operator": {
+ "workload_name": "prometheus-operator",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "prometheus-operator": {
+ "container_name": "prometheus-operator",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-operator:v0.47.0"
+ }
+ }
+ }
+ }
+ },
+ "kube-public": {
+ "namespace": "kube-public"
+ }
+ }
+ }
+ }
+ }
+ }
+}
+```
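+
+A verbose response of this shape can be flattened into rows for easier inspection (sketch only; the nesting is taken
+from the example above):
+
+```python
+def flatten_containers(metadata):
+    """Flatten a verbose /dsmetadata response into
+    (cluster, namespace, workload, container) tuples."""
+    rows = []
+    for ds in metadata.get("datasources", {}).values():
+        for cluster in ds.get("clusters", {}).values():
+            for ns in cluster.get("namespaces", {}).values():
+                for wl in ns.get("workloads", {}).values():
+                    for c in wl.get("containers", {}).values():
+                        rows.append((cluster["cluster_name"],
+                                     ns["namespace"],
+                                     wl["workload_name"],
+                                     c["container_name"]))
+    return rows
+```
+
+Namespaces without workloads (such as `kube-node-lease` above) simply contribute no rows.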
+
+
+
+
+
+**Request with datasource name and cluster name parameter**
+
+`GET /dsmetadata`
+
+`curl -H 'Accept: application/json' "http://<URL>:<PORT>/dsmetadata?datasource=<datasource_name>&cluster_name=<cluster_name>"`
+
+Returns the list of namespaces present in the specified cluster name and datasource
+
+**Response for datasource name - `prometheus-1` and cluster name - `default`**
+
+With the `verbose` parameter set to `false`, the response includes the list of namespaces present in the specified
+cluster and datasource.
+
+
+
+### Example Response
+
+```json
+{
+ "datasources": {
+ "prometheus-1": {
+ "datasource_name": "prometheus-1",
+ "clusters": {
+ "default": {
+ "cluster_name": "default",
+ "namespaces": {
+ "default": {
+ "namespace": "default"
+ },
+ "cadvisor": {
+ "namespace": "cadvisor"
+ },
+ "kube-node-lease": {
+ "namespace": "kube-node-lease"
+ },
+ "kube-system": {
+ "namespace": "kube-system"
+ },
+ "monitoring": {
+ "namespace": "monitoring"
+ },
+ "kube-public": {
+ "namespace": "kube-public"
+ }
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+
+
+
+
+**Request with datasource name, cluster name and verbose parameters**
+
+`GET /dsmetadata`
+
+`curl -H 'Accept: application/json' "http://<URL>:<PORT>/dsmetadata?datasource=<datasource_name>&cluster_name=<cluster_name>&verbose=true"`
+
+Returns the container-level metadata of all the namespaces present in the specified cluster name and datasource
+
+**Response for datasource name - `prometheus-1`, cluster name - `default` and verbose - `true`**
+
+With the `verbose` parameter set to `true`, the response includes detailed metadata about workloads and containers, in
+addition to namespace information, for the specified cluster and datasource.
+
+
+
+### Example Response
+
+```json
+{
+ "datasources": {
+ "prometheus-1": {
+ "datasource_name": "prometheus-1",
+ "clusters": {
+ "default": {
+ "cluster_name": "default",
+ "namespaces": {
+ "default": {
+ "namespace": "default"
+ },
+ "cadvisor": {
+ "namespace": "cadvisor",
+ "workloads": {
+ "cadvisor": {
+ "workload_name": "cadvisor",
+ "workload_type": "daemonset",
+ "containers": {
+ "cadvisor": {
+ "container_name": "cadvisor",
+ "container_image_name": "gcr.io/cadvisor/cadvisor:v0.45.0"
+ }
+ }
+ }
+ }
+ },
+ "kube-node-lease": {
+ "namespace": "kube-node-lease"
+ },
+ "kube-system": {
+ "namespace": "kube-system",
+ "workloads": {
+ "coredns": {
+ "workload_name": "coredns",
+ "workload_type": "deployment",
+ "containers": {
+ "coredns": {
+ "container_name": "coredns",
+ "container_image_name": "k8s.gcr.io/coredns/coredns:v1.8.6"
+ }
+ }
+ },
+ "kube-proxy": {
+ "workload_name": "kube-proxy",
+ "workload_type": "daemonset",
+ "containers": {
+ "kube-proxy": {
+ "container_name": "kube-proxy",
+ "container_image_name": "k8s.gcr.io/kube-proxy:v1.24.3"
+ }
+ }
+ }
+ }
+ },
+ "monitoring": {
+ "namespace": "monitoring",
+ "workloads": {
+ "kube-state-metrics": {
+ "workload_name": "kube-state-metrics",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-state-metrics": {
+ "container_name": "kube-state-metrics",
+ "container_image_name": "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0"
+ },
+ "kube-rbac-proxy-self": {
+ "container_name": "kube-rbac-proxy-self",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "kube-rbac-proxy-main": {
+ "container_name": "kube-rbac-proxy-main",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ }
+ }
+ },
+ "node-exporter": {
+ "workload_name": "node-exporter",
+ "workload_type": "daemonset",
+ "containers": {
+ "node-exporter": {
+ "container_name": "node-exporter",
+ "container_image_name": "quay.io/prometheus/node-exporter:v1.1.2"
+ },
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ }
+ }
+ },
+ "postgres-deployment": {
+ "workload_name": "postgres-deployment",
+ "workload_type": "deployment",
+ "containers": {
+ "postgres": {
+ "container_name": "postgres",
+ "container_image_name": "quay.io/kruizehub/postgres:15.2"
+ }
+ }
+ },
+ "alertmanager-main": {
+ "workload_name": "alertmanager-main",
+ "workload_type": "statefulset",
+ "containers": {
+ "config-reloader": {
+ "container_name": "config-reloader",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0"
+ },
+ "alertmanager": {
+ "container_name": "alertmanager",
+ "container_image_name": "quay.io/prometheus/alertmanager:v0.21.0"
+ }
+ }
+ },
+ "prometheus-adapter": {
+ "workload_name": "prometheus-adapter",
+ "workload_type": "deployment",
+ "containers": {
+ "prometheus-adapter": {
+ "container_name": "prometheus-adapter",
+ "container_image_name": "directxman12/k8s-prometheus-adapter:v0.8.4"
+ }
+ }
+ },
+ "kruize": {
+ "workload_name": "kruize",
+ "workload_type": "deployment",
+ "containers": {
+ "kruize": {
+ "container_name": "kruize",
+ "container_image_name": "quay.io/kruize/autotune_operator:0.0.21_mvp"
+ }
+ }
+ },
+ "grafana": {
+ "workload_name": "grafana",
+ "workload_type": "deployment",
+ "containers": {
+ "grafana": {
+ "container_name": "grafana",
+ "container_image_name": "grafana/grafana:7.5.4"
+ }
+ }
+ },
+ "prometheus-k8s": {
+ "workload_name": "prometheus-k8s",
+ "workload_type": "statefulset",
+ "containers": {
+ "config-reloader": {
+ "container_name": "config-reloader",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0"
+ },
+ "prometheus": {
+ "container_name": "prometheus",
+ "container_image_name": "quay.io/prometheus/prometheus:v2.26.0"
+ }
+ }
+ },
+ "blackbox-exporter": {
+ "workload_name": "blackbox-exporter",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "module-configmap-reloader": {
+ "container_name": "module-configmap-reloader",
+ "container_image_name": "jimmidyson/configmap-reload:v0.5.0"
+ },
+ "blackbox-exporter": {
+ "container_name": "blackbox-exporter",
+ "container_image_name": "quay.io/prometheus/blackbox-exporter:v0.18.0"
+ }
+ }
+ },
+ "prometheus-operator": {
+ "workload_name": "prometheus-operator",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "prometheus-operator": {
+ "container_name": "prometheus-operator",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-operator:v0.47.0"
+ }
+ }
+ }
+ }
+ },
+ "kube-public": {
+ "namespace": "kube-public"
+ }
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+
+
+
+**Request with datasource name, cluster name and namespace parameters**
+
+`GET /dsmetadata`
+
+`curl -H 'Accept: application/json' "http://<URL>:<PORT>/dsmetadata?datasource=<datasource_name>&cluster_name=<cluster_name>&namespace=<namespace>"`
+
+Returns the container-level metadata of the specified namespace, cluster name and datasource
+
+***Note: For this request, `verbose` is set to `true` by default so that container-level metadata is fetched.***
+
+**Response for datasource name - `prometheus-1`, cluster name - `default` and namespace - `monitoring`**
+
+The response includes granular metadata about the workloads and associated containers within the specified namespace,
+cluster and datasource.
+
+
+
+### Example Response
+
+```json
+{
+ "datasources": {
+ "prometheus-1": {
+ "datasource_name": "prometheus-1",
+ "clusters": {
+ "default": {
+ "cluster_name": "default",
+ "namespaces": {
+ "monitoring": {
+ "namespace": "monitoring",
+ "workloads": {
+ "kube-state-metrics": {
+ "workload_name": "kube-state-metrics",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-state-metrics": {
+ "container_name": "kube-state-metrics",
+ "container_image_name": "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0"
+ },
+ "kube-rbac-proxy-self": {
+ "container_name": "kube-rbac-proxy-self",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "kube-rbac-proxy-main": {
+ "container_name": "kube-rbac-proxy-main",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ }
+ }
+ },
+ "node-exporter": {
+ "workload_name": "node-exporter",
+ "workload_type": "daemonset",
+ "containers": {
+ "node-exporter": {
+ "container_name": "node-exporter",
+ "container_image_name": "quay.io/prometheus/node-exporter:v1.1.2"
+ },
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ }
+ }
+ },
+ "postgres-deployment": {
+ "workload_name": "postgres-deployment",
+ "workload_type": "deployment",
+ "containers": {
+ "postgres": {
+ "container_name": "postgres",
+ "container_image_name": "quay.io/kruizehub/postgres:15.2"
+ }
+ }
+ },
+ "alertmanager-main": {
+ "workload_name": "alertmanager-main",
+ "workload_type": "statefulset",
+ "containers": {
+ "config-reloader": {
+ "container_name": "config-reloader",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0"
+ },
+ "alertmanager": {
+ "container_name": "alertmanager",
+ "container_image_name": "quay.io/prometheus/alertmanager:v0.21.0"
+ }
+ }
+ },
+ "prometheus-adapter": {
+ "workload_name": "prometheus-adapter",
+ "workload_type": "deployment",
+ "containers": {
+ "prometheus-adapter": {
+ "container_name": "prometheus-adapter",
+ "container_image_name": "directxman12/k8s-prometheus-adapter:v0.8.4"
+ }
+ }
+ },
+ "kruize": {
+ "workload_name": "kruize",
+ "workload_type": "deployment",
+ "containers": {
+ "kruize": {
+ "container_name": "kruize",
+ "container_image_name": "quay.io/kruize/autotune_operator:0.0.21_mvp"
+ }
+ }
+ },
+ "grafana": {
+ "workload_name": "grafana",
+ "workload_type": "deployment",
+ "containers": {
+ "grafana": {
+ "container_name": "grafana",
+ "container_image_name": "grafana/grafana:7.5.4"
+ }
+ }
+ },
+ "prometheus-k8s": {
+ "workload_name": "prometheus-k8s",
+ "workload_type": "statefulset",
+ "containers": {
+ "config-reloader": {
+ "container_name": "config-reloader",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0"
+ },
+ "prometheus": {
+ "container_name": "prometheus",
+ "container_image_name": "quay.io/prometheus/prometheus:v2.26.0"
+ }
+ }
+ },
+ "blackbox-exporter": {
+ "workload_name": "blackbox-exporter",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "module-configmap-reloader": {
+ "container_name": "module-configmap-reloader",
+ "container_image_name": "jimmidyson/configmap-reload:v0.5.0"
+ },
+ "blackbox-exporter": {
+ "container_name": "blackbox-exporter",
+ "container_image_name": "quay.io/prometheus/blackbox-exporter:v0.18.0"
+ }
+ }
+ },
+ "prometheus-operator": {
+ "workload_name": "prometheus-operator",
+ "workload_type": "deployment",
+ "containers": {
+ "kube-rbac-proxy": {
+ "container_name": "kube-rbac-proxy",
+ "container_image_name": "quay.io/brancz/kube-rbac-proxy:v0.8.0"
+ },
+ "prometheus-operator": {
+ "container_name": "prometheus-operator",
+ "container_image_name": "quay.io/prometheus-operator/prometheus-operator:v0.47.0"
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+
+
+
+
+
+
+### Delete Metadata API
+
+This is a quick guide to deleting the metadata of a datasource using the input JSON below.
+
+**Request**
+`DELETE /dsmetadata`
+
+`curl -H 'Accept: application/json' -X DELETE --data 'copy paste below JSON' http://<URL>:<PORT>/dsmetadata`
+
+
+
+
+### Example Request
+
+```json
+{
+ "version": "v1.0",
+ "datasource_name": "prometheus-1"
+}
+```
+
+
+
+
+**Response**
+
+
+
+### Example Response
+
+```json
+{
+ "message": "Datasource metadata deleted successfully. View imported metadata at GET /dsmetadata",
+ "httpcode": 201,
+ "documentationLink": "",
+ "status": "SUCCESS"
+}
+```
+
+
+
+
+
+
+
+### Create Experiment API
+
+This is a quick guide to creating experiments using the input JSON below. For a more detailed guide,
+see [Create Experiment](/design/CreateExperiment.md).
+
+**Request**
+`POST /createExperiment`
+
+`curl -H 'Accept: application/json' -X POST --data 'copy paste below JSON' http://<URL>:<PORT>/createExperiment`
+
+
+
+### Example Request for datasource - `prometheus-1`
+
+```json
+[
+ {
+ "version": "v2.0",
+ "experiment_name": "default|default|deployment|tfb-qrh-deployment",
+ "cluster_name": "default",
+ "performance_profile": "resource-optimization-openshift",
+ "mode": "monitor",
+ "target_cluster": "local",
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment",
+ "namespace": "default",
+ "containers": [
+ {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-0"
+ },
+ {
+ "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17",
+ "container_name": "tfb-server-1"
+ }
+ ]
+ }
+ ],
+ "trial_settings": {
+ "measurement_duration": "15min"
+ },
+ "recommendation_settings": {
+ "threshold": "0.1"
+ },
+ "datasource": "prometheus-1"
+ }
+]
+```
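+
+The `experiment_name` in the example appears to follow a `<cluster>|<namespace>|<type>|<name>` convention. Under that
+assumption, a request payload could be built programmatically (hypothetical helper, not part of Kruize):
+
+```python
+def build_experiment(cluster, k8s_type, name, namespace, containers,
+                     datasource="prometheus-1"):
+    """Assemble a /createExperiment payload shaped like the example above.
+    The experiment_name convention used here is an assumption, not a
+    documented Kruize rule."""
+    return [{
+        "version": "v2.0",
+        "experiment_name": f"{cluster}|{namespace}|{k8s_type}|{name}",
+        "cluster_name": cluster,
+        "performance_profile": "resource-optimization-openshift",
+        "mode": "monitor",
+        "target_cluster": "local",
+        "kubernetes_objects": [{
+            "type": k8s_type,
+            "name": name,
+            "namespace": namespace,
+            "containers": containers,
+        }],
+        "trial_settings": {"measurement_duration": "15min"},
+        "recommendation_settings": {"threshold": "0.1"},
+        "datasource": datasource,
+    }]
+```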
+
+
+
+
+**Response**
+
+
+
+### Example Response
+
+```json
+{
+ "message": "Experiment registered successfully with Autotune. View registered experiments at /listExperiments",
+ "httpcode": 201,
+ "documentationLink": "",
+ "status": "SUCCESS"
+}
+```
+
+
+
+
+
+### List Experiments API
+
+**Request with experiment name parameter**
+
+`GET /listExperiments`
+
+`curl -H 'Accept: application/json' http://<URL>:<PORT>/listExperiments?experiment_name=<experiment_name>`
+
+Returns the experiment details of the specified experiment
+
+
+**Request with recommendations set to true**
+
+`GET /listExperiments`
+
+`curl -H 'Accept: application/json' http://<URL>:<PORT>/listExperiments?recommendations=true`
+
+Returns the latest recommendations of all the experiments
+
+**Response for experiment name - `default|default_0|deployment|tfb-qrh-deployment_0`**
+
+
+
+### Example Response
+
+```json
+[
+ {
+ "version": "v2.0",
+ "experiment_id": "f0007796e65c999d843bebd447c2fbaa6aaf9127c614da55e333cd6bdb628a74",
+ "experiment_name": "default|default_0|deployment|tfb-qrh-deployment_0",
+ "cluster_name": "default",
+ "datasource": "prometheus-1",
+ "mode": "monitor",
+ "target_cluster": "local",
+ "status": "IN_PROGRESS",
+ "performance_profile": "resource-optimization-openshift",
+ "trial_settings": {
+ "measurement_duration": "15min"
+ },
+ "recommendation_settings": {
+ "threshold": "0.1"
+ },
+ "experiment_usecase_type": {
+ "remote_monitoring": false,
+ "local_monitoring": true,
+ "local_experiment": false
+ },
+ "validation_data": {
+ "success": true,
+ "message": "Registered successfully with Kruize! View registered experiments at /listExperiments",
+ "errorCode": 201
+ },
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment_0",
+ "namespace": "default_0",
+ "containers": {
+ "tfb-server-1": {
+ "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17",
+ "container_name": "tfb-server-1",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "112101": {
+ "type": "info",
+ "message": "Cost Recommendations Available",
+ "code": 112101
+ }
+ },
+ "data": {
+ "2023-04-02T08:00:00.680Z": {
+ "cost": {
+ "short_term": {
+ "monitoring_start_time": "2023-04-01T06:45:00.000Z",
+ "monitoring_end_time": "2023-04-02T08:00:00.680Z",
+ "duration_in_hours": 24.0,
+ "pods_count": 27,
+ "confidence_level": 0.0,
+ "current": {
+ "requests": {
+ "memory": {
+ "amount": 490.93,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 1.46,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 712.21,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 1.54,
+ "format": "cores"
+ }
+ }
+ },
+ "config": {
+ "requests": {
+ "memory": {
+ "amount": 1197.9840000000002,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 7.68,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 1197.9840000000002,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 7.68,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "requests": {
+ "memory": {
+ "amount": 707.0540000000001,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 6.22,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 485.7740000000001,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 6.14,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "medium_term": {
+ "pods_count": 0,
+ "confidence_level": 0.0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ },
+ "long_term": {
+ "pods_count": 0,
+ "confidence_level": 0.0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ },
+ "tfb-server-0": {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-0",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ },
+ "data": {}
+ }
+ }
+ }
+ }
+ ]
+ },
+ ...
+ ...
+ ...
+ {
+ "version": "v2.0",
+ "experiment_id": "ab0a31a522cebdde52561482300d078ed1448fa7b75834fa216677d1d9d5cda6",
+ "experiment_name": "default|default_1|deployment|tfb-qrh-deployment_1",
+ "cluster_name": "default",
+ "datasource": "prometheus-1",
+ "mode": "monitor",
+ "target_cluster": "local",
+ "status": "IN_PROGRESS",
+ "performance_profile": "resource-optimization-openshift",
+ "trial_settings": {
+ "measurement_duration": "15min"
+ },
+ "recommendation_settings": {
+ "threshold": "0.1"
+ },
+ "experiment_usecase_type": {
+ "remote_monitoring": false,
+ "local_monitoring": true,
+ "local_experiment": false
+ },
+ "validation_data": {
+ "success": true,
+ "message": "Registered successfully with Kruize! View registered experiments at /listExperiments",
+ "errorCode": 201
+ },
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment_1",
+ "namespace": "default_1",
+ "containers": {
+ "tfb-server-1": {
+ "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17",
+ "container_name": "tfb-server-1",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ },
+ "data": {}
+ }
+ },
+ "tfb-server-0": {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-0",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ },
+ "data": {}
+ }
+ }
+ }
+ }
+ ]
+ }
+]
+```
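+
+The most recent short-term `config` recommendation for a container can be pulled out of such a response as follows
+(sketch only; the field names are taken from the example payload):
+
+```python
+def latest_short_term_config(container):
+    """Return the newest short-term 'config' recommendation, or None when
+    no data is available (e.g. only 'not enough data' notifications)."""
+    data = container.get("recommendations", {}).get("data", {})
+    if not data:
+        return None
+    latest_ts = max(data)  # ISO-8601 timestamps sort chronologically
+    return data[latest_ts]["cost"]["short_term"].get("config")
+```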
+
+**Request with recommendations set to true and with experiment name parameter**
+
+`GET /listExperiments`
+
+`curl -H 'Accept: application/json' 'http://<URL>:<PORT>/listExperiments?recommendations=true&experiment_name=<experiment_name>'`
+
+Returns the latest recommendations of the specified experiment, without the results data
+
+
+**Request with recommendations set to true and latest set to false**
+
+`GET /listExperiments`
+
+`curl -H 'Accept: application/json' 'http://<URL>:<PORT>/listExperiments?recommendations=true&latest=false'`
+
+Returns all the recommendations of all the experiments
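
The query parameters above are joined with `&`, which the shell treats as a background operator unless the URL is quoted. As a minimal sketch (assuming a placeholder `KRUIZE_URL` endpoint and an experiment name taken from the examples in this document), the URL can be composed safely like this:

```shell
# Sketch: build the query string in a variable and quote the final URL so
# '&' is not interpreted by the shell. KRUIZE_URL is a placeholder; the
# pipe characters in the experiment name may need percent-encoding (%7C)
# depending on the HTTP client.
KRUIZE_URL="http://127.0.0.1:8080"   # placeholder Kruize endpoint
EXPERIMENT="default|default_0|deployment|tfb-qrh-deployment_0"
QUERY="recommendations=true&latest=false&experiment_name=${EXPERIMENT}"
echo "${KRUIZE_URL}/listExperiments?${QUERY}"

# curl -H 'Accept: application/json' "${KRUIZE_URL}/listExperiments?${QUERY}"
```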
+
+**Response for experiment name - `default|default_0|deployment|tfb-qrh-deployment_0`**
+
+### Example Response
+
+```json
+[
+ {
+ "version": "v2.0",
+ "experiment_id": "f0007796e65c999d843bebd447c2fbaa6aaf9127c614da55e333cd6bdb628a74",
+ "experiment_name": "default|default_0|deployment|tfb-qrh-deployment_0",
+ "cluster_name": "default",
+ "datasource": "prometheus-1",
+ "mode": "monitor",
+ "target_cluster": "local",
+ "status": "IN_PROGRESS",
+ "performance_profile": "resource-optimization-openshift",
+ "trial_settings": {
+ "measurement_duration": "15min"
+ },
+ "recommendation_settings": {
+ "threshold": "0.1"
+ },
+ "experiment_usecase_type": {
+ "remote_monitoring": false,
+ "local_monitoring": true,
+ "local_experiment": false
+ },
+ "validation_data": {
+ "success": true,
+ "message": "Registered successfully with Kruize! View registered experiments at /listExperiments",
+ "errorCode": 201
+ },
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment_0",
+ "namespace": "default_0",
+ "containers": {
+ "tfb-server-1": {
+ "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17",
+ "container_name": "tfb-server-1",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "112101": {
+ "type": "info",
+ "message": "Cost Recommendations Available",
+ "code": 112101
+ }
+ },
+ "data": {
+ "2023-04-02T06:00:00.770Z": {
+ "cost": {
+ "short_term": {
+ "monitoring_start_time": "2023-04-01T04:45:00.000Z",
+ "monitoring_end_time": "2023-04-02T06:00:00.770Z",
+ "duration_in_hours": 24,
+ "pods_count": 27,
+ "confidence_level": 0,
+ "current": {
+ "requests": {
+ "memory": {
+ "amount": 490.93,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 1.46,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 712.21,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 1.54,
+ "format": "cores"
+ }
+ }
+ },
+ "config": {
+ "requests": {
+ "memory": {
+ "amount": 1197.9840000000002,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 7.68,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 1197.9840000000002,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 7.68,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "requests": {
+ "memory": {
+ "amount": 707.0540000000001,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 6.22,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 485.7740000000001,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 6.14,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "medium_term": {
+ "pods_count": 0,
+ "confidence_level": 0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ },
+ "long_term": {
+ "pods_count": 0,
+ "confidence_level": 0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ }
+ }
+ },
+ ...
+ ...
+ ...
+ "2023-04-02T04:30:00.000Z": {
+ "cost": {
+ "short_term": {
+ "monitoring_start_time": "2023-04-01T03:15:00.000Z",
+ "monitoring_end_time": "2023-04-02T04:30:00.000Z",
+ "duration_in_hours": 24,
+ "pods_count": 27,
+ "confidence_level": 0,
+ "current": {
+ "requests": {
+ "memory": {
+ "amount": 490.93,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 1.46,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 712.21,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 1.54,
+ "format": "cores"
+ }
+ }
+ },
+ "config": {
+ "requests": {
+ "memory": {
+ "amount": 1197.9840000000002,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 7.68,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 1197.9840000000002,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 7.68,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "requests": {
+ "memory": {
+ "amount": 707.0540000000001,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 6.22,
+ "format": "cores"
+ }
+ },
+ "limits": {
+ "memory": {
+ "amount": 485.7740000000001,
+ "format": "MiB"
+ },
+ "cpu": {
+ "amount": 6.14,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "medium_term": {
+ "pods_count": 0,
+ "confidence_level": 0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ },
+ "long_term": {
+ "pods_count": 0,
+ "confidence_level": 0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ },
+ "tfb-server-0": {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-0",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ },
+ "data": {}
+ }
+ }
+ }
+ }
+ ]
+ },
+ ...
+ ...
+ ...
+]
+```
+
+**Request with recommendations set to true, latest set to false and with experiment name parameter**
+
+`GET /listExperiments`
+
+`curl -H 'Accept: application/json' 'http://<URL>:<PORT>/listExperiments?recommendations=true&latest=false&experiment_name=<experiment_name>'`
+
+Returns all the recommendations of the specified experiment
+
+
+**List Experiments also allows the user to send a request body to fetch the records based on `cluster_name` and `kubernetes_object`.**
+
+*Note: This request body can be sent along with other query params which are mentioned above.*
+
+`curl -H 'Accept: application/json' -X GET --data 'copy paste below JSON' 'http://<URL>:<PORT>/listExperiments'`
+
+
+### Example Request
+
+```json
+{
+ "cluster_name": "default",
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment",
+ "namespace": "default",
+ "containers": [
+ {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-1"
+ }
+ ]
+ }
+ ]
+}
+```
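
As a sketch (with `KRUIZE_URL` as a placeholder endpoint), the filter body above can be written to a file, validated locally, and then passed with the GET request:

```shell
# Sketch: save the cluster/kubernetes_object filter shown above, confirm it
# is well-formed JSON, then pass it to /listExperiments.
# KRUIZE_URL is a placeholder for the Kruize endpoint.
cat > filter.json <<'EOF'
{
  "cluster_name": "default",
  "kubernetes_objects": [
    {
      "type": "deployment",
      "name": "tfb-qrh-deployment",
      "namespace": "default",
      "containers": [
        {
          "container_image_name": "kruize/tfb-db:1.15",
          "container_name": "tfb-server-1"
        }
      ]
    }
  ]
}
EOF

# Fail fast on a malformed payload instead of a confusing API error.
python3 -m json.tool filter.json > /dev/null && echo "filter.json is valid JSON"

# curl -H 'Accept: application/json' -X GET --data @filter.json "${KRUIZE_URL}/listExperiments"
```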
+
+
+
+---
+
+
+### Generate Recommendations API
+
+**Note: This API is specific to the Local Monitoring use case.**
+Generates recommendations for a specific experiment based on the provided parameters, similar to the update
+recommendations API. It can be called directly after creating the experiment and does not require the update results
+API, as metrics are fetched from the provided `datasource` (e.g., Prometheus) instead of the database.
+
+**Request Parameters**
+
+| Parameter | Type | Required | Description |
+|---------------------|--------|----------|--------------------------------------------------------------------------------------------------------------------------------------------|
+| experiment_name     | string | required | The name of the experiment.                                                                                                                    |
+| interval_end_time   | string | optional | The end time of the interval in the format `yyyy-MM-ddTHH:mm:sssZ`. This should be the date for which the recommendation needs to be generated. |
+| interval_start_time | string | optional | The start time of the interval in the format `yyyy-MM-ddTHH:mm:sssZ`.                                                                          |
+
+The recommendation API requires only one mandatory field, `experiment_name`. The optional `interval_end_time`, if not
+provided, is fetched from the specified datasource, and `interval_start_time` is calculated from `interval_end_time`.
+Using these parameters, the API generates recommendations based on short-term, medium-term, and long-term factors. For
+instance, if the long term is configured for `15 days` and `interval_end_time` is set to `Jan 15 2023 00:00:00.000Z`,
+the API retrieves data for the past 15 days, starting from January 1st, and uses it to generate the three term
+recommendations for `Jan 15th 2023`.
+
+Ensure that the difference between `interval_end_time` and `interval_start_time` does not exceed 15 days. This
+restriction is in place to prevent potential timeouts, as generating recommendations beyond this threshold might
+require more time.
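
The window arithmetic above can be sketched in shell (assuming GNU `date`); here `interval_start_time` is derived 15 days before a chosen `interval_end_time`, which is the maximum gap the API accepts:

```shell
# Sketch: compute interval_start_time exactly 15 days before
# interval_end_time. Assumes GNU date (Linux coreutils); the fractional
# seconds are stripped before parsing and re-added on output.
INTERVAL_END="2023-01-15T00:00:00.000Z"
BASE="${INTERVAL_END%.000Z}Z"
INTERVAL_START=$(date -u -d "${BASE} - 15 days" +"%Y-%m-%dT%H:%M:%S.000Z")
echo "$INTERVAL_START"

# The pair can then be passed to the API (KRUIZE_URL is a placeholder):
# curl -X POST "${KRUIZE_URL}/generateRecommendations?experiment_name=temp_1&interval_start_time=${INTERVAL_START}&interval_end_time=${INTERVAL_END}"
```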
+
+**Request with experiment name and interval_end_time parameters**
+
+`POST /generateRecommendations?experiment_name=?&interval_end_time=?`
+
+`POST /generateRecommendations?experiment_name=?&interval_end_time=?&interval_start_time=?`
+
+Example:
+
+`curl --location --request POST 'http://<URL>:<PORT>/generateRecommendations?interval_end_time=2023-01-02T00:15:00.000Z&experiment_name=temp_1'`
+
+Success status code: 201
+
+**Response**
+
+The response contains an array of JSON objects with the recommendations for the specified experiment.
+
+
+### Example Response Body
+
+```json
+[
+ {
+ "cluster_name": "default",
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment_5",
+ "namespace": "default_5",
+ "containers": [
+ {
+ "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17",
+ "container_name": "tfb-server-1",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "111000": {
+ "type": "info",
+ "message": "Recommendations Are Available",
+ "code": 111000
+ }
+ },
+ "data": {
+ "2023-04-02T13:30:00.680Z": {
+ "notifications": {
+ "111101": {
+ "type": "info",
+ "message": "Short Term Recommendations Available",
+ "code": 111101
+ }
+ },
+ "monitoring_end_time": "2023-04-02T13:30:00.680Z",
+ "current": {
+ "limits": {
+ "memory": {
+ "amount": 1.048576E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.5,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 5.264900096E7,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 5.37,
+ "format": "cores"
+ }
+ }
+ },
+ "recommendation_terms": {
+ "short_term": {
+ "duration_in_hours": 24.0,
+ "notifications": {
+ "112101": {
+ "type": "info",
+ "message": "Cost Recommendations Available",
+ "code": 112101
+ },
+ "112102": {
+ "type": "info",
+ "message": "Performance Recommendations Available",
+ "code": 112102
+ }
+ },
+ "monitoring_start_time": "2023-04-01T12:00:00.000Z",
+ "recommendation_engines": {
+ "cost": {
+ "pods_count": 7,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "memory": {
+ "amount": 1.449132032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 1.9712180223999997902848E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "performance": {
+ "pods_count": 27,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "memory": {
+ "amount": 1.449132032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 1.9712180223999997902848E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ }
+ }
+ },
+ "medium_term": {
+ "duration_in_hours": 33.8,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ },
+ "long_term": {
+ "duration_in_hours": 33.8,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ },
+ {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-0",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ },
+ "data": {}
+ }
+ }
+ ]
+ }
+ ],
+ "version": "v2.0",
+ "experiment_name": "temp_1"
+ }
+]
+```
+
+
+
+**Request without interval_end_time parameter**
+
+`POST /generateRecommendations?experiment_name=?`
+
+Example:
+
+`curl --location --request POST 'http://<URL>:<PORT>/generateRecommendations?experiment_name=temp_1'`
+
+Success status code: 201
+
+**Response**
+
+The response contains an array of JSON objects with the recommendations for the specified experiment.
+
+When `interval_end_time` is not specified, Kruize determines the latest timestamp from the specified datasource
+(e.g., Prometheus) by checking the latest active container CPU usage.
+
+
+### Example Response Body
+
+```json
+[
+ {
+ "cluster_name": "default",
+ "kubernetes_objects": [
+ {
+ "type": "deployment",
+ "name": "tfb-qrh-deployment_5",
+ "namespace": "default_5",
+ "containers": [
+ {
+ "container_image_name": "kruize/tfb-qrh:1.13.2.F_et17",
+ "container_name": "tfb-server-1",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "111000": {
+ "type": "info",
+ "message": "Recommendations Are Available",
+ "code": 111000
+ }
+ },
+ "data": {
+ "2023-04-02T13:30:00.680Z": {
+ "notifications": {
+ "111101": {
+ "type": "info",
+ "message": "Short Term Recommendations Available",
+ "code": 111101
+ }
+ },
+ "monitoring_end_time": "2023-04-02T13:30:00.680Z",
+ "current": {
+ "limits": {
+ "memory": {
+ "amount": 1.048576E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.5,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 5.264900096E7,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 5.37,
+ "format": "cores"
+ }
+ }
+ },
+ "recommendation_terms": {
+ "short_term": {
+ "duration_in_hours": 24.0,
+ "notifications": {
+ "112101": {
+ "type": "info",
+ "message": "Cost Recommendations Available",
+ "code": 112101
+ },
+ "112102": {
+ "type": "info",
+ "message": "Performance Recommendations Available",
+ "code": 112102
+ }
+ },
+ "monitoring_start_time": "2023-04-01T12:00:00.000Z",
+ "recommendation_engines": {
+ "cost": {
+ "pods_count": 7,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "memory": {
+ "amount": 1.449132032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 1.9712180223999997902848E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "performance": {
+ "pods_count": 27,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 2.497708032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": 0.9299999999999999,
+ "format": "cores"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "memory": {
+ "amount": 1.449132032E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "memory": {
+ "amount": 1.9712180223999997902848E8,
+ "format": "bytes"
+ },
+ "cpu": {
+ "amount": -4.44,
+ "format": "cores"
+ }
+ }
+ },
+ "notifications": {}
+ }
+ }
+ },
+ "medium_term": {
+ "duration_in_hours": 33.8,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ },
+ "long_term": {
+ "duration_in_hours": 33.8,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ },
+ {
+ "container_image_name": "kruize/tfb-db:1.15",
+ "container_name": "tfb-server-0",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ },
+ "data": {}
+ }
+ }
+ ]
+ }
+ ],
+ "version": "v2.0",
+ "experiment_name": "temp_1"
+ }
+]
+```
+
+
+
+
+**Error Responses**
+
+| HTTP Status Code | Description |
+|------------------|----------------------------------------------------------------------------------------------------|
+| 400 | experiment_name is mandatory. |
+| 400 | Given timestamp - \" 2023-011-02T00:00:00.000Z \" is not a valid timestamp format. |
+| 400 | Not Found: experiment_name does not exist: exp_1. |
+| 400 | No metrics available from `2024-01-15T00:00:00.000Z` to `2023-12-31T00:00:00.000Z`. |
+| 400 | The gap between the interval_start_time and interval_end_time must be within a maximum of 15 days! |
+| 400              | The Start time should precede the End time!                                                        |
+| 500 | Internal Server Error |
+
diff --git a/design/KruizePromQL.md b/design/KruizePromQL.md
index 3c708659d..54ad31118 100644
--- a/design/KruizePromQL.md
+++ b/design/KruizePromQL.md
@@ -1,6 +1,7 @@
# Custom Prometheus Queries for Kruize
-These are the custom Prometheus queries that you can use while running Kruize. These queries provide valuable insights into the performance of Kruize APIs and KruizeDB methods.
+These are the custom Prometheus queries that you can use while running Kruize. These queries provide valuable insights
+into the performance of Kruize APIs and KruizeDB methods.
## KruizeAPI Metrics
@@ -16,24 +17,34 @@ The following are the available Kruize APIs that you can monitor:
To monitor the performance of these APIs, you can use the following metrics:
-- `kruizeAPI_count`: This metric provides the count of invocations for a specific API. It measures how many times the API has been called.
-- `kruizeAPI_sum`: This metric provides the sum of the time taken by a specific API. It measures the total time consumed by the API across all invocations.
-- `kruizeAPI_max`: This metric provides the maximum time taken by a specific API. It measures the highest execution time observed for the API.
+- `kruizeAPI_count`: This metric provides the count of invocations for a specific API. It measures how many times the
+ API has been called.
+- `kruizeAPI_sum`: This metric provides the sum of the time taken by a specific API. It measures the total time consumed
+ by the API across all invocations.
+- `kruizeAPI_max`: This metric provides the maximum time taken by a specific API. It measures the highest execution time
+ observed for the API.
Here are some sample metrics for the mentioned APIs which can run in Prometheus:
-- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the count of successful invocations for the `createExperiment` API.
-- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="failure"}`: Returns the count of failed invocations for the `createExperiment` API.
-- `kruizeAPI_sum{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the sum of the time taken by the successful invocations of `createExperiment` API.
-- `kruizeAPI_max{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the maximum time taken by the successful invocation of `createExperiment` API.
+- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the count of
+ successful invocations for the `createExperiment` API.
+- `kruizeAPI_count{api="createExperiment", application="Kruize", method="POST", status="failure"}`: Returns the count of
+ failed invocations for the `createExperiment` API.
+- `kruizeAPI_sum{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the sum of the
+ time taken by the successful invocations of `createExperiment` API.
+- `kruizeAPI_max{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the maximum
+ time taken by the successful invocation of `createExperiment` API.
-By changing the value of the `api` and `method` label, you can gather metrics for other Kruize APIs such as `listRecommendations`, `listExperiments`, and `updateResults`.
+By changing the value of the `api` and `method` label, you can gather metrics for other Kruize APIs such
+as `listRecommendations`, `listExperiments`, and `updateResults`.
Here is a sample command to collect the metric through `curl`
-- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeAPI_sum{api="listRecommendations", application="Kruize", method="GET", status="success"}' ${PROMETHEUS_URL} | jq` :
-Returns the sum of the time taken by `listRecommendations` API.
-
+
+- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeAPI_sum{api="listRecommendations", application="Kruize", method="GET", status="success"}' ${PROMETHEUS_URL} | jq` :
+ Returns the sum of the time taken by `listRecommendations` API.
+
Sample Output:
+
```
{
"status": "success",
@@ -72,7 +83,8 @@ The following are the available Kruize DB methods that you can monitor:
- `addExperimentToDB`: Method for adding an experiment to the database.
- `addResultToDB`: Method for adding experiment results to the database.
-- `addBulkResultsToDBAndFetchFailedResults`: Method for adding bulk experiment results to the database and fetch the failed results.
+- `addBulkResultsToDBAndFetchFailedResults`: Method for adding bulk experiment results to the database and fetch the
+ failed results.
- `addRecommendationToDB`: Method for adding a recommendation to the database.
- `loadExperimentByName`: Method for loading an experiment by name.
- `loadResultsByExperimentName`: Method for loading experiment results by experiment name.
@@ -82,28 +94,51 @@ The following are the available Kruize DB methods that you can monitor:
- `loadPerformanceProfileByName`: Method to load a specific performance profile.
- `loadAllPerformanceProfiles`: Method to load all performance profiles.
+## KruizeMethod Metrics
+
+The following are the available Kruize methods that you can monitor:
+
+- `generatePlots`: Method to generate box plot metrics for all terms.
+
+Sample Output:
+
+```
+KruizeMethod_max{application="Kruize",method="generatePlots",status="success",} 0.036112854
+KruizeMethod_count{application="Kruize",method="generatePlots",status="success",} 2.0
+KruizeMethod_sum{application="Kruize",method="generatePlots",status="success",} 0.050705769
+```
+
## Time taken for KruizeDB metrics
To monitor the performance of these methods, you can use the following metrics:
-- `kruizeDB_count`: This metric provides the count of calls made to the specific DB method. It measures how many times the DB method has been called.
-- `kruizeDB_sum`: This metric provides the sum of the time taken by a specific DB method. It measures the total time consumed by the DB method across all invocations.
-- `kruizeDB_max`: This metric provides the maximum time taken by a specific DB method. It measures the highest execution time observed for the DB method.
+- `kruizeDB_count`: This metric provides the count of calls made to the specific DB method. It measures how many times
+ the DB method has been called.
+- `kruizeDB_sum`: This metric provides the sum of the time taken by a specific DB method. It measures the total time
+ consumed by the DB method across all invocations.
+- `kruizeDB_max`: This metric provides the maximum time taken by a specific DB method. It measures the highest execution
+ time observed for the DB method.
Here are some sample metrics for the mentioned DB methods which can run in Prometheus:
-- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="success"}`: Number of successful invocations of `addExperimentToDB` method.
-- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="failure"}`: Number of failed invocations of `addExperimentToDB` method.
-- `kruizeDB_sum{application="Kruize", method="addExperimentToDB", status="success"}`: Total time taken by the `addExperimentToDB` method which were success.
-- `kruizeDB_max{application="Kruize", method="addExperimentToDB", status="success"}`: Maximum time taken by the `addExperimentToDB` method which were success.
+- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="success"}`: Number of successful invocations
+ of `addExperimentToDB` method.
+- `kruizeDB_count{application="Kruize", method="addExperimentToDB", status="failure"}`: Number of failed invocations
+ of `addExperimentToDB` method.
+- `kruizeDB_sum{application="Kruize", method="addExperimentToDB", status="success"}`: Total time taken by
+ the `addExperimentToDB` method which were success.
+- `kruizeDB_max{application="Kruize", method="addExperimentToDB", status="success"}`: Maximum time taken by
+ the `addExperimentToDB` method which were success.
By changing the value of the `method` label, you can gather metrics for other KruizeDB metrics.
Here is a sample command to collect the metric through `curl`
+
- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeDB_sum{application="Kruize", method="loadRecommendationsByExperimentName", status="success"}' ${PROMETHEUS_URL} | jq` :
Returns the sum of the time taken by `loadRecommendationsByExperimentName` method.
Sample Output:
+
```
{
"status": "success",
@@ -139,15 +174,20 @@ Sample Output:
# Kruize Metrics Collection and Analysis
-To facilitate the performance analysis of the Kruize application, we provide a comprehensive script, [kruize_metrics.py](../scripts/kruize_metrics.py), which enables the collection of Kruize metrics in CSV format.
-This script generates two distinct output files: increase_kruizemetrics.csv and total_kruizemetrics.csv. Notably, the PostgresDB metrics maintain consistency across both files.
+To facilitate the performance analysis of the Kruize application, we provide a comprehensive
+script, [kruize_metrics.py](../scripts/kruize_metrics.py), which enables the collection of Kruize metrics in CSV format.
+This script generates two distinct output files: increase_kruizemetrics.csv and total_kruizemetrics.csv. Notably, the
+PostgresDB metrics maintain consistency across both files.
### Output Files and Format
-- `increase_kruizemetrics.csv`: This file leverages increase() queries to ascertain the total incremental changes in Kruize metric values over time.
-- `total_kruizemetrics.csv`: This file employs the original queries to compute cumulative metric values since the inception of the Kruize application.
+- `increase_kruizemetrics.csv`: This file leverages increase() queries to ascertain the total incremental changes in
+ Kruize metric values over time.
+- `total_kruizemetrics.csv`: This file employs the original queries to compute cumulative metric values since the
+ inception of the Kruize application.
-Each column within the CSV files corresponds to specific API and DB metrics, capturing counts, sums, and maximum values for both successful and failed operations.
+Each column within the CSV files corresponds to specific API and DB metrics, capturing counts, sums, and maximum values
+for both successful and failed operations.
### Some key columns for insightful analysis:
@@ -175,19 +215,25 @@ Each column within the CSV files corresponds to specific API and DB metrics, cap
| kruizeDB_size | Current size of the Kruize database. |
| kruizeDB_results | Total count of results available in the database across all experiments. |
-
# Initial Analysis Insights
Upon analyzing the collected metrics, several crucial insights emerge:
-- `Database Growth`: As the number of experiments and associated results increases, there is a proportional growth in the size of the database.
+- `Database Growth`: As the number of experiments and associated results increases, there is a proportional growth in
+ the size of the database.
-- `Update Recommendations Time`: Currently, the time required for updating recommendations exhibits an increasing trend with the growth in results. This aspect necessitates closer attention and potential optimization efforts.
+- `Update Recommendations Time`: Currently, the time required for updating recommendations exhibits an increasing trend
+ with the growth in results. This aspect necessitates closer attention and potential optimization efforts.
-- `Stable Update Results Time`: The time taken for updating experiment results is expected to remain relatively stable. Any deviations from this expected pattern warrant further investigation for potential performance issues.
+- `Stable Update Results Time`: The time taken for updating experiment results is expected to remain relatively stable.
+ Any deviations from this expected pattern warrant further investigation for potential performance issues.
-- `DB Method Aggregation`: While individual DB method metrics provide valuable insights, it is important to understand how they collectively contribute to the overall API metrics. A comprehensive analysis of both individual and aggregated DB metrics is essential for a holistic performance assessment.
+- `DB Method Aggregation`: While individual DB method metrics provide valuable insights, it is important to understand
+ how they collectively contribute to the overall API metrics. A comprehensive analysis of both individual and
+ aggregated DB metrics is essential for a holistic performance assessment.
-- `Max Value Analysis`: Evaluating the maximum values allows for the identification of peak performance periods for each method, aiding in the identification of potential performance bottlenecks.
+- `Max Value Analysis`: Evaluating the maximum values allows for the identification of peak performance periods for each
+ method, aiding in the identification of potential performance bottlenecks.
-By conducting a thorough analysis based on these initial insights, users can effectively monitor and optimize the performance of the Kruize application, thereby ensuring a seamless and efficient user experience.
+By conducting a thorough analysis based on these initial insights, users can effectively monitor and optimize the
+performance of the Kruize application, thereby ensuring a seamless and efficient user experience.
diff --git a/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml b/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
index 9df33af34..abda3c3f3 100644
--- a/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
+++ b/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
@@ -32,7 +32,7 @@ data:
"monitoringendpoint": "prometheus-k8s",
"savetodb": "true",
"dbdriver": "jdbc:postgresql://",
- "plots": "false",
+ "plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
"hibernate": {
@@ -78,7 +78,7 @@ spec:
spec:
containers:
- name: kruize
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
diff --git a/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml b/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
index e17500e2c..c4db437ad 100644
--- a/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
+++ b/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
@@ -32,7 +32,7 @@ data:
"monitoringendpoint": "prometheus-k8s",
"savetodb": "true",
"dbdriver": "jdbc:postgresql://",
- "plots": "false",
+ "plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
"hibernate": {
@@ -78,7 +78,7 @@ spec:
spec:
containers:
- name: kruize
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
diff --git a/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml b/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
index 8456e7608..b778eab77 100644
--- a/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
+++ b/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
@@ -110,7 +110,7 @@ data:
"monitoringendpoint": "prometheus-k8s",
"savetodb": "true",
"dbdriver": "jdbc:postgresql://",
- "plots": "false",
+ "plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
"hibernate": {
@@ -165,7 +165,7 @@ spec:
spec:
containers:
- name: kruize
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
@@ -230,7 +230,7 @@ spec:
spec:
containers:
- name: kruizecronjob
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
@@ -356,7 +356,7 @@ spec:
spec:
containers:
- name: kruizedeletejob
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
diff --git a/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml b/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
index 5932e8448..dd742a7cf 100644
--- a/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
+++ b/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
@@ -91,7 +91,7 @@ data:
"monitoringendpoint": "prometheus-k8s",
"savetodb": "true",
"dbdriver": "jdbc:postgresql://",
- "plots": "false",
+ "plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
"hibernate": {
@@ -211,7 +211,7 @@ spec:
serviceAccountName: kruize-sa
containers:
- name: kruize
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
@@ -283,7 +283,7 @@ spec:
spec:
containers:
- name: kruizecronjob
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
@@ -324,7 +324,7 @@ spec:
spec:
containers:
- name: kruizedeletejob
- image: kruize/autotune_operator:0.0.21_rm
+ image: kruize/autotune_operator:0.0.22_rm
imagePullPolicy: Always
volumeMounts:
- name: config-volume
diff --git a/pom.xml b/pom.xml
index 48c11eba5..fef5e3f55 100644
--- a/pom.xml
+++ b/pom.xml
@@ -6,7 +6,7 @@
org.autotune
autotune
- 0.0.21_mvp
+ 0.0.22_mvp
4.13.2
20240303
diff --git a/src/main/java/com/autotune/analyzer/plots/PlotManager.java b/src/main/java/com/autotune/analyzer/plots/PlotManager.java
index 2b3d867c1..16a5efc2c 100644
--- a/src/main/java/com/autotune/analyzer/plots/PlotManager.java
+++ b/src/main/java/com/autotune/analyzer/plots/PlotManager.java
@@ -1,16 +1,19 @@
package com.autotune.analyzer.plots;
+import com.autotune.analyzer.recommendations.model.CostBasedRecommendationModel;
import com.autotune.analyzer.recommendations.term.Terms;
import com.autotune.analyzer.utils.AnalyzerConstants;
-import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.IntervalResults;
import com.autotune.common.utils.CommonUtils;
+import com.autotune.utils.KruizeConstants;
+import org.json.JSONArray;
+import org.json.JSONException;
+import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Timestamp;
import java.util.*;
-import java.util.stream.Collectors;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.*;
@@ -35,7 +38,7 @@ public PlotData.PlotsData generatePlots() {
sortedResultsHashMap.putAll(containerResultsMap);
// Retrieve entries within the specified range
- Map<Timestamp, IntervalResults> resultInRange = sortedResultsHashMap.subMap(monitoringEndTime, true, monitoringStartTime, true);
+ Map<Timestamp, IntervalResults> resultInRange = sortedResultsHashMap.subMap(monitoringEndTime, true, monitoringStartTime, false);
int delimiterNumber = (int) (resultInRange.size() / recommendationTerm.getPlots_datapoints());
@@ -58,8 +61,10 @@ public PlotData.PlotsData generatePlots() {
calendar.add(Calendar.MILLISECOND, (int) millisecondsToAdd);
// Convert the modified Calendar back to a Timestamp
Timestamp newTimestamp = new Timestamp(calendar.getTimeInMillis());
- PlotData.UsageData cpuUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true, incrementStartTime, true), AnalyzerConstants.MetricName.cpuUsage, "cores");
- PlotData.UsageData memoryUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true, incrementStartTime, true), AnalyzerConstants.MetricName.memoryUsage, "MiB");
+ PlotData.UsageData cpuUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true,
+ incrementStartTime, false), AnalyzerConstants.MetricName.cpuUsage);
+ PlotData.UsageData memoryUsage = getUsageData(sortedResultsHashMap.subMap(newTimestamp, true,
+ incrementStartTime, false), AnalyzerConstants.MetricName.memoryUsage);
plotsDataMap.put(newTimestamp, new PlotData.PlotPoint(cpuUsage, memoryUsage));
incrementStartTime = newTimestamp;
}
@@ -67,28 +72,80 @@ public PlotData.PlotsData generatePlots() {
return new PlotData.PlotsData(recommendationTerm.getPlots_datapoints(), plotsDataMap);
}
- PlotData.UsageData getUsageData(Map<Timestamp, IntervalResults> resultInRange, AnalyzerConstants.MetricName metricName, String format) {
- // Extract CPU values
- List<Double> cpuValues = resultInRange.values().stream()
- .filter(intervalResults -> intervalResults.getMetricResultsMap().containsKey(metricName))
- .mapToDouble(intervalResults -> {
- MetricResults metricResults = intervalResults.getMetricResultsMap().get(metricName);
- return (metricResults != null && metricResults.getAggregationInfoResult() != null) ? metricResults.getAggregationInfoResult().getSum() : 0.0;
- })
- .boxed() // Convert double to Double
- .collect(Collectors.toList());
- if (cpuValues.size() > 0) {
- double q1 = CommonUtils.percentile(TWENTYFIVE_PERCENTILE, cpuValues);
- double q3 = CommonUtils.percentile(SEVENTYFIVE_PERCENTILE, cpuValues);
- double median = CommonUtils.percentile(FIFTY_PERCENTILE, cpuValues);
- // Find max and min
- double max = Collections.max(cpuValues);
- double min = Collections.min(cpuValues);
- return new PlotData.UsageData(min, q1, median, q3, max, format);
- } else {
- return null;
+ PlotData.UsageData getUsageData(Map<Timestamp, IntervalResults> resultInRange, AnalyzerConstants.MetricName metricName) {
+ // extract the CPU or memory usage values from the results in the given range
+ try {
+ if (metricName.equals(AnalyzerConstants.MetricName.cpuUsage)) {
+ JSONArray cpuValues = CostBasedRecommendationModel.getCPUUsageList(resultInRange);
+ LOGGER.debug("cpuValues : {}", cpuValues);
+ if (!cpuValues.isEmpty()) {
+ // Extract "max" values from cpuUsageList
+ List<Double> cpuMaxValues = new ArrayList<>();
+ List<Double> cpuMinValues = new ArrayList<>();
+ for (int i = 0; i < cpuValues.length(); i++) {
+ JSONObject jsonObject = cpuValues.getJSONObject(i);
+ double maxValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+ double minValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MIN);
+ cpuMaxValues.add(maxValue);
+ cpuMinValues.add(minValue);
+ }
+ LOGGER.debug("cpuMaxValues : {}", cpuMaxValues);
+ LOGGER.debug("cpuMinValues : {}", cpuMinValues);
+ return getPercentileData(cpuMaxValues, cpuMinValues, resultInRange, metricName);
+ }
+
+ } else {
+ // loop through the results value and extract the memory values
+ CostBasedRecommendationModel costBasedRecommendationModel = new CostBasedRecommendationModel();
+ List<Double> memUsageMinList = new ArrayList<>();
+ List<Double> memUsageMaxList = new ArrayList<>();
+ boolean memDataAvailable = false;
+ for (IntervalResults intervalResults: resultInRange.values()) {
+ JSONObject jsonObject = costBasedRecommendationModel.calculateMemoryUsage(intervalResults);
+ if (!jsonObject.isEmpty()) {
+ memDataAvailable = true;
+ Double memUsageMax = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+ Double memUsageMin = jsonObject.getDouble(KruizeConstants.JSONKeys.MIN);
+ memUsageMaxList.add(memUsageMax);
+ memUsageMinList.add(memUsageMin);
+ }
+ }
+ LOGGER.debug("memValues Max : {}, Min : {}", memUsageMaxList, memUsageMinList);
+ if (memDataAvailable)
+ return getPercentileData(memUsageMaxList, memUsageMinList, resultInRange, metricName);
+ }
+ } catch (JSONException e) {
+ LOGGER.error("Exception occurred while extracting metric values: {}", e.getMessage());
}
+ return null;
+ }
-
+ private PlotData.UsageData getPercentileData(List<Double> metricValuesMax, List<Double> metricValuesMin, Map<Timestamp, IntervalResults> resultInRange, AnalyzerConstants.MetricName metricName) {
+ try {
+ if (!metricValuesMax.isEmpty()) {
+ double q1 = CommonUtils.percentile(TWENTYFIVE_PERCENTILE, metricValuesMax);
+ double q3 = CommonUtils.percentile(SEVENTYFIVE_PERCENTILE, metricValuesMax);
+ double median = CommonUtils.percentile(FIFTY_PERCENTILE, metricValuesMax);
+ // Find max and min
+ double max = Collections.max(metricValuesMax);
+ double min;
+ // check for non-zero values
+ boolean nonZeroCheck = metricValuesMin.stream().noneMatch(value -> value.equals(0.0));
+ if (nonZeroCheck) {
+ min = Collections.min(metricValuesMin);
+ } else {
+ min = 0.0;
+ }
+
+ LOGGER.debug("q1 : {}, q3 : {}, median : {}, max : {}, min : {}", q1, q3, median, max, min);
+ String format = CostBasedRecommendationModel.getFormatValue(resultInRange, metricName);
+ return new PlotData.UsageData(min, q1, median, q3, max, format);
+ } else {
+ return null;
+ }
+ } catch (Exception e) {
+ LOGGER.error("Exception occurred while generating percentiles: {}", e.getMessage());
+ }
+ return null;
}
}
diff --git a/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java b/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
index 322a7c087..e1c9ec837 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
@@ -31,11 +31,13 @@
import com.autotune.operator.KruizeDeploymentInfo;
import com.autotune.utils.GenericRestApiClient;
import com.autotune.utils.KruizeConstants;
+import com.autotune.utils.MetricsConfig;
import com.autotune.utils.Utils;
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
+import io.micrometer.core.instrument.Timer;
import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -586,13 +588,26 @@ private boolean generateRecommendationsBasedOnTerms(ContainerData containerData,
mappedRecommendationForTerm.addNotification(recommendationNotification);
}
mappedRecommendationForTerm.setMonitoringStartTime(monitoringStartTime);
- }
- Terms.setDurationBasedOnTerm(containerData, mappedRecommendationForTerm, recommendationTerm);
- if (KruizeDeploymentInfo.plots == true) {
- if (null != monitoringStartTime) {
- mappedRecommendationForTerm.setPlots(new PlotManager(containerData.getResults(), terms, monitoringStartTime, monitoringEndTime).generatePlots());
+ // generate plots when minimum data is available for the term
+ if (KruizeDeploymentInfo.plots) {
+ if (null != monitoringStartTime) {
+ Timer.Sample timerBoxPlots = null;
+ String status = "success"; // TODO avoid duplicating this constant in multiple places
+ try {
+ timerBoxPlots = Timer.start(MetricsConfig.meterRegistry());
+ mappedRecommendationForTerm.setPlots(new PlotManager(containerData.getResults(), terms, monitoringStartTime, monitoringEndTime).generatePlots());
+ } catch (Exception e) {
+ status = String.format("Box plots failed due to - %s", e.getMessage());
+ } finally {
+ if (timerBoxPlots != null) {
+ MetricsConfig.timerBoxPlots = MetricsConfig.timerBBoxPlots.tag("status", status).register(MetricsConfig.meterRegistry());
+ timerBoxPlots.stop(MetricsConfig.timerBoxPlots);
+ }
+ }
+ }
}
}
+ Terms.setDurationBasedOnTerm(containerData, mappedRecommendationForTerm, recommendationTerm);
timestampRecommendation.setRecommendationForTermHashMap(recommendationTerm, mappedRecommendationForTerm);
}
@@ -1407,7 +1422,7 @@ private String getResults(Map mainKruizeExperimentMAP, Kru
* @param interval_start_time The start time of the interval for fetching metrics.
* @param dataSourceInfo The datasource object to fetch metrics from.
* @throws Exception if an error occurs during the fetching process.
- * TODO: Need to add right abstractions for this
+ * TODO: Need to add right abstractions for this
*/
public void fetchMetricsBasedOnDatasource(KruizeObject kruizeObject, Timestamp interval_end_time, Timestamp interval_start_time, DataSourceInfo dataSourceInfo) throws Exception {
try {
@@ -1492,10 +1507,10 @@ public void fetchMetricsBasedOnDatasource(KruizeObject kruizeObject, Timestamp i
if (secondMethodName.equals(KruizeConstants.JSONKeys.SUM))
secondMethodName = KruizeConstants.JSONKeys.AVG;
promQL = String.format(metricEntry.getValue(), methodName, secondMethodName, namespace, containerName, measurementDurationMinutesInDouble.intValue());
- format = KruizeConstants.JSONKeys.GIBIBYTE;
+ format = KruizeConstants.JSONKeys.BYTES;
} else if (metricEntry.getKey() == AnalyzerConstants.MetricName.memoryLimit || metricEntry.getKey() == AnalyzerConstants.MetricName.memoryRequest) {
promQL = String.format(metricEntry.getValue(), methodName, namespace, containerName);
- format = KruizeConstants.JSONKeys.GIBIBYTE;
+ format = KruizeConstants.JSONKeys.BYTES;
}
// If promQL is determined, fetch metrics from the datasource
if (promQL != null) {
@@ -1570,7 +1585,8 @@ public void fetchMetricsBasedOnDatasource(KruizeObject kruizeObject, Timestamp i
}
}
containerData.setResults(containerDataResults);
- setInterval_end_time(Collections.max(containerDataResults.keySet())); //TODO Temp fix invalide date is set if experiment having two container with different last seen date
+ if (containerDataResults.size() > 0)
+ setInterval_end_time(Collections.max(containerDataResults.keySet())); //TODO Temp fix: an invalid date is set if an experiment has two containers with different last-seen dates
}
}
} catch (Exception e) {
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
index c9c72a51d..ae506b2e0 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
@@ -8,12 +8,17 @@
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.IntervalResults;
import com.autotune.common.utils.CommonUtils;
+import com.autotune.utils.KruizeConstants;
+import org.json.JSONArray;
+import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Timestamp;
import java.util.*;
import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_CPU_PERCENTILE;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_MEMORY_PERCENTILE;
@@ -42,49 +47,21 @@ public RecommendationConfigItem getCPURequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap,
- List<Double> cpuUsageList = filteredResultsMap.values()
- .stream()
- .map(e -> {
- Optional<MetricResults> cpuUsageResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
- Optional<MetricResults> cpuThrottleResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuThrottle));
- double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
- double cpuUsageMax = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
- double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
- double cpuThrottleAvg = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
- double cpuThrottleMax = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
- double cpuThrottleSum = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
- double cpuRequestInterval = 0.0;
- double cpuUsagePod = 0;
- int numPods = 0;
-
- // Use the Max value when available, if not use the Avg
- double cpuUsage = (cpuUsageMax > 0) ? cpuUsageMax : cpuUsageAvg;
- double cpuThrottle = (cpuThrottleMax > 0) ? cpuThrottleMax : cpuThrottleAvg;
- double cpuUsageTotal = cpuUsage + cpuThrottle;
-
- // Usage is less than 1 core, set it to the observed value.
- if (CPU_ONE_CORE > cpuUsageTotal) {
- cpuRequestInterval = cpuUsageTotal;
- } else {
- // Sum/Avg should give us the number of pods
- if (0 != cpuUsageAvg) {
- numPods = (int) Math.ceil(cpuUsageSum / cpuUsageAvg);
- if (0 < numPods) {
- cpuUsagePod = (cpuUsageSum + cpuThrottleSum) / numPods;
- }
- }
- cpuRequestInterval = Math.max(cpuUsagePod, cpuUsageTotal);
- }
- return cpuRequestInterval;
- })
- .collect(Collectors.toList());
+ JSONArray cpuUsageList = getCPUUsageList(filteredResultsMap);
+ // Extract 'max' values from cpuUsageList
+ List<Double> cpuMaxValues = new ArrayList<>();
+ for (int i = 0; i < cpuUsageList.length(); i++) {
+ JSONObject jsonObject = cpuUsageList.getJSONObject(i);
+ double maxValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+ cpuMaxValues.add(maxValue);
+ }
- Double cpuRequest = 0.0;
- Double cpuRequestMax = Collections.max(cpuUsageList);
+ Double cpuRequest;
+ Double cpuRequestMax = Collections.max(cpuMaxValues);
if (null != cpuRequestMax && CPU_ONE_CORE > cpuRequestMax) {
cpuRequest = cpuRequestMax;
} else {
- cpuRequest = CommonUtils.percentile(COST_CPU_PERCENTILE, cpuUsageList);
+ cpuRequest = CommonUtils.percentile(COST_CPU_PERCENTILE, cpuMaxValues);
}
// TODO: This code below should be optimised with idle detection (0 cpu usage in recorded data) in recommendation ALGO
@@ -116,23 +93,64 @@ else if (CPU_ONE_MILLICORE >= cpuRequest) {
}
}
+ format = getFormatValue(filteredResultsMap, AnalyzerConstants.MetricName.cpuUsage);
+
+ recommendationConfigItem = new RecommendationConfigItem(cpuRequest, format);
+ return recommendationConfigItem;
+ }
+
+ public static JSONArray getCPUUsageList(Map<Timestamp, IntervalResults> filteredResultsMap) {
+ JSONArray cpuRequestIntervalArray = new JSONArray();
for (IntervalResults intervalResults : filteredResultsMap.values()) {
- MetricResults cpuUsageResults = intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage);
- if (cpuUsageResults != null) {
- MetricAggregationInfoResults aggregationInfoResult = cpuUsageResults.getAggregationInfoResult();
- if (aggregationInfoResult != null) {
- format = aggregationInfoResult.getFormat();
- if (format != null && !format.isEmpty()) {
- break;
+ JSONObject cpuRequestInterval = new JSONObject();
+ Optional<MetricResults> cpuUsageResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
+ Optional<MetricResults> cpuThrottleResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuThrottle));
+ double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
+ double cpuUsageMax = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
+ double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
+ double cpuUsageMin = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMin()).orElse(0.0);
+ double cpuThrottleAvg = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
+ double cpuThrottleMax = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
+ double cpuThrottleSum = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
+ double cpuThrottleMin = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMin()).orElse(0.0);
+
+ double cpuRequestIntervalMax;
+ double cpuRequestIntervalMin;
+ double cpuUsagePod = 0;
+ int numPods;
+
+ // Use the Max value when available, if not use the Avg
+ double cpuUsage = (cpuUsageMax > 0) ? cpuUsageMax : cpuUsageAvg;
+ double cpuThrottle = (cpuThrottleMax > 0) ? cpuThrottleMax : cpuThrottleAvg;
+ double cpuUsageTotal = cpuUsage + cpuThrottle;
+
+ // Usage is less than 1 core, set it to the observed value.
+ if (CPU_ONE_CORE > cpuUsageTotal) {
+ cpuRequestIntervalMax = cpuUsageTotal;
+ } else {
+ // Sum/Avg should give us the number of pods
+ if (0 != cpuUsageAvg) {
+ numPods = (int) Math.ceil(cpuUsageSum / cpuUsageAvg);
+ if (0 < numPods) {
+ cpuUsagePod = (cpuUsageSum + cpuThrottleSum) / numPods;
}
}
+ cpuRequestIntervalMax = Math.max(cpuUsagePod, cpuUsageTotal);
}
+ double cpuMinTotal = cpuUsageMin + cpuThrottleMin;
+ // traverse over a stream of positive values and find the minimum value
+ cpuRequestIntervalMin = Stream.of(cpuUsagePod, cpuUsageTotal, cpuMinTotal)
+ .filter(value -> value > 0.0)
+ .min(Double::compare)
+ .orElse(0.0);
+
+ cpuRequestInterval.put(KruizeConstants.JSONKeys.MIN, cpuRequestIntervalMin);
+ cpuRequestInterval.put(KruizeConstants.JSONKeys.MAX, cpuRequestIntervalMax);
+ LOGGER.debug("cpuRequestInterval : {}", cpuRequestInterval);
+ cpuRequestIntervalArray.put(cpuRequestInterval);
}
-
- recommendationConfigItem = new RecommendationConfigItem(cpuRequest, format);
- return recommendationConfigItem;
+ return cpuRequestIntervalArray;
}
-
@Override
public RecommendationConfigItem getMemoryRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap,
ArrayList<RecommendationNotification> notifications) {
@@ -143,10 +161,13 @@ public RecommendationConfigItem getMemoryRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap,
- List<Double> memUsageList = filteredResultsMap.values()
- .stream()
- .map(CostBasedRecommendationModel::calculateMemoryUsage)
- .collect(Collectors.toList());
+ CostBasedRecommendationModel costBasedRecommendationModel = new CostBasedRecommendationModel();
+ List<Double> memUsageList = new ArrayList<>();
+ for (IntervalResults intervalResults: filteredResultsMap.values()) {
+ JSONObject jsonObject = costBasedRecommendationModel.calculateMemoryUsage(intervalResults);
+ Double memUsage = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+ memUsageList.add(memUsage);
+ }
List<Double> spikeList = filteredResultsMap.values()
.stream()
@@ -169,8 +190,16 @@ public RecommendationConfigItem getMemoryRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap,
+ public static String getFormatValue(Map<Timestamp, IntervalResults> filteredResultsMap, AnalyzerConstants.MetricName metricName) {
+ String format = "";
for (IntervalResults intervalResults : filteredResultsMap.values()) {
- MetricResults memoryUsageResults = intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.memoryUsage);
+ MetricResults memoryUsageResults = intervalResults.getMetricResultsMap().get(metricName);
if (memoryUsageResults != null) {
MetricAggregationInfoResults aggregationInfoResult = memoryUsageResults.getAggregationInfoResult();
if (aggregationInfoResult != null) {
@@ -181,9 +210,7 @@ public RecommendationConfigItem getMemoryRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap,
Optional<MetricResults> cpuUsageResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
Optional<MetricResults> memoryUsageResults = Optional.ofNullable(intervalResults.getMetricResultsMap().get(AnalyzerConstants.MetricName.memoryUsage));
double memUsageAvg = memoryUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
double memUsageMax = memoryUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
+ double memUsageMin = memoryUsageResults.map(m -> m.getAggregationInfoResult().getMin()).orElse(0.0);
double memUsageSum = memoryUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
double memUsage = 0;
int numPods = 0;
@@ -216,9 +246,18 @@ private static double calculateMemoryUsage(IntervalResults intervalResults) {
if (0 < numPods) {
memUsage = (memUsageSum / numPods);
}
- memUsage = Math.max(memUsage, memUsageMax);
-
- return memUsage;
+ memUsageMax = Math.max(memUsage, memUsageMax);
+ // traverse over a stream of positive values and find the minimum value
+ memUsageMin = Stream.of(memUsage, memUsageMax, memUsageMin)
+ .filter(value -> value > 0.0)
+ .min(Double::compare)
+ .orElse(0.0);
+
+ jsonObject.put(KruizeConstants.JSONKeys.MIN, memUsageMin);
+ jsonObject.put(KruizeConstants.JSONKeys.MAX, memUsageMax);
+
+ LOGGER.debug("memRequestInterval : {}", jsonObject);
+ return jsonObject;
}
private static double calculateIntervalSpike(IntervalResults intervalResults) {
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
index 3409df3b3..27febf51a 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
@@ -9,12 +9,16 @@
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.IntervalResults;
import com.autotune.common.utils.CommonUtils;
+import com.autotune.utils.KruizeConstants;
+import org.json.JSONArray;
+import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Timestamp;
import java.util.*;
import java.util.stream.Collectors;
+import java.util.stream.IntStream;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_CPU_PERCENTILE;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_MEMORY_PERCENTILE;
@@ -45,49 +49,22 @@ public RecommendationConfigItem getCPURequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap,
- List<Double> cpuUsageList = filteredResultsMap.values()
- .stream()
- .map(e -> {
- Optional<MetricResults> cpuUsageResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuUsage));
- Optional<MetricResults> cpuThrottleResults = Optional.ofNullable(e.getMetricResultsMap().get(AnalyzerConstants.MetricName.cpuThrottle));
- double cpuUsageAvg = cpuUsageResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
- double cpuUsageMax = cpuUsageResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
- double cpuUsageSum = cpuUsageResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
- double cpuThrottleAvg = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getAvg()).orElse(0.0);
- double cpuThrottleMax = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getMax()).orElse(0.0);
- double cpuThrottleSum = cpuThrottleResults.map(m -> m.getAggregationInfoResult().getSum()).orElse(0.0);
- double cpuRequestInterval = 0.0;
- double cpuUsagePod = 0;
- int numPods = 0;
-
- // Use the Max value when available, if not use the Avg
- double cpuUsage = (cpuUsageMax > 0) ? cpuUsageMax : cpuUsageAvg;
- double cpuThrottle = (cpuThrottleMax > 0) ? cpuThrottleMax : cpuThrottleAvg;
- double cpuUsageTotal = cpuUsage + cpuThrottle;
-
- // Usage is less than 1 core, set it to the observed value.
- if (CPU_ONE_CORE > cpuUsageTotal) {
- cpuRequestInterval = cpuUsageTotal;
- } else {
- // Sum/Avg should give us the number of pods
- if (0 != cpuUsageAvg) {
- numPods = (int) Math.ceil(cpuUsageSum / cpuUsageAvg);
- if (0 < numPods) {
- cpuUsagePod = (cpuUsageSum + cpuThrottleSum) / numPods;
- }
- }
- cpuRequestInterval = Math.max(cpuUsagePod, cpuUsageTotal);
- }
- return cpuRequestInterval;
- })
- .collect(Collectors.toList());
+ JSONArray cpuUsageList = CostBasedRecommendationModel.getCPUUsageList(filteredResultsMap);
+ LOGGER.debug("cpuUsageList : {}", cpuUsageList);
+ // Extract "max" values from cpuUsageList
+ List<Double> cpuMaxValues = new ArrayList<>();
+ for (int i = 0; i < cpuUsageList.length(); i++) {
+ JSONObject jsonObject = cpuUsageList.getJSONObject(i);
+ double maxValue = jsonObject.getDouble(KruizeConstants.JSONKeys.MAX);
+ cpuMaxValues.add(maxValue);
+ }
Double cpuRequest = 0.0;
- Double cpuRequestMax = Collections.max(cpuUsageList);
+ Double cpuRequestMax = Collections.max(cpuMaxValues);
if (null != cpuRequestMax && CPU_ONE_CORE > cpuRequestMax) {
cpuRequest = cpuRequestMax;
} else {
- cpuRequest = CommonUtils.percentile(PERFORMANCE_CPU_PERCENTILE, cpuUsageList);
+ cpuRequest = CommonUtils.percentile(PERFORMANCE_CPU_PERCENTILE, cpuMaxValues);
}
// TODO: This code below should be optimised with idle detection (0 cpu usage in recorded data) in recommendation ALGO
diff --git a/src/main/java/com/autotune/utils/KruizeConstants.java b/src/main/java/com/autotune/utils/KruizeConstants.java
index 29612d53a..6938b1d52 100644
--- a/src/main/java/com/autotune/utils/KruizeConstants.java
+++ b/src/main/java/com/autotune/utils/KruizeConstants.java
@@ -178,7 +178,7 @@ public static final class JSONKeys {
public static final String MEDIAN = "median";
public static final String RANGE = "range";
public static final String CORES = "cores";
- public static final String GIBIBYTE = "GiB";
+ public static final String BYTES = "bytes";
// Datasource JSON keys
public static final String DATASOURCES = "datasources";
diff --git a/src/main/java/com/autotune/utils/MetricsConfig.java b/src/main/java/com/autotune/utils/MetricsConfig.java
index 8baa24c5d..320eaf389 100644
--- a/src/main/java/com/autotune/utils/MetricsConfig.java
+++ b/src/main/java/com/autotune/utils/MetricsConfig.java
@@ -8,57 +8,56 @@
import io.micrometer.core.instrument.config.NamingConvention;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
-import io.micrometer.core.instrument.MeterRegistry;
-import org.eclipse.jetty.util.thread.ThreadPool;
public class MetricsConfig {
- private static MetricsConfig INSTANCE;
public static Timer timerListRec, timerListExp, timerCreateExp, timerUpdateResults, timerUpdateRecomendations;
- public static Timer timerLoadRecExpName, timerLoadResultsExpName, timerLoadExpName, timerLoadRecExpNameDate;
+ public static Timer timerLoadRecExpName, timerLoadResultsExpName, timerLoadExpName, timerLoadRecExpNameDate, timerBoxPlots;
public static Timer timerLoadAllRec, timerLoadAllExp, timerLoadAllResults;
- public static Timer timerAddRecDB , timerAddResultsDB , timerAddExpDB, timerAddBulkResultsDB;
- public static Timer timerAddPerfProfileDB , timerLoadPerfProfileName , timerLoadAllPerfProfiles;
- public static Timer.Builder timerBListRec, timerBListExp, timerBCreateExp, timerBUpdateResults, timerBUpdateRecommendations ;
- public static Timer.Builder timerBLoadRecExpName, timerBLoadResultsExpName, timerBLoadExpName, timerBLoadRecExpNameDate;
+ public static Timer timerAddRecDB, timerAddResultsDB, timerAddExpDB, timerAddBulkResultsDB;
+ public static Timer timerAddPerfProfileDB, timerLoadPerfProfileName, timerLoadAllPerfProfiles;
+ public static Timer.Builder timerBListRec, timerBListExp, timerBCreateExp, timerBUpdateResults, timerBUpdateRecommendations;
+ public static Timer.Builder timerBLoadRecExpName, timerBLoadResultsExpName, timerBLoadExpName, timerBLoadRecExpNameDate, timerBBoxPlots;
public static Timer.Builder timerBLoadAllRec, timerBLoadAllExp, timerBLoadAllResults;
- public static Timer.Builder timerBAddRecDB, timerBAddResultsDB , timerBAddExpDB, timerBAddBulkResultsDB;
+ public static Timer.Builder timerBAddRecDB, timerBAddResultsDB, timerBAddExpDB, timerBAddBulkResultsDB;
public static Timer.Builder timerBAddPerfProfileDB, timerBLoadPerfProfileName, timerBLoadAllPerfProfiles;
- public String API_METRIC_DESC = "Time taken for Kruize APIs";
- public String DB_METRIC_DESC = "Time taken for KruizeDB methods";
public static PrometheusMeterRegistry meterRegistry;
-
public static Timer timerListDS, timerImportDSMetadata;
public static Timer.Builder timerBListDS, timerBImportDSMetadata;
+ private static MetricsConfig INSTANCE;
+ public String API_METRIC_DESC = "Time taken for Kruize APIs";
+ public String DB_METRIC_DESC = "Time taken for KruizeDB methods";
+ public String METHOD_METRIC_DESC = "Time taken for Kruize methods";
private MetricsConfig() {
meterRegistry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
meterRegistry.config().commonTags("application", "Kruize");
- timerBListRec = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","listRecommendations").tag("method","GET");
- timerBListExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","listExperiments").tag("method","GET");
- timerBCreateExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","createExperiment").tag("method","POST");
- timerBUpdateResults = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","updateResults").tag("method","POST");
- timerBUpdateRecommendations = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","updateRecommendations").tag("method","POST");
+ timerBListRec = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listRecommendations").tag("method", "GET");
+ timerBListExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listExperiments").tag("method", "GET");
+ timerBCreateExp = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "createExperiment").tag("method", "POST");
+ timerBUpdateResults = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "updateResults").tag("method", "POST");
+ timerBUpdateRecommendations = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "updateRecommendations").tag("method", "POST");
- timerBLoadRecExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadRecommendationsByExperimentName");
- timerBLoadRecExpNameDate = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadRecommendationsByExperimentNameAndDate");
- timerBLoadResultsExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadResultsByExperimentName");
- timerBLoadExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadExperimentByName");
- timerBLoadAllRec = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllRecommendations");
- timerBLoadAllExp = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllExperiments");
- timerBLoadAllResults = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllResults");
- timerBAddRecDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addRecommendationToDB");
- timerBAddResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addResultToDB");
- timerBAddBulkResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addBulkResultsToDBAndFetchFailedResults");
- timerBAddExpDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addExperimentToDB");
- timerBAddPerfProfileDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","addPerformanceProfileToDB");
- timerBLoadPerfProfileName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadPerformanceProfileByName");
- timerBLoadAllPerfProfiles = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method","loadAllPerformanceProfiles");
+ timerBLoadRecExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadRecommendationsByExperimentName");
+ timerBLoadRecExpNameDate = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadRecommendationsByExperimentNameAndDate");
+ timerBLoadResultsExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadResultsByExperimentName");
+ timerBLoadExpName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadExperimentByName");
+ timerBLoadAllRec = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllRecommendations");
+ timerBLoadAllExp = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllExperiments");
+ timerBLoadAllResults = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllResults");
+ timerBAddRecDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addRecommendationToDB");
+ timerBAddResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addResultToDB");
+ timerBAddBulkResultsDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addBulkResultsToDBAndFetchFailedResults");
+ timerBAddExpDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addExperimentToDB");
+ timerBAddPerfProfileDB = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "addPerformanceProfileToDB");
+ timerBLoadPerfProfileName = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadPerformanceProfileByName");
+ timerBLoadAllPerfProfiles = Timer.builder("kruizeDB").description(DB_METRIC_DESC).tag("method", "loadAllPerformanceProfiles");
+ timerBBoxPlots = Timer.builder("KruizeMethod").description(METHOD_METRIC_DESC).tag("method", "generatePlots");
- timerBListDS = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","listDataSources").tag("method","GET");
- timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","importDataSourceMetadata").tag("method","POST");
- timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api","importDataSourceMetadata").tag("method","GET");
+ timerBListDS = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listDataSources").tag("method", "GET");
+ timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "importDataSourceMetadata").tag("method", "POST");
+ timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "importDataSourceMetadata").tag("method", "GET");
new ClassLoaderMetrics().bindTo(meterRegistry);
new ProcessorMetrics().bindTo(meterRegistry);
new JvmGcMetrics().bindTo(meterRegistry);
diff --git a/tests/README.md b/tests/README.md
index 99a47fe2d..c5fc345dd 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -143,8 +143,20 @@ To run the stress test refer the Stress test [README](/tests/scripts/remote_moni
To run the fault tolerant test refer the [README](/tests/scripts/remote_monitoring_tests/fault_tolerant_tests.md)
+### Local monitoring tests
+
+Here we test Kruize [Local monitoring APIs](/design/KruizeLocalAPI.md).
+
+#### API tests
+
+ The tests do the following:
+ - Deploy Kruize in non-CRD mode using the deploy script from the autotune repo
+ - Validate the behaviour of the list datasources, import metadata and list metadata APIs in various scenarios covering both positive and negative use cases
+
+ For details, refer to this [doc](/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md)
+
## Supported Clusters
-- Minikube
+- Minikube, OpenShift
## Prerequisites for running the tests:
@@ -204,6 +216,12 @@ To run remote monitoring tests,
/tests/test_autotune.sh -c minikube -i kruize/autotune_operator:0.0.11_mvp --testsuite=remote_monitoring_tests --resultsdir=/home/results
```
+To run local monitoring tests,
+
+```
+/tests/test_autotune.sh -c minikube -i kruize/autotune_operator:0.0.21_mvp --testsuite=local_monitoring_tests --resultsdir=/home/results
+```
+
## How to test a specific autotune module?
To run the tests specific to a autotune module use the "testmodule" option. For example, to run all the tests for dependency analyzer module execute the below command:
diff --git a/tests/scripts/common/common_functions.sh b/tests/scripts/common/common_functions.sh
index 37f736475..36fde16e6 100755
--- a/tests/scripts/common/common_functions.sh
+++ b/tests/scripts/common/common_functions.sh
@@ -1,6 +1,6 @@
#!/bin/bash
#
-# Copyright (c) 2020, 2021 Red Hat, IBM Corporation and others.
+# Copyright (c) 2020, 2024 Red Hat, IBM Corporation and others.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -45,7 +45,8 @@ TEST_SUITE_ARRAY=("app_autotune_yaml_tests"
"autotune_id_tests"
"kruize_layer_id_tests"
"em_standalone_tests"
-"remote_monitoring_tests")
+"remote_monitoring_tests"
+"local_monitoring_tests")
modify_kruize_layer_tests=("add_new_tunable"
"apply_null_tunable"
@@ -1822,3 +1823,19 @@ function create_performance_profile() {
exit 1
fi
}
+
+#
+# The "local" flag is turned off by default for now; this function patches it to true in the deploy manifest.
+#
+function kruize_local_patch() {
+ CRC_DIR="./manifests/crc/default-db-included-installation"
+ KRUIZE_CRC_DEPLOY_MANIFEST_OPENSHIFT="${CRC_DIR}/openshift/kruize-crc-openshift.yaml"
+ KRUIZE_CRC_DEPLOY_MANIFEST_MINIKUBE="${CRC_DIR}/minikube/kruize-crc-minikube.yaml"
+
+
+ if [ ${cluster_type} == "minikube" ]; then
+ sed -i 's/"local": "false"/"local": "true"/' ${KRUIZE_CRC_DEPLOY_MANIFEST_MINIKUBE}
+ elif [ ${cluster_type} == "openshift" ]; then
+ sed -i 's/"local": "false"/"local": "true"/' ${KRUIZE_CRC_DEPLOY_MANIFEST_OPENSHIFT}
+ fi
+}
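The substitution performed by `kruize_local_patch` can be exercised on its own; a minimal sketch (the manifest content below is a stand-in for the real kruize-crc-*.yaml files, not their actual layout):

```shell
# Minimal sketch of the "local" flag toggle applied by kruize_local_patch.
# The file content here is a stand-in for the real deploy manifest.
manifest=$(mktemp)
printf '%s\n' '      "local": "false",' > "${manifest}"

# Same substitution the function applies to the deploy manifest
sed -i 's/"local": "false"/"local": "true"/' "${manifest}"

cat "${manifest}"
rm -f "${manifest}"
```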
diff --git a/tests/scripts/functional_tests.sh b/tests/scripts/functional_tests.sh
index b2b086801..13a814764 100755
--- a/tests/scripts/functional_tests.sh
+++ b/tests/scripts/functional_tests.sh
@@ -32,6 +32,7 @@ SCRIPTS_DIR="${CURRENT_DIR}"
. ${SCRIPTS_DIR}/da/kruize_layer_id_tests.sh
. ${SCRIPTS_DIR}/em/em_standalone_tests.sh
. ${SCRIPTS_DIR}/remote_monitoring_tests/remote_monitoring_tests.sh
+. ${SCRIPTS_DIR}/local_monitoring_tests/local_monitoring_tests.sh
# Iterate through the commandline options
while getopts i:o:r:-: gopts
diff --git a/tests/scripts/remote_monitoring_tests/helpers/__init__.py b/tests/scripts/helpers/__init__.py
similarity index 100%
rename from tests/scripts/remote_monitoring_tests/helpers/__init__.py
rename to tests/scripts/helpers/__init__.py
diff --git a/tests/scripts/remote_monitoring_tests/helpers/all_terms_list_reco_json_schema.py b/tests/scripts/helpers/all_terms_list_reco_json_schema.py
similarity index 87%
rename from tests/scripts/remote_monitoring_tests/helpers/all_terms_list_reco_json_schema.py
rename to tests/scripts/helpers/all_terms_list_reco_json_schema.py
index c55559c60..8688c9db9 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/all_terms_list_reco_json_schema.py
+++ b/tests/scripts/helpers/all_terms_list_reco_json_schema.py
@@ -371,6 +371,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
@@ -638,6 +681,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
@@ -905,6 +991,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/fixtures.py b/tests/scripts/helpers/fixtures.py
similarity index 100%
rename from tests/scripts/remote_monitoring_tests/helpers/fixtures.py
rename to tests/scripts/helpers/fixtures.py
diff --git a/tests/scripts/local_monitoring_tests/helpers/generate_datasource_json.py b/tests/scripts/helpers/generate_datasource_json.py
similarity index 85%
rename from tests/scripts/local_monitoring_tests/helpers/generate_datasource_json.py
rename to tests/scripts/helpers/generate_datasource_json.py
index ec1c773a7..67537e601 100644
--- a/tests/scripts/local_monitoring_tests/helpers/generate_datasource_json.py
+++ b/tests/scripts/helpers/generate_datasource_json.py
@@ -18,7 +18,7 @@ def generate_datasource_json(csv_file, json_file):
with open(json_file, 'w') as jsonfile:
json.dump(datasources, jsonfile, indent=4)
-csv_file_path = '../csv_data/datasources.csv'
-json_file_path = '../json_files/datasources.json'
+csv_file_path = '../local_monitoring_tests/csv_data/datasources.csv'
+json_file_path = '../local_monitoring_tests/json_files/datasources.json'
generate_datasource_json(csv_file_path, json_file_path)
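The conversion that `generate_datasource_json` performs can be sketched end to end; the column names and output shape below are assumptions for illustration, not the exact fixture layout used by the test suite:

```python
import csv
import json
import os
import tempfile

# Sketch of the CSV -> JSON conversion done by generate_datasource_json;
# the column names and output shape are assumptions, not the real fixture.
def generate_datasource_json(csv_file, json_file):
    with open(csv_file, newline='') as csvfile:
        datasources = list(csv.DictReader(csvfile))
    with open(json_file, 'w') as jsonfile:
        json.dump(datasources, jsonfile, indent=4)

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "datasources.csv")
json_path = os.path.join(tmpdir, "datasources.json")
with open(csv_path, 'w') as f:
    f.write("name,provider,serviceName,namespace,url\n")
    f.write("prometheus-1,prometheus,prometheus-k8s,monitoring,http://localhost:9090\n")

generate_datasource_json(csv_path, json_path)
with open(json_path) as f:
    print(json.load(f)[0]["name"])  # prometheus-1
```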
diff --git a/tests/scripts/remote_monitoring_tests/helpers/generate_rm_jsons.py b/tests/scripts/helpers/generate_rm_jsons.py
similarity index 99%
rename from tests/scripts/remote_monitoring_tests/helpers/generate_rm_jsons.py
rename to tests/scripts/helpers/generate_rm_jsons.py
index 833773905..1cc481a99 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/generate_rm_jsons.py
+++ b/tests/scripts/helpers/generate_rm_jsons.py
@@ -31,7 +31,7 @@ def convert_date_format(input_date_str):
output_date_str = input_date.strftime("%Y-%m-%dT%H:%M:%S.000Z")
return output_date_str
-def create_exp_jsons(split = False, split_count = 1, exp_json_dir = "/tmp/exp_jsons", total_exps = 10):
+def create_exp_jsons(split = False, split_count = 1, exp_json_dir = "/tmp/exp_jsons", total_exps = 10, target_cluster="remote"):
complete_json_data = []
single_json_data = []
multi_json_data = []
diff --git a/tests/scripts/helpers/import_metadata_json_schema.py b/tests/scripts/helpers/import_metadata_json_schema.py
new file mode 100644
index 000000000..a81961494
--- /dev/null
+++ b/tests/scripts/helpers/import_metadata_json_schema.py
@@ -0,0 +1,36 @@
+import_metadata_json_schema = {
+ "type": "object",
+ "properties": {
+ "datasources": {
+ "type": "object",
+ "patternProperties": {
+ "^[a-zA-Z0-9_-]+$": {
+ "type": "object",
+ "properties": {
+ "datasource_name": {
+ "type": "string",
+ "pattern": "^[a-zA-Z0-9_-]+$"
+ },
+ "clusters": {
+ "type": "object",
+ "patternProperties": {
+ "^[a-zA-Z0-9_-]+$": {
+ "type": "object",
+ "properties": {
+ "cluster_name": {
+ "type": "string",
+ "pattern": "^[a-zA-Z0-9_-]+$"
+ }
+ },
+ "required": ["cluster_name"]
+ }
+ }
+ }
+ },
+ "required": ["datasource_name", "clusters"]
+ }
+ }
+ }
+ },
+ "required": ["datasources"]
+}
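A payload that satisfies this schema looks like the following; the datasource and cluster names are illustrative values, not fixtures from the test suite:

```python
import json

# Illustrative /dsmetadata payload matching import_metadata_json_schema;
# "prometheus-1" and "default" are made-up names for the sketch.
sample_metadata = {
    "datasources": {
        "prometheus-1": {
            "datasource_name": "prometheus-1",
            "clusters": {
                "default": {
                    "cluster_name": "default"
                }
            }
        }
    }
}

# Spot-check the fields the schema marks as required
for ds in sample_metadata["datasources"].values():
    assert "datasource_name" in ds and "clusters" in ds
    for cluster in ds["clusters"].values():
        assert "cluster_name" in cluster

print(json.dumps(sample_metadata, indent=4))
```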
diff --git a/tests/scripts/helpers/import_metadata_json_validate.py b/tests/scripts/helpers/import_metadata_json_validate.py
new file mode 100644
index 000000000..3772228ff
--- /dev/null
+++ b/tests/scripts/helpers/import_metadata_json_validate.py
@@ -0,0 +1,68 @@
+"""
+Copyright (c) 2023, 2023 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import json
+import jsonschema
+from jsonschema import FormatChecker
+from jsonschema.exceptions import ValidationError
+from helpers.import_metadata_json_schema import import_metadata_json_schema
+
+JSON_NULL_VALUES = ("is not of type 'string'", "is not of type 'integer'", "is not of type 'number'")
+VALUE_MISSING = " cannot be empty or null!"
+
+def validate_import_metadata_json(import_metadata_json, json_schema):
+ errorMsg = ""
+ try:
+ # create a validator with the format checker
+ print("Validating json against the json schema...")
+ validator = jsonschema.Draft7Validator(json_schema, format_checker=FormatChecker())
+
+ # validate the JSON data against the schema
+ errors = ""
+ errors = list(validator.iter_errors(import_metadata_json))
+ print("Validating json against the json schema...done")
+ errorMsg = validate_import_metadata_json_values(import_metadata_json)
+
+ if errors:
+ custom_err = ValidationError(errorMsg)
+ errors.append(custom_err)
+ return errors
+ else:
+ return errorMsg
+ except ValidationError as err:
+ print("Received a ValidationError")
+
+ # Check if the exception is due to empty or null required parameters and prepare the response accordingly
+ if any(word in err.message for word in JSON_NULL_VALUES):
+ errorMsg = "Parameters" + VALUE_MISSING
+ return errorMsg
+ # Modify the error response in case of additional properties error
+ elif str(err.message).__contains__('('):
+ errorMsg = str(err.message).split('(')
+ return errorMsg[0]
+ else:
+ return err.message
+
+def validate_import_metadata_json_values(metadata):
+ validationErrorMsg = ""
+
+ for key in metadata.keys():
+
+ # Check if the value for this key is empty or null
+ if not (str(metadata[key]) and str(metadata[key]).strip()):
+ validationErrorMsg = ",".join([validationErrorMsg, "Parameters" + VALUE_MISSING])
+
+ return validationErrorMsg.lstrip(',')
+
diff --git a/tests/scripts/remote_monitoring_tests/helpers/kruize.py b/tests/scripts/helpers/kruize.py
similarity index 73%
rename from tests/scripts/remote_monitoring_tests/helpers/kruize.py
rename to tests/scripts/helpers/kruize.py
index 74fc89a3b..029e6eaba 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/kruize.py
+++ b/tests/scripts/helpers/kruize.py
@@ -239,3 +239,95 @@ def list_experiments(results=None, recommendations=None, latest=None, experiment
response = requests.get(url)
print("Response status code = ", response.status_code)
return response
+
+
+# Description: This function obtains the list of datasources from Kruize Autotune using the datasources API
+# Input Parameters: datasource name (optional)
+def list_datasources(name=None):
+ print("\nListing the datasources...")
+ query_params = {}
+
+ if name is not None:
+ query_params['name'] = name
+
+ query_string = "&".join(f"{key}={value}" for key, value in query_params.items())
+
+ url = URL + "/datasources"
+ if query_string:
+ url += "?" + query_string
+ print("URL = ", url)
+ response = requests.get(url)
+
+ print("PARAMS = ", query_params)
+ print("Response status code = ", response.status_code)
+ print("\n************************************************************")
+ print(response.text)
+ print("\n************************************************************")
+ return response
+
+
+# Description: This function validates the input json and imports metadata using the POST /dsmetadata API of Kruize Autotune
+# Input Parameters: datasource input json
+def import_metadata(input_json_file, invalid_header=False):
+ json_file = open(input_json_file, "r")
+ input_json = json.loads(json_file.read())
+ print("\n************************************************************")
+ pretty_json_str = json.dumps(input_json, indent=4)
+ print(pretty_json_str)
+ print("\n************************************************************")
+
+ # read the json
+ print("\nImporting the metadata...")
+
+ url = URL + "/dsmetadata"
+ print("URL = ", url)
+
+ headers = {'content-type': 'application/xml'}
+ if invalid_header:
+ print("Invalid header")
+ response = requests.post(url, json=input_json, headers=headers)
+ else:
+ response = requests.post(url, json=input_json)
+
+ print("Response status code = ", response.status_code)
+ try:
+ # Parse the response content as JSON into a Python dictionary
+ response_json = response.json()
+
+ # Check if the response_json is a valid JSON object or array
+ if isinstance(response_json, (dict, list)):
+ # Convert the response_json back to a JSON-formatted string with double quotes and pretty print it
+ pretty_response_json_str = json.dumps(response_json, indent=4)
+
+ # Print the JSON string
+ print(pretty_response_json_str)
+ else:
+ print("Invalid JSON format in the response.")
+ print(response.text) # Print the response text as-is
+ except json.JSONDecodeError:
+ print("Response content is not valid JSON.")
+ print(response.text) # Print the response text as-is
+ return response
+
+
+# Description: This function deletes the metadata using the DELETE /dsmetadata API of Kruize Autotune
+# Input Parameters: datasource input json
+def delete_metadata(input_json_file, invalid_header=False):
+ json_file = open(input_json_file, "r")
+ input_json = json.loads(json_file.read())
+
+ print("\nDeleting the metadata...")
+
+ url = URL + "/dsmetadata"
+ print("URL = ", url)
+
+ headers = {'content-type': 'application/xml'}
+ if invalid_header:
+ print("Invalid header")
+ response = requests.delete(url, json=input_json, headers=headers)
+ else:
+ response = requests.delete(url, json=input_json)
+
+ print(response)
+ print("Response status code = ", response.status_code)
+ return response
\ No newline at end of file
diff --git a/tests/scripts/helpers/list_datasources_json_schema.py b/tests/scripts/helpers/list_datasources_json_schema.py
new file mode 100644
index 000000000..3b14c4069
--- /dev/null
+++ b/tests/scripts/helpers/list_datasources_json_schema.py
@@ -0,0 +1,34 @@
+list_datasources_json_schema = {
+ "type": "object",
+ "properties": {
+ "version": {
+ "type": "string"
+ },
+ "datasources": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "properties": {
+ "name": {
+ "type": "string"
+ },
+ "provider": {
+ "type": "string"
+ },
+ "serviceName": {
+ "type": "string"
+ },
+ "namespace": {
+ "type": "string"
+ },
+ "url": {
+ "type": "string",
+ "format": "uri"
+ }
+ },
+ "required": ["name", "provider", "serviceName", "namespace", "url"]
+ }
+ }
+ },
+ "required": ["version", "datasources"]
+}
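A response that satisfies this schema looks like the following; the version string, service name, namespace and URL are illustrative values, not actual Kruize output:

```python
import json

# Illustrative /datasources response matching list_datasources_json_schema;
# all field values below are made up for the sketch.
sample_response = {
    "version": "v0.1",
    "datasources": [
        {
            "name": "prometheus-1",
            "provider": "prometheus",
            "serviceName": "prometheus-k8s",
            "namespace": "monitoring",
            "url": "http://prometheus-k8s.monitoring.svc.cluster.local:9090"
        }
    ]
}

# Spot-check the fields the schema marks as required
required = ["name", "provider", "serviceName", "namespace", "url"]
for ds in sample_response["datasources"]:
    missing = [k for k in required if k not in ds]
    assert not missing, f"missing keys: {missing}"

print(json.dumps(sample_response, indent=4))
```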
diff --git a/tests/scripts/helpers/list_datasources_json_validate.py b/tests/scripts/helpers/list_datasources_json_validate.py
new file mode 100644
index 000000000..d5e538625
--- /dev/null
+++ b/tests/scripts/helpers/list_datasources_json_validate.py
@@ -0,0 +1,81 @@
+"""
+Copyright (c) 2023, 2023 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import json
+import jsonschema
+from jsonschema import FormatChecker
+from jsonschema.exceptions import ValidationError
+from helpers.list_datasources_json_schema import list_datasources_json_schema
+
+# TODO - currently only the prometheus datasource provider is supported
+DATASOURCE_TYPE_SUPPORTED = "prometheus"
+
+JSON_NULL_VALUES = ("is not of type 'string'", "is not of type 'integer'", "is not of type 'number'")
+VALUE_MISSING = " cannot be empty or null!"
+
+def validate_list_datasources_json(list_datasources_json, json_schema):
+ errorMsg = ""
+ try:
+ # create a validator with the format checker
+ print("Validating json against the json schema...")
+ validator = jsonschema.Draft7Validator(json_schema, format_checker=FormatChecker())
+
+ # validate the JSON data against the schema
+ errors = ""
+ errors = list(validator.iter_errors(list_datasources_json))
+ print("Validating json against the json schema...done")
+ errorMsg = validate_list_datasources_json_values(list_datasources_json)
+
+ if errors:
+ custom_err = ValidationError(errorMsg)
+ errors.append(custom_err)
+ return errors
+ else:
+ return errorMsg
+ except ValidationError as err:
+ print("Received a ValidationError")
+
+ # Check if the exception is due to empty or null required parameters and prepare the response accordingly
+ if any(word in err.message for word in JSON_NULL_VALUES):
+ errorMsg = "Parameters" + VALUE_MISSING
+ return errorMsg
+ # Modify the error response in case of additional properties error
+ elif str(err.message).__contains__('('):
+ errorMsg = str(err.message).split('(')
+ return errorMsg[0]
+ else:
+ return err.message
+
+def validate_list_datasources_json_values(list_datasources_json):
+ validationErrorMsg = ""
+ obj_arr = ["datasources"]
+
+ for key in list_datasources_json.keys():
+
+ # Check if the value for this key is empty or null
+ if not (str(list_datasources_json[key]) and str(list_datasources_json[key]).strip()):
+ validationErrorMsg = ",".join([validationErrorMsg, "Parameters" + VALUE_MISSING])
+
+ for obj in obj_arr:
+ if obj == key:
+ for subkey in list_datasources_json[key][0].keys():
+ # Check if the value for this subkey is empty or null
+ if not (str(list_datasources_json[key][0][subkey]) and str(list_datasources_json[key][0][subkey]).strip()):
+ print(f"FAILED - {str(list_datasources_json[key][0][subkey])} is empty or null")
+ validationErrorMsg = ",".join([validationErrorMsg, "Parameters" + VALUE_MISSING])
+ elif str(subkey) == "provider" and str(list_datasources_json[key][0][subkey]) != DATASOURCE_TYPE_SUPPORTED:
+ validationErrorMsg = ",".join([validationErrorMsg, DATASOURCE_TYPE_SUPPORTED])
+
+ return validationErrorMsg.lstrip(',')
diff --git a/tests/scripts/remote_monitoring_tests/helpers/list_reco_json_schema.py b/tests/scripts/helpers/list_reco_json_schema.py
similarity index 89%
rename from tests/scripts/remote_monitoring_tests/helpers/list_reco_json_schema.py
rename to tests/scripts/helpers/list_reco_json_schema.py
index cc535da27..24d7ff61c 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/list_reco_json_schema.py
+++ b/tests/scripts/helpers/list_reco_json_schema.py
@@ -371,6 +371,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/list_reco_json_validate.py b/tests/scripts/helpers/list_reco_json_validate.py
similarity index 100%
rename from tests/scripts/remote_monitoring_tests/helpers/list_reco_json_validate.py
rename to tests/scripts/helpers/list_reco_json_validate.py
diff --git a/tests/scripts/remote_monitoring_tests/helpers/long_term_list_reco_json_schema.py b/tests/scripts/helpers/long_term_list_reco_json_schema.py
similarity index 89%
rename from tests/scripts/remote_monitoring_tests/helpers/long_term_list_reco_json_schema.py
rename to tests/scripts/helpers/long_term_list_reco_json_schema.py
index 6f6ce48bf..abe4d2a9a 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/long_term_list_reco_json_schema.py
+++ b/tests/scripts/helpers/long_term_list_reco_json_schema.py
@@ -409,6 +409,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/medium_and_long_term_list_reco_json_schema.py b/tests/scripts/helpers/medium_and_long_term_list_reco_json_schema.py
similarity index 88%
rename from tests/scripts/remote_monitoring_tests/helpers/medium_and_long_term_list_reco_json_schema.py
rename to tests/scripts/helpers/medium_and_long_term_list_reco_json_schema.py
index 3aeee3123..9f6ae28df 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/medium_and_long_term_list_reco_json_schema.py
+++ b/tests/scripts/helpers/medium_and_long_term_list_reco_json_schema.py
@@ -390,6 +390,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
@@ -657,6 +700,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/medium_term_list_reco_json_schema.py b/tests/scripts/helpers/medium_term_list_reco_json_schema.py
similarity index 89%
rename from tests/scripts/remote_monitoring_tests/helpers/medium_term_list_reco_json_schema.py
rename to tests/scripts/helpers/medium_term_list_reco_json_schema.py
index 0552d8d94..e04cb4977 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/medium_term_list_reco_json_schema.py
+++ b/tests/scripts/helpers/medium_term_list_reco_json_schema.py
@@ -390,6 +390,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/short_and_long_term_list_reco_json_schema.py b/tests/scripts/helpers/short_and_long_term_list_reco_json_schema.py
similarity index 88%
rename from tests/scripts/remote_monitoring_tests/helpers/short_and_long_term_list_reco_json_schema.py
rename to tests/scripts/helpers/short_and_long_term_list_reco_json_schema.py
index 1fa01cd36..264e24e48 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/short_and_long_term_list_reco_json_schema.py
+++ b/tests/scripts/helpers/short_and_long_term_list_reco_json_schema.py
@@ -371,6 +371,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
@@ -657,6 +700,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/short_and_medium_term_list_reco_json_schema.py b/tests/scripts/helpers/short_and_medium_term_list_reco_json_schema.py
similarity index 88%
rename from tests/scripts/remote_monitoring_tests/helpers/short_and_medium_term_list_reco_json_schema.py
rename to tests/scripts/helpers/short_and_medium_term_list_reco_json_schema.py
index d7a7b37aa..d85f8e06d 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/short_and_medium_term_list_reco_json_schema.py
+++ b/tests/scripts/helpers/short_and_medium_term_list_reco_json_schema.py
@@ -371,6 +371,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
@@ -638,6 +681,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/short_term_list_reco_json_schema.py b/tests/scripts/helpers/short_term_list_reco_json_schema.py
similarity index 89%
rename from tests/scripts/remote_monitoring_tests/helpers/short_term_list_reco_json_schema.py
rename to tests/scripts/helpers/short_term_list_reco_json_schema.py
index be87cc882..7867ff064 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/short_term_list_reco_json_schema.py
+++ b/tests/scripts/helpers/short_term_list_reco_json_schema.py
@@ -371,6 +371,49 @@
}
},
"required": []
+ },
+ "plots": {
+ "type": "object",
+ "properties": {
+ "datapoints": { "type": "number" },
+ "plots_data": {
+ "type": "object",
+ "patternProperties": {
+ "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z$": {
+ "type": "object",
+ "properties": {
+ "cpuUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ "memoryUsage": {
+ "type": "object",
+ "properties": {
+ "min": { "type": "number" },
+ "q1": { "type": "number" },
+ "median": { "type": "number" },
+ "q3": { "type": "number" },
+ "max": { "type": "number" },
+ "format": { "type": "string" }
+ },
+ "required": ["min", "q1", "median", "q3", "max", "format"]
+ },
+ },
+ "required": []
+ }
+ },
+ "required": []
+ }
+ },
+ "required": ["datapoints", "plots_data"]
}
},
"required": []
diff --git a/tests/scripts/remote_monitoring_tests/helpers/utils.py b/tests/scripts/helpers/utils.py
similarity index 94%
rename from tests/scripts/remote_monitoring_tests/helpers/utils.py
rename to tests/scripts/helpers/utils.py
index 4c00da192..53b0899c7 100644
--- a/tests/scripts/remote_monitoring_tests/helpers/utils.py
+++ b/tests/scripts/helpers/utils.py
@@ -1,5 +1,5 @@
"""
-Copyright (c) 2022, 2022 Red Hat, IBM Corporation and others.
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -49,6 +49,7 @@
COST_RECOMMENDATIONS_AVAILABLE = "Cost Recommendations Available"
PERFORMANCE_RECOMMENDATIONS_AVAILABLE = "Performance Recommendations Available"
CONTAINER_AND_EXPERIMENT_NAME = " for container : %s for experiment: %s.]"
+LIST_DATASOURCES_ERROR_MSG = "Given datasource name - \" %s \" either does not exist or is not valid"
# Kruize Recommendations Notification codes
NOTIFICATION_CODE_FOR_RECOMMENDATIONS_AVAILABLE = "111000"
@@ -137,6 +138,10 @@
MEDIUM_TERM_TEST = "medium_term_test"
LONG_TERM_TEST = "long_term_test"
+PLOTS = "plots"
+DATA_POINTS = "datapoints"
+PLOTS_DATA = "plots_data"
+
TERMS_NOTIFICATION_CODES = {
SHORT_TERM: NOTIFICATION_CODE_FOR_SHORT_TERM_RECOMMENDATIONS_AVAILABLE,
MEDIUM_TERM: NOTIFICATION_CODE_FOR_MEDIUM_TERM_RECOMMENDATIONS_AVAILABLE,
@@ -213,6 +218,12 @@
"memoryRSS_format": "MiB"
}
+# version, datasource_name
+import_metadata_test_data = {
+ "version": "v1.0",
+ "datasource_name": "prometheus-1",
+}
+
test_type = {"blank": "", "null": "null", "invalid": "xyz"}
aggr_info_keys_to_skip = ["cpuRequest_sum", "cpuRequest_avg", "cpuLimit_sum", "cpuLimit_avg", "cpuUsage_sum", "cpuUsage_max",
@@ -410,13 +421,13 @@ def validate_reco_json(create_exp_json, update_results_json, list_reco_json, exp
update_results_kubernetes_obj = update_results_json[0]["kubernetes_objects"][i]
create_exp_kubernetes_obj = create_exp_json["kubernetes_objects"][i]
list_reco_kubernetes_obj = list_reco_json["kubernetes_objects"][i]
- validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json, \
+ validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json,
list_reco_kubernetes_obj, expected_duration_in_hours, test_name)
else:
update_results_kubernetes_obj = None
create_exp_kubernetes_obj = create_exp_json["kubernetes_objects"][0]
list_reco_kubernetes_obj = list_reco_json["kubernetes_objects"][0]
- validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json, \
+ validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes_obj, update_results_json,
list_reco_kubernetes_obj, expected_duration_in_hours, test_name)
@@ -480,7 +491,8 @@ def validate_kubernetes_obj(create_exp_kubernetes_obj, update_results_kubernetes
expected_duration_in_hours, test_name)
-def validate_container(update_results_container, update_results_json, list_reco_container, expected_duration_in_hours, test_name):
+def validate_container(update_results_container, update_results_json, list_reco_container, expected_duration_in_hours,
+ test_name):
# Validate container image name and container name
if update_results_container != None and list_reco_container != None:
assert list_reco_container["container_image_name"] == update_results_container["container_image_name"], \
@@ -514,8 +526,8 @@ def validate_container(update_results_container, update_results_json, list_reco_
terms_obj = list_reco_container["recommendations"]["data"][interval_end_time]["recommendation_terms"]
current_config = list_reco_container["recommendations"]["data"][interval_end_time]["current"]
- duration_terms = ["short_term", "medium_term", "long_term"]
- for term in duration_terms:
+ duration_terms = {'short_term': 4, 'medium_term': 7, 'long_term': 15}
+ for term in duration_terms.keys():
if check_if_recommendations_are_present(terms_obj[term]):
print(f"reco present for term {term}")
# Validate timestamps [deprecated as monitoring end time is moved to higher level]
@@ -557,13 +569,17 @@ def validate_container(update_results_container, update_results_json, list_reco_
recommendation_engines_object = None
if "recommendation_engines" in terms_obj[term]:
recommendation_engines_object = terms_obj[term]["recommendation_engines"]
- if None != recommendation_engines_object:
+ if recommendation_engines_object is not None:
for engine_entry in engines_list:
if engine_entry in terms_obj[term]["recommendation_engines"]:
engine_obj = terms_obj[term]["recommendation_engines"][engine_entry]
validate_config(engine_obj["config"], metrics)
validate_variation(current_config, engine_obj["config"], engine_obj["variation"])
-
+ # validate Plots data
+ validate_plots(terms_obj, duration_terms, term)
+ # verify that plots isn't generated in case of no recommendations
+ else:
+ assert PLOTS not in terms_obj[term], f"Expected plots to be absent in case of no recommendations"
else:
data = list_reco_container["recommendations"]["data"]
assert len(data) == 0, f"Data is not empty! Length of data - Actual = {len(data)} expected = 0"
@@ -574,6 +590,21 @@ def validate_container(update_results_container, update_results_json, list_reco_
assert result == False, f"Recommendations notifications does not contain the expected message - {NOT_ENOUGH_DATA_MSG}"
+def validate_plots(terms_obj, duration_terms, term):
+ plots = terms_obj[term][PLOTS]
+ assert plots is not None, "Expected plots to be available"
+
+ datapoint = plots[DATA_POINTS]
+ plots_data = plots[PLOTS_DATA]
+ assert datapoint is not None, "Expected datapoint to be available"
+ # validate the count of data points for the specific term
+ assert datapoint == duration_terms[term], f"datapoint Expected: {duration_terms[term]}, Obtained: {datapoint}"
+ assert len(plots_data) == duration_terms[term], f"plots_data size Expected: {duration_terms[term]}, Obtained: {len(plots_data)}"
+ # TODO: validate the datapoint JSON objects
+ # TODO: validate the actual JSONs present, how many are empty for each term, this should be passed as an input
+ # TODO: validate the format value against the results metrics
+
+
def set_duration_based_on_terms(duration_in_hours, term, interval_start_time, interval_end_time):
diff = time_diff_in_hours(interval_start_time, interval_end_time)
duration_in_hours += diff
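`validate_plots` above only checks the datapoint counts; validating the five-number summaries is left as a TODO. A sketch of how those summaries could be cross-checked against raw usage samples (`boxplot_stats` is an illustrative helper, not part of the test suite):

```python
import statistics

def boxplot_stats(samples, fmt):
    """Five-number summary matching the keys required under plots_data."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {"min": min(samples), "q1": q1, "median": median,
            "q3": q3, "max": max(samples), "format": fmt}

cpu = boxplot_stats([0.12, 0.20, 0.31, 0.40, 0.55], "cores")
# the summary must be monotonically ordered, as a validator would assert
assert cpu["min"] <= cpu["q1"] <= cpu["median"] <= cpu["q3"] <= cpu["max"]
```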
diff --git a/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md b/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md
new file mode 100644
index 000000000..7d23513bf
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md
@@ -0,0 +1,95 @@
+# **Kruize Local monitoring tests**
+
+Kruize Local monitoring tests validate the behaviour of [Kruize local monitoring APIs](/design/KruizeLocalAPI.md)
+using various positive and negative scenarios. These tests are developed using the pytest framework.
+
+## Tests description
+### **List Datasources API tests**
+
+Here are the test scenarios:
+- List all datasources
+- List datasources with name query parameter:
+ - /datasources?name=
+- List datasources with an invalid datasource name, tested with empty, NULL and invalid values.
+
+### **Import Metadata API tests**
+
+Here are the test scenarios:
+
+- Importing metadata for a valid datasource to the API.
+- Post the same datasource again
+- Test with blank, null or otherwise invalid values for the various keys in the dsmetadata input request JSON
+- Validate error messages when the mandatory fields are missing
+
+The above tests are developed using the pytest framework and are run using a shell script wrapper that does the following:
+- Deploys kruize in non-CRD mode using the [deploy script](https://github.com/kruize/autotune/blob/master/deploy.sh) from the autotune repo
+- Creates a resource optimization performance profile using the [createPerformanceProfile API](/design/PerformanceProfileAPI.md)
+- Runs the above tests using pytest
+
+## Prerequisites for running the tests:
+- Minikube setup or access to an OpenShift cluster
+- Tools like kubectl, oc, curl, jq, python
+- Python modules: pytest, json, pytest-html, requests, jinja2
+ (these modules are installed automatically when the tests are run)
+
+## How to run the tests?
+
+Use the below command to run the tests:
+
+```
+/tests/test_autotune.sh -c minikube -r [location of benchmarks] [-i kruize image] [--tctype=functional] [--testmodule=Autotune module to be tested] [--testsuite=Group of tests that you want to perform] [--testcase=Particular test case that you want to test] [-n namespace] [--resultsdir=results directory] [--skipsetup]
+```
+
+Where values for test_autotune.sh are:
+
+```
+usage: test_autotune.sh [ -c ] : cluster type. Supported type - minikube, openshift. Default - minikube
+ [ -i ] : optional. Kruize docker image to be used for testing, default - kruize/autotune_operator:test
+ [ -r ] : Location of benchmarks. Not required for local_monitoring_tests
+ [ --tctype ] : optional. Testcases type to run, default is functional (runs all functional tests)
+ [ --testmodule ]: Module to be tested. Use testmodule=help, to list the modules to be tested
+ [ --testsuite ] : Testsuite to run. Use testsuite=help, to list the supported testsuites
+ [ --testcase ] : Testcase to run. Use testcase=help along with the testsuite name to list the supported testcases in that testsuite
+ [ -n ] : optional. Namespace to deploy autotune
+ [ --resultsdir ] : optional. Results directory location, by default it creates the results directory in current working directory
+ [ --skipsetup ] : optional. Specifying this option skips the Kruize setup and performance profile creation in case of local_monitoring_tests
+
+Note: If you want to run a particular testcase then it is mandatory to specify the testsuite
+Test cases supported are sanity, negative, extended and test_e2e
+
+```
+
+To run all the local monitoring tests,
+
+```
+/tests/test_autotune.sh -c minikube --testsuite=local_monitoring_tests --resultsdir=/home/results
+```
+
+To run only the sanity local monitoring tests,
+
+```
+/tests/test_autotune.sh -c minikube --testsuite=local_monitoring_tests --testcase=sanity --resultsdir=/home/results
+```
+
+Local monitoring tests can also be run without using the test_autotune.sh. To do this, follow the below steps:
+
+- Deploy Kruize using the deploy.sh from the kruize autotune repo
+- Create the performance profile by using the [createPerformanceProfile API](/design/PerformanceProfileAPI.md)
+- cd /tests/scripts/local_monitoring_tests
+- python3 -m pip install --user -r requirements.txt
+- cd rest_apis
+- To run all sanity tests
+```
+ pytest -m sanity --html=/report.html --cluster_type=<cluster_type>
+```
+- To run only sanity tests for the List datasources API
+```
+ pytest -m sanity --html=/report.html test_list_datasources.py --cluster_type=<cluster_type>
+```
+- To run only a specific test within List datasources API
+```
+ pytest -s test_list_datasources.py::test_list_datasources_with_name --cluster_type=<cluster_type>
+```
+
+Note: You can check the report.html for the results as it provides better readability
+
diff --git a/tests/scripts/local_monitoring_tests/conftest.py b/tests/scripts/local_monitoring_tests/conftest.py
new file mode 100644
index 000000000..b03f52085
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/conftest.py
@@ -0,0 +1,5 @@
+def pytest_addoption(parser):
+ parser.addoption(
+ '--cluster_type', action='store', default='minikube', help='Cluster type'
+ )
+
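The `--cluster_type` option registered above is read by the tests via `request.config.getoption("--cluster_type")`. A dependency-free sketch of the same default-and-validate behaviour (`resolve_cluster_type` is an illustrative name, not part of the suite):

```python
# Cluster types the test scripts support, per the test README.
SUPPORTED_CLUSTER_TYPES = ("minikube", "openshift")

def resolve_cluster_type(value, default="minikube"):
    """Mirror the conftest default and reject unsupported cluster types."""
    cluster_type = value or default
    if cluster_type not in SUPPORTED_CLUSTER_TYPES:
        raise ValueError(f"Unsupported cluster type: {cluster_type}")
    return cluster_type

assert resolve_cluster_type(None) == "minikube"
assert resolve_cluster_type("openshift") == "openshift"
```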
diff --git a/tests/scripts/local_monitoring_tests/json_files/import_metadata.json b/tests/scripts/local_monitoring_tests/json_files/import_metadata.json
new file mode 100644
index 000000000..8a6ff0d0d
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/json_files/import_metadata.json
@@ -0,0 +1,5 @@
+{
+ "version": "v1.0",
+ "datasource_name": "prometheus-1"
+}
+
diff --git a/tests/scripts/local_monitoring_tests/json_files/import_metadata_mandatory.json b/tests/scripts/local_monitoring_tests/json_files/import_metadata_mandatory.json
new file mode 100644
index 000000000..b1b5d9ef5
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/json_files/import_metadata_mandatory.json
@@ -0,0 +1,4 @@
+{
+ "version": "v1.0",
+ "datasource_name": "prometheus-1"
+}
diff --git a/tests/scripts/local_monitoring_tests/json_files/import_metadata_template.json b/tests/scripts/local_monitoring_tests/json_files/import_metadata_template.json
new file mode 100644
index 000000000..f36ad6cf1
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/json_files/import_metadata_template.json
@@ -0,0 +1,4 @@
+{
+ "version": "{{version}}",
+ "datasource_name": "{{datasource_name}}"
+}
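The `{{version}}` / `{{datasource_name}}` placeholders in this template are rendered by the tests (jinja2 is listed in the prerequisites); the substitution itself amounts to this dependency-free sketch:

```python
import json
import re

# Same shape as import_metadata_template.json above.
TEMPLATE = '{"version": "{{version}}", "datasource_name": "{{datasource_name}}"}'

def render_metadata(template: str, values: dict) -> dict:
    """Substitute {{key}} placeholders and parse the resulting JSON."""
    rendered = re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), template)
    return json.loads(rendered)

payload = render_metadata(TEMPLATE, {"version": "v1.0", "datasource_name": "prometheus-1"})
assert payload == {"version": "v1.0", "datasource_name": "prometheus-1"}
```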
diff --git a/tests/scripts/local_monitoring_tests/json_files/resource_optimization_openshift.json b/tests/scripts/local_monitoring_tests/json_files/resource_optimization_openshift.json
new file mode 100644
index 000000000..6949385be
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/json_files/resource_optimization_openshift.json
@@ -0,0 +1,194 @@
+{
+ "name": "resource-optimization-openshift",
+ "profile_version": 1,
+ "k8s_type": "openshift",
+ "slo": {
+ "slo_class": "resource_usage",
+ "direction": "minimize",
+ "objective_function": {
+ "function_type": "source"
+ },
+ "function_variables": [
+ {
+ "name": "cpuRequest",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"cpu\", unit=\"core\"})"
+ },
+ {
+ "function": "sum",
+ "query": "sum(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"cpu\", unit=\"core\"})"
+ }
+ ]
+ },
+ {
+ "name": "cpuLimit",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"cpu\", unit=\"core\"})"
+ },
+ {
+ "function": "sum",
+ "query": "sum(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"cpu\", unit=\"core\"})"
+ }
+ ]
+ },
+ {
+ "name": "cpuUsage",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": "<=4.8"
+ },
+ {
+ "function": "avg",
+ "query": "avg(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": ">4.9"
+ },
+ {
+ "function": "min",
+ "query": "min(min_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": "<=4.8"
+ },
+ {
+ "function": "min",
+ "query": "min(min_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": ">4.9"
+ },
+ {
+ "function": "max",
+ "query": "max(max_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": "<=4.8"
+ },
+ {
+ "function": "max",
+ "query": "max(max_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": ">4.9"
+ },
+ {
+ "function": "sum",
+ "query": "sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": "<=4.8"
+ },
+ {
+ "function": "sum",
+ "query": "sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))",
+ "versions": ">4.9"
+ }
+ ]
+ },
+ {
+ "name": "cpuThrottle",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(rate(container_cpu_cfs_throttled_seconds_total{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "max",
+ "query": "max(rate(container_cpu_cfs_throttled_seconds_total{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "sum",
+ "query": "sum(rate(container_cpu_cfs_throttled_seconds_total{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ }
+ ]
+ },
+ {
+ "name": "memoryRequest",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"memory\", unit=\"byte\"})"
+ },
+ {
+ "function": "sum",
+ "query": "sum(kube_pod_container_resource_requests{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"memory\", unit=\"byte\"})"
+ }
+ ]
+ },
+ {
+ "name": "memoryLimit",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"memory\", unit=\"byte\"})"
+ },
+ {
+ "function": "sum",
+ "query": "sum(kube_pod_container_resource_limits{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", container=\"$CONTAINER_NAME$\", namespace=\"$NAMESPACE$\", resource=\"memory\", unit=\"byte\"})"
+ }
+ ]
+ },
+ {
+ "name": "memoryUsage",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(avg_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "min",
+ "query": "min(min_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "max",
+ "query": "max(max_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "sum",
+ "query": "sum(avg_over_time(container_memory_working_set_bytes{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ }
+ ]
+ },
+ {
+ "name": "memoryRSS",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg(avg_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "min",
+ "query": "min(min_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "max",
+ "query": "max(max_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ },
+ {
+ "function": "sum",
+ "query": "sum(avg_over_time(container_memory_rss{pod=~\"$DEPLOYMENT_NAME$-[^-]*-[^-]*$\", namespace=\"$NAMESPACE$\", container=\"$CONTAINER_NAME$\"}[15m]))"
+ }
+ ]
+ }
+ ]
+ }
+}
diff --git a/tests/scripts/local_monitoring_tests/local_monitoring_tests.sh b/tests/scripts/local_monitoring_tests/local_monitoring_tests.sh
new file mode 100644
index 000000000..a76a2dd3d
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/local_monitoring_tests.sh
@@ -0,0 +1,159 @@
+#!/bin/bash
+#
+# Copyright (c) 2024, 2024 Red Hat, IBM Corporation and others.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#
+##### Script to perform basic tests for local monitoring #####
+
+
+# Get the absolute path of current directory
+CURRENT_DIR="$(dirname "$(realpath "$0")")"
+LOCAL_MONITORING_TEST_DIR="${CURRENT_DIR}/local_monitoring_tests"
+
+# Source the common functions scripts
+. ${LOCAL_MONITORING_TEST_DIR}/../common/common_functions.sh
+
+# Tests to validate Local monitoring mode in Kruize
+function local_monitoring_tests() {
+ start_time=$(get_date)
+ FAILED_CASES=()
+ TESTS_FAILED=0
+ TESTS_PASSED=0
+ TESTS=0
+ failed=0
+ marker_options=""
+ ((TOTAL_TEST_SUITES++))
+
+ python3 --version >/dev/null 2>/dev/null
+ err_exit "ERROR: python3 not installed"
+
+ target="crc"
+ perf_profile_json="${LOCAL_MONITORING_TEST_DIR}/json_files/resource_optimization_openshift.json"
+
+ local_monitoring_tests=("sanity" "extended" "negative")
+
+ # check if the test case is supported
+ if [ ! -z "${testcase}" ]; then
+ check_test_case "local_monitoring"
+ fi
+
+ # create the result directory for given testsuite
+ echo ""
+ TEST_SUITE_DIR="${RESULTS}/local_monitoring_tests"
+ KRUIZE_SETUP_LOG="${TEST_SUITE_DIR}/kruize_setup.log"
+ KRUIZE_POD_LOG="${TEST_SUITE_DIR}/kruize_pod.log"
+
+ mkdir -p ${TEST_SUITE_DIR}
+
+ # check for 'local' flag
+ kruize_local_patch
+
+ # Setup kruize
+ if [ ${skip_setup} -eq 0 ]; then
+ echo "Setting up kruize..." | tee -a ${LOG}
+ echo "${KRUIZE_SETUP_LOG}"
+ setup "${KRUIZE_POD_LOG}" >> ${KRUIZE_SETUP_LOG} 2>&1
+ echo "Setting up kruize...Done" | tee -a ${LOG}
+
+ sleep 60
+
+ # create performance profile
+ create_performance_profile ${perf_profile_json}
+ else
+ echo "Skipping kruize setup..." | tee -a ${LOG}
+ fi
+
+ # If testcase is not specified run all tests
+ if [ -z "${testcase}" ]; then
+ testtorun=("${local_monitoring_tests[@]}")
+ else
+ testtorun=("${testcase}")
+ fi
+
+ # create the result directory for given testsuite
+ echo ""
+ mkdir -p ${TEST_SUITE_DIR}
+
+ PIP_INSTALL_LOG="${TEST_SUITE_DIR}/pip_install.log"
+
+ echo ""
+ echo "Installing the required python modules..."
+ echo "python3 -m pip install -r "${LOCAL_MONITORING_TEST_DIR}/requirements.txt" > ${PIP_INSTALL_LOG}"
+ # Note: the --user flag is not used as it fails inside a virtualenv with: "Can not perform a '--user' install. User site-packages are not visible in this virtualenv."
+ python3 -m pip install -r "${LOCAL_MONITORING_TEST_DIR}/requirements.txt" > ${PIP_INSTALL_LOG} 2>&1
+ err_exit "ERROR: Installing python modules for the test run failed!"
+
+ echo ""
+ echo "******************* Executing test suite ${FUNCNAME} ****************"
+ echo ""
+
+ for test in "${testtorun[@]}"
+ do
+ TEST_DIR="${TEST_SUITE_DIR}/${test}"
+ mkdir -p ${TEST_DIR}
+ LOG="${TEST_DIR}/${test}.log"
+
+ echo ""
+ echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" | tee -a ${LOG}
+ echo " Running Test ${test}" | tee -a ${LOG}
+ echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"| tee -a ${LOG}
+
+ echo " " | tee -a ${LOG}
+ echo "Test description: ${local_monitoring_test_description[$test]}" | tee -a ${LOG}
+ echo " " | tee -a ${LOG}
+
+ pushd ${LOCAL_MONITORING_TEST_DIR}/rest_apis > /dev/null
+ echo "pytest -m ${test} --junitxml=${TEST_DIR}/report-${test}.xml --html=${TEST_DIR}/report-${test}.html --cluster_type ${cluster_type}"
+ pytest -m ${test} --junitxml=${TEST_DIR}/report-${test}.xml --html=${TEST_DIR}/report-${test}.html --cluster_type ${cluster_type} | tee -a ${LOG}
+ err_exit "ERROR: Running the test using pytest failed, check ${LOG} for details!"
+
+ popd > /dev/null
+
+ passed=$(grep -o -E '[0-9]+ passed' ${TEST_DIR}/report-${test}.html | cut -d' ' -f1)
+ failed=$(grep -o -E 'check the boxes to filter the results.*' ${TEST_DIR}/report-${test}.html | grep -o -E '[0-9]+ failed' | cut -d' ' -f1)
+ errors=$(grep -o -E '[0-9]+ errors' ${TEST_DIR}/report-${test}.html | cut -d' ' -f1)
+
+ TESTS_PASSED=$(($TESTS_PASSED + ${passed:-0}))
+ TESTS_FAILED=$(($TESTS_FAILED + ${failed:-0}))
+
+ if [ -n "${errors}" ] && [ "${errors}" -ne "0" ]; then
+ echo "Tests did not execute as there were errors, check the logs"
+ exit 1
+ fi
+
+ if [ "${failed:-0}" -ne "0" ]; then
+ FAILED_CASES+=(${test})
+ fi
+
+ done
+
+ TESTS=$(($TESTS_PASSED + $TESTS_FAILED))
+ TOTAL_TESTS_FAILED=${TESTS_FAILED}
+ TOTAL_TESTS_PASSED=${TESTS_PASSED}
+ TOTAL_TESTS=${TESTS}
+
+ if [ "${TESTS_FAILED}" -ne "0" ]; then
+ FAILED_TEST_SUITE+=(${FUNCNAME})
+ fi
+
+ end_time=$(get_date)
+ elapsed_time=$(time_diff "${start_time}" "${end_time}")
+
+ # Remove the duplicates
+ FAILED_CASES=( $(printf '%s\n' "${FAILED_CASES[@]}" | uniq ) )
+
+ # print the testsuite summary
+ testsuitesummary ${FUNCNAME} ${elapsed_time} ${FAILED_CASES}
+}
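The driver above extracts pass/fail/error counts by grepping the summary text of the generated pytest-html report. The same extraction can be sketched in Python; the sample summary line below is illustrative.

```python
import re

# Sketch of the pass/fail/error extraction the shell script performs
# with grep on a pytest-html report's summary text.

def extract_counts(report_text: str) -> dict:
    """Pull 'N passed', 'N failed', 'N errors' counts out of report text."""
    counts = {}
    for key in ("passed", "failed", "errors"):
        m = re.search(r"(\d+) " + key, report_text)
        counts[key] = int(m.group(1)) if m else 0  # default to 0 when absent
    return counts

# Illustrative summary line in the shape pytest-html emits.
summary = "10 passed, 2 failed, 0 errors in 42.31 seconds"
print(extract_counts(summary))
```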
diff --git a/tests/scripts/local_monitoring_tests/pytest.ini b/tests/scripts/local_monitoring_tests/pytest.ini
new file mode 100644
index 000000000..48bdd36e6
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/pytest.ini
@@ -0,0 +1,7 @@
+# content of pytest.ini
+[pytest]
+markers =
+ sanity: mark a test as a sanity test
+ test_e2e: mark a test as an end-to-end test
+ negative: mark a test as a negative test
+ extended: mark a test as an extended test
diff --git a/tests/scripts/local_monitoring_tests/requirements.txt b/tests/scripts/local_monitoring_tests/requirements.txt
new file mode 100644
index 000000000..b14263e72
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/requirements.txt
@@ -0,0 +1,4 @@
+pytest
+requests
+jinja2
+pytest-html==3.2.0
\ No newline at end of file
diff --git a/tests/scripts/local_monitoring_tests/rest_apis/test_import_metadata.py b/tests/scripts/local_monitoring_tests/rest_apis/test_import_metadata.py
new file mode 100644
index 000000000..b68627683
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/rest_apis/test_import_metadata.py
@@ -0,0 +1,185 @@
+"""
+Copyright (c) 2024, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import pytest
+import json
+import sys
+
+sys.path.append("../../")
+
+from helpers.fixtures import *
+from helpers.kruize import *
+from helpers.utils import *
+from helpers.import_metadata_json_validate import *
+from jinja2 import Environment, FileSystemLoader
+
+mandatory_fields = [
+ ("version", ERROR_STATUS_CODE, ERROR_STATUS),
+ ("datasource_name", ERROR_STATUS_CODE, ERROR_STATUS)
+]
+
+csvfile = "/tmp/import_metadata_test_data.csv"
+
+@pytest.mark.sanity
+def test_import_metadata(cluster_type):
+ """
+ Test Description: This test validates the response status code of the /dsmetadata API by passing
+ a valid input json
+ """
+ input_json_file = "../json_files/import_metadata.json"
+
+ form_kruize_url(cluster_type)
+
+ response = delete_metadata(input_json_file)
+ print("delete metadata = ", response.status_code)
+
+ # Import metadata using the specified json
+ response = import_metadata(input_json_file)
+ metadata_json = response.json()
+
+ # Validate the json against the json schema
+ errorMsg = validate_import_metadata_json(metadata_json, import_metadata_json_schema)
+ assert errorMsg == ""
+
+ response = delete_metadata(input_json_file)
+ print("delete metadata = ", response.status_code)
+
+
+@pytest.mark.negative
+@pytest.mark.parametrize(
+ "test_name, expected_status_code, version, datasource_name",
+ generate_test_data(csvfile, import_metadata_test_data, "import_metadata"))
+def test_import_metadata_invalid_test(test_name, expected_status_code, version, datasource_name, cluster_type):
+ """
+ Test Description: This test validates the response status code of the POST /dsmetadata API against
+ invalid input (blank, null, empty) for the json parameters.
+ """
+ print("\n****************************************************")
+ print("Test - ", test_name)
+ print("****************************************************\n")
+ tmp_json_file = "/tmp/import_metadata_" + test_name + ".json"
+
+ print("tmp_json_file = ", tmp_json_file)
+
+ form_kruize_url(cluster_type)
+
+ environment = Environment(loader=FileSystemLoader("../json_files/"))
+ template = environment.get_template("import_metadata_template.json")
+ if "null" in test_name:
+ field = test_name.replace("null_", "")
+ json_file = "../json_files/import_metadata_template.json"
+ filename = "/tmp/import_metadata_template.json"
+
+ strip_double_quotes_for_field(json_file, field, filename)
+ environment = Environment(loader=FileSystemLoader("/tmp/"))
+ template = environment.get_template("import_metadata_template.json")
+
+ content = template.render(
+ version=version,
+ datasource_name=datasource_name,
+ )
+ with open(tmp_json_file, mode="w", encoding="utf-8") as message:
+ message.write(content)
+
+ response = delete_metadata(tmp_json_file)
+ print("delete metadata = ", response.status_code)
+
+ # Import metadata using the specified json
+ response = import_metadata(tmp_json_file)
+ metadata_json = response.json()
+
+ # temporarily moved this up to avoid failures in the subsequent tests
+ response_delete_metadata = delete_metadata(tmp_json_file)
+ print("delete metadata = ", response_delete_metadata.status_code)
+
+ assert response.status_code == int(expected_status_code)
+
+
+@pytest.mark.extended
+@pytest.mark.parametrize("field, expected_status_code, expected_status", mandatory_fields)
+def test_import_metadata_mandatory_fields(cluster_type, field, expected_status_code, expected_status):
+ form_kruize_url(cluster_type)
+
+ # Import metadata using the specified json
+ json_file = "/tmp/import_metadata.json"
+ input_json_file = "../json_files/import_metadata_mandatory.json"
+ json_data = json.load(open(input_json_file))
+
+ if field == 'version':
+ json_data.pop("version", None)
+ else:
+ json_data.pop("datasource_name", None)
+
+ print("\n*****************************************")
+ print(json_data)
+ print("*****************************************\n")
+ data = json.dumps(json_data)
+ with open(json_file, 'w') as file:
+ file.write(data)
+
+ response = delete_metadata(json_file)
+ print("delete metadata = ", response.status_code)
+
+ # Import metadata using the specified json
+ response = import_metadata(json_file)
+ metadata_json = response.json()
+
+ assert response.status_code == expected_status_code, \
+ f"Mandatory field check failed for {field} actual - {response.status_code} expected - {expected_status_code}"
+ assert metadata_json['status'] == expected_status
+
+ response = delete_metadata(json_file)
+ print("delete metadata = ", response.status_code)
+
+
+@pytest.mark.sanity
+def test_repeated_metadata_import(cluster_type):
+ """
+ Test Description: This test validates the response status code of /dsmetadata API by specifying the
+ same datasource name
+ """
+ input_json_file = "../json_files/import_metadata.json"
+ json_data = json.load(open(input_json_file))
+
+ datasource_name = json_data['datasource_name']
+ print("datasource_name = ", datasource_name)
+
+ form_kruize_url(cluster_type)
+
+ response = delete_metadata(input_json_file)
+ print("delete metadata = ", response.status_code)
+
+ # Import metadata using the specified json
+ response = import_metadata(input_json_file)
+ metadata_json = response.json()
+
+ assert response.status_code == SUCCESS_STATUS_CODE
+
+ # Validate the json against the json schema
+ errorMsg = validate_import_metadata_json(metadata_json, import_metadata_json_schema)
+ assert errorMsg == ""
+
+ # Import metadata using the specified json
+ response = import_metadata(input_json_file)
+ metadata_json = response.json()
+
+ assert response.status_code == SUCCESS_STATUS_CODE
+
+ # Validate the json against the json schema
+ errorMsg = validate_import_metadata_json(metadata_json, import_metadata_json_schema)
+ assert errorMsg == ""
+
+ response = delete_metadata(input_json_file)
+ print("delete metadata = ", response.status_code)
\ No newline at end of file
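The negative tests above generate "null" variants by stripping the double quotes around a field's placeholder in the template, so rendering with the literal string `null` yields an unquoted JSON null. A minimal stdlib sketch of what the `strip_double_quotes_for_field` helper (from `helpers/utils`) is assumed to do:

```python
import re

# Assumed behavior of strip_double_quotes_for_field: turn
#   "version": "{{version}}"   into   "version": {{version}}
# so that rendering the Jinja2 template with version="null"
# produces an unquoted JSON null for that field.
# This is an illustrative stand-in, not the real helper.

def strip_double_quotes_for_field(content: str, field: str) -> str:
    pattern = r'("%s"\s*:\s*)"([^"]*)"' % field
    return re.sub(pattern, r'\1\2', content)

template = '{"version": "{{version}}", "datasource_name": "{{datasource_name}}"}'
rendered = strip_double_quotes_for_field(template, "version")
print(rendered)
```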
diff --git a/tests/scripts/local_monitoring_tests/rest_apis/test_list_datasources.py b/tests/scripts/local_monitoring_tests/rest_apis/test_list_datasources.py
new file mode 100644
index 000000000..95ed0710a
--- /dev/null
+++ b/tests/scripts/local_monitoring_tests/rest_apis/test_list_datasources.py
@@ -0,0 +1,95 @@
+"""
+Copyright (c) 2024, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import pytest
+import json
+import sys
+
+sys.path.append("../../")
+
+from helpers.fixtures import *
+from helpers.kruize import *
+from helpers.utils import *
+from helpers.list_datasources_json_validate import *
+
+
+@pytest.mark.sanity
+def test_list_datasources_without_parameters(cluster_type):
+ """
+ Test Description: This test validates datasources API without parameters
+ """
+ form_kruize_url(cluster_type)
+
+ # Get the datasources name
+ datasource_name = None
+ response = list_datasources(datasource_name)
+
+ list_datasources_json = response.json()
+
+ assert response.status_code == SUCCESS_200_STATUS_CODE
+
+ # Validate the json against the json schema
+ errorMsg = validate_list_datasources_json(list_datasources_json, list_datasources_json_schema)
+ assert errorMsg == ""
+
+
+@pytest.mark.sanity
+def test_list_datasources_with_name(cluster_type):
+ """
+ Test Description: This test validates datasources API with 'name' parameter
+ """
+ form_kruize_url(cluster_type)
+
+ # Get the datasources name
+ datasource_name = "prometheus-1"
+ response = list_datasources(datasource_name)
+
+ list_datasources_json = response.json()
+
+ assert response.status_code == SUCCESS_200_STATUS_CODE
+
+ # Validate the json against the json schema
+ errorMsg = validate_list_datasources_json(list_datasources_json, list_datasources_json_schema)
+ assert errorMsg == ""
+
+
+@pytest.mark.negative
+@pytest.mark.parametrize("test_name, expected_status_code, datasource_name",
+ [
+ ("blank_name", 400, ""),
+ ("null_name", 400, "null"),
+ ("invalid_name", 400, "xyz")
+ ]
+)
+def test_list_datasources_invalid_datasource_name(test_name, expected_status_code, datasource_name, cluster_type):
+ """
+ Test Description: This test validates the response status code of list datasources API against
+ invalid input (blank, null, empty) for the json parameters.
+ """
+ print("\n****************************************************")
+ print("Test datasource_name = ", datasource_name)
+ print("****************************************************\n")
+
+ form_kruize_url(cluster_type)
+
+ # Get the datasource name
+ name = datasource_name
+ response = list_datasources(name)
+
+ list_datasources_json = response.json()
+ assert response.status_code == ERROR_STATUS_CODE
+ assert list_datasources_json['message'] == LIST_DATASOURCES_ERROR_MSG % name
+
+
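The negative cases above expect HTTP 400 for blank, "null", and unknown datasource names. A toy validator capturing that contract; the known-names set and helper name are illustrative only, inferred from the test expectations rather than the Kruize implementation:

```python
# Toy validation mirroring the expectations encoded in the negative
# test cases above. KNOWN_DATASOURCES and the helper are assumptions.

KNOWN_DATASOURCES = {"prometheus-1"}

def list_datasources_status(name) -> int:
    """Return the HTTP status the tests above expect for a given name."""
    if not name or name == "null" or name not in KNOWN_DATASOURCES:
        return 400  # ERROR_STATUS_CODE in the test helpers
    return 200      # SUCCESS_200_STATUS_CODE

for name in ("", "null", "xyz", "prometheus-1"):
    print(repr(name), "->", list_datasources_status(name))
```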
diff --git a/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md b/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md
index 0f2b27b11..d401f9006 100644
--- a/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md
+++ b/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md
@@ -75,6 +75,22 @@ Here are the test scenarios:
- for non-contiguous data:
- similar tests as mentioned above for contiguous
+
+### **Update Recommendation API tests**
+
+
+Here are the test scenarios:
+
+- Update recommendations with valid results and plots available
+- Update recommendations with no plots available when no recommendations available for medium and long term
+- Update recommendations with just interval_end_time in input
+- Update recommendations without experiment name or end_time
+- Update recommendations without end_time
+- Update recommendations with invalid end_time format
+- Update recommendations with unknown experiment_name
+- Update recommendations with unknown end_time
+- Update recommendations with end_time preceding start_time
+
The above tests are developed using pytest framework and the tests are run using shell script wrapper that does the following:
- Deploys kruize in non-CRD mode using the [deploy script](https://github.com/kruize/autotune/blob/master/deploy.sh) from the autotune repo
- Creates a resource optimization performance profile using the [createPerformanceProfile API](/design/PerformanceProfileAPI.md)
@@ -97,7 +113,7 @@ Use the below command to test :
Where values for test_autotune.sh are:
```
-usage: test_autotune.sh [ -c ] : cluster type. Supported type - minikube
+usage: test_autotune.sh [ -c ] : cluster type. Supported type - minikube, openshift. Default - minikube
[ -i ] : optional. Kruize docker image to be used for testing, default - kruize/autotune_operator:test
[ -r ] : Location of benchmarks. Not required for remote_monitoring_tests
[ --tctype ] : optional. Testcases type to run, default is functional (runs all functional tests)
diff --git a/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py b/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py
index 41bf659fd..50118a982 100644
--- a/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py
+++ b/tests/scripts/remote_monitoring_tests/fault_tolerant_tests/kruize_pod_restart_test.py
@@ -18,7 +18,7 @@
import json
import os
import time
-sys.path.append("..")
+sys.path.append("../../")
from helpers.kruize import *
from helpers.utils import *
from helpers.generate_rm_jsons import *
diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py
index b72032891..a0825f02c 100644
--- a/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py
+++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_create_experiment.py
@@ -1,4 +1,22 @@
+"""
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
import pytest
+import sys
+sys.path.append("../../")
+
from helpers.fixtures import *
from helpers.kruize import *
from helpers.utils import *
diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py
index c501f9b27..a956ac732 100644
--- a/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py
+++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_e2e_workflow.py
@@ -1,7 +1,25 @@
+"""
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
import copy
import json
import pytest
+import sys
+sys.path.append("../../")
+
from helpers.fixtures import *
from helpers.generate_rm_jsons import *
from helpers.kruize import *
@@ -104,6 +122,7 @@ def test_list_recommendations_multiple_exps_from_diff_json_files(cluster_type):
assert data[0]['experiment_name'] == experiment_name
assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications'][NOTIFICATION_CODE_FOR_RECOMMENDATIONS_AVAILABLE][
'message'] == RECOMMENDATIONS_AVAILABLE
+
response = list_recommendations(experiment_name)
if response.status_code == SUCCESS_200_STATUS_CODE:
recommendation_json = response.json()
diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py
index 1ce577fce..ba767c9af 100644
--- a/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py
+++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_list_recommendations.py
@@ -1,7 +1,24 @@
+"""
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
import datetime
import json
import pytest
+import sys
+sys.path.append("../../")
from helpers.all_terms_list_reco_json_schema import all_terms_list_reco_json_schema
from helpers.fixtures import *
@@ -386,7 +403,6 @@ def test_list_recommendations_single_exp_multiple_results(cluster_type):
assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications'][
NOTIFICATION_CODE_FOR_RECOMMENDATIONS_AVAILABLE]['message'] == RECOMMENDATIONS_AVAILABLE
-
response = list_recommendations(experiment_name)
list_reco_json = response.json()
@@ -1054,7 +1070,8 @@ def test_list_recommendations_for_diff_reco_terms_with_only_latest(test_name, nu
exp_found = False
for list_reco in list_reco_json:
if create_exp_json[0]['experiment_name'] == list_reco['experiment_name']:
- validate_reco_json(create_exp_json[0], update_results_json, list_reco, expected_duration_in_hours, test_name)
+ validate_reco_json(create_exp_json[0], update_results_json, list_reco, expected_duration_in_hours,
+ test_name)
exp_found = True
continue
diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py
index 9273c271d..da643a78a 100644
--- a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py
+++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_recommendations.py
@@ -1,4 +1,21 @@
+"""
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
import pytest
+import sys
+sys.path.append("../../")
from helpers.fixtures import *
from helpers.kruize import *
from helpers.list_reco_json_validate import *
@@ -12,6 +29,7 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ
update results for 24 hrs +
update recommendation using start and end time as a parameter
Expected : recommendation should be available for the timestamp provided
+ Expected : plots data should be available
'''
input_json_file = "../json_files/create_exp.json"
result_json_file = "../json_files/update_results.json"
@@ -27,7 +45,7 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ
# Create experiment using the specified json
num_exps = 1
- num_res = 100
+ num_res = 2
for i in range(num_exps):
create_exp_json_file = "/tmp/create_exp_" + str(i) + ".json"
generate_json(find, input_json_file, create_exp_json_file, i)
@@ -77,7 +95,7 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ
assert data['message'] == UPDATE_RESULTS_SUCCESS_MSG
# Expecting that we have recommendations
- if j > 96:
+ if j > 0:
response = update_recommendations(experiment_name, None, end_time)
data = response.json()
assert response.status_code == SUCCESS_STATUS_CODE
@@ -119,7 +137,133 @@ def test_update_valid_recommendations_after_results_after_create_exp(cluster_typ
update_results_json = []
update_results_json.append(result_json_arr[len(result_json_arr) - 1])
- expected_duration_in_hours = SHORT_TERM_DURATION_IN_HRS_MAX
+ expected_duration_in_hours = SHORT_TERM_DURATION_IN_HRS_MIN
+ validate_reco_json(create_exp_json[0], update_results_json, list_reco_json[0], expected_duration_in_hours)
+
+ # Delete all the experiments
+ for i in range(num_exps):
+ json_file = "/tmp/create_exp_" + str(i) + ".json"
+ response = delete_experiment(json_file)
+ print("delete exp = ", response.status_code)
+ assert response.status_code == SUCCESS_STATUS_CODE
+
+
+@pytest.mark.sanity
+def test_plots_with_no_recommendations_in_some_terms(cluster_type):
+ '''
+ Creates Experiment +
+ update results for 30 mins +
+ update recommendation using start and end time as a parameter
+ Expected : recommendation should be available for the timestamp provided
+ Expected : plots data should not be available for medium and long term
+ '''
+ input_json_file = "../json_files/create_exp.json"
+ result_json_file = "../json_files/update_results.json"
+
+ find = []
+ json_data = json.load(open(input_json_file))
+
+ find.append(json_data[0]['experiment_name'])
+ find.append(json_data[0]['kubernetes_objects'][0]['name'])
+ find.append(json_data[0]['kubernetes_objects'][0]['namespace'])
+
+ form_kruize_url(cluster_type)
+
+ # Create experiment using the specified json
+ num_exps = 1
+ num_res = 2
+ for i in range(num_exps):
+ create_exp_json_file = "/tmp/create_exp_" + str(i) + ".json"
+ generate_json(find, input_json_file, create_exp_json_file, i)
+
+ # Delete the experiment
+ response = delete_experiment(create_exp_json_file)
+ print("delete exp = ", response.status_code)
+
+ # Create the experiment
+ response = create_experiment(create_exp_json_file)
+
+ data = response.json()
+ print("message = ", data['message'])
+ assert response.status_code == SUCCESS_STATUS_CODE
+ assert data['status'] == SUCCESS_STATUS
+ assert data['message'] == CREATE_EXP_SUCCESS_MSG
+
+ # Update results for the experiment
+ update_results_json_file = "/tmp/update_results_" + str(i) + ".json"
+
+ result_json_arr = []
+ # Get the experiment name
+ json_data = json.load(open(create_exp_json_file))
+ experiment_name = json_data[0]['experiment_name']
+ interval_start_time = get_datetime()
+ for j in range(num_res):
+ update_timestamps = True
+ generate_json(find, result_json_file, update_results_json_file, i, update_timestamps)
+ result_json = read_json_data_from_file(update_results_json_file)
+ if j == 0:
+ start_time = interval_start_time
+ else:
+ start_time = end_time
+
+ result_json[0]['interval_start_time'] = start_time
+ end_time = increment_timestamp_by_given_mins(start_time, 15)
+ result_json[0]['interval_end_time'] = end_time
+
+ write_json_data_to_file(update_results_json_file, result_json)
+ result_json_arr.append(result_json[0])
+ response = update_results(update_results_json_file)
+
+ data = response.json()
+ print("message = ", data['message'])
+ assert response.status_code == SUCCESS_STATUS_CODE
+ assert data['status'] == SUCCESS_STATUS
+ assert data['message'] == UPDATE_RESULTS_SUCCESS_MSG
+
+ # Expecting that we have recommendations after a minimum of two datapoints
+ if j > 0:
+ response = update_recommendations(experiment_name, None, end_time)
+ data = response.json()
+ assert response.status_code == SUCCESS_STATUS_CODE
+ assert data[0]['experiment_name'] == experiment_name
+ assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications']['111000'][
+ 'message'] == 'Recommendations Are Available'
+ response = list_recommendations(experiment_name)
+ if response.status_code == SUCCESS_200_STATUS_CODE:
+ recommendation_json = response.json()
+ recommendation_section = recommendation_json[0]["kubernetes_objects"][0]["containers"][0][
+ "recommendations"]
+ high_level_notifications = recommendation_section["notifications"]
+ # Check that the recommendations-available notification is present
+ assert INFO_RECOMMENDATIONS_AVAILABLE_CODE in high_level_notifications
+ data_section = recommendation_section["data"]
+ short_term_recommendation = data_section[str(end_time)]["recommendation_terms"]["short_term"]
+ short_term_notifications = short_term_recommendation["notifications"]
+ for notification in short_term_notifications.values():
+ assert notification["type"] != "error"
+
+ response = update_recommendations(experiment_name, None, end_time)
+ data = response.json()
+ assert response.status_code == SUCCESS_STATUS_CODE
+ assert data[0]['experiment_name'] == experiment_name
+ assert data[0]['kubernetes_objects'][0]['containers'][0]['recommendations']['notifications']['111000'][
+ 'message'] == 'Recommendations Are Available'
+
+ # Invoke list recommendations for the specified experiment
+ response = list_recommendations(experiment_name)
+ assert response.status_code == SUCCESS_200_STATUS_CODE
+ list_reco_json = response.json()
+
+ # Validate the json against the json schema
+ errorMsg = validate_list_reco_json(list_reco_json, list_reco_json_schema)
+ assert errorMsg == ""
+
+ # Validate the json values
+ create_exp_json = read_json_data_from_file(create_exp_json_file)
+ update_results_json = []
+ update_results_json.append(result_json_arr[len(result_json_arr) - 1])
+
+ expected_duration_in_hours = SHORT_TERM_DURATION_IN_HRS_MIN
validate_reco_json(create_exp_json[0], update_results_json, list_reco_json[0], expected_duration_in_hours)
# Delete all the experiments
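The update-results loop above chains 15-minute intervals: the first interval starts "now", and each subsequent interval starts where the previous one ended. A sketch of that chaining with stdlib `datetime` (`increment_timestamp_by_given_mins` is a test helper; this stand-in is illustrative):

```python
from datetime import datetime, timedelta

# Illustrative version of the interval chaining in the update-results
# loop: each interval's start is the previous interval's end.

def chain_intervals(start: datetime, num_res: int, step_mins: int = 15):
    intervals = []
    for _ in range(num_res):
        end = start + timedelta(minutes=step_mins)
        intervals.append((start, end))
        start = end  # next interval begins where this one ended
    return intervals

intervals = chain_intervals(datetime(2024, 1, 1, 0, 0), 2)
for s, e in intervals:
    print(s.isoformat(), "->", e.isoformat())
```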
diff --git a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py
index 65ef4d5f7..44a77a98b 100644
--- a/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py
+++ b/tests/scripts/remote_monitoring_tests/rest_apis/test_update_results.py
@@ -1,4 +1,21 @@
+"""
+Copyright (c) 2022, 2024 Red Hat, IBM Corporation and others.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
import pytest
+import sys
+sys.path.append("../../")
from helpers.fixtures import *
from helpers.kruize import *
from helpers.utils import *
diff --git a/tests/test_autotune.sh b/tests/test_autotune.sh
index 5bf18e2e4..c99bece9f 100755
--- a/tests/test_autotune.sh
+++ b/tests/test_autotune.sh
@@ -214,7 +214,7 @@ if [ ! -z "${testcase}" ]; then
fi
# check for benchmarks directory path
-if [ ! "${testsuite}" == "remote_monitoring_tests" ]; then
+if [[ "${testsuite}" != "remote_monitoring_tests" && "${testsuite}" != "local_monitoring_tests" ]]; then
if [ -z "${APP_REPO}" ]; then
echo "Error: Do specify the benchmarks directory path"
usage
@@ -256,7 +256,8 @@ if [ "${setup}" -ne "0" ]; then
exit 0
fi
else
- if [ ${testsuite} == "remote_monitoring_tests" ]; then
+ #TODO: the target for local monitoring is temporarily set to "crc" for the demo
+ if [ ${testsuite} == "remote_monitoring_tests" ] || [ ${testsuite} == "local_monitoring_tests" ] ; then
target="crc"
else
target="autotune"
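The updated condition above can be sanity-checked in isolation. The sketch below uses a hypothetical `requires_benchmarks` helper (not part of test_autotune.sh) to mirror the compound `[[ ... && ... ]]` test that skips the benchmarks-directory check for both monitoring test suites:

```shell
#!/bin/bash
# Hypothetical helper mirroring the condition in test_autotune.sh:
# the APP_REPO (benchmarks directory) check applies only to suites that
# are neither remote_monitoring_tests nor local_monitoring_tests.
requires_benchmarks() {
  local testsuite="$1"
  if [[ "${testsuite}" != "remote_monitoring_tests" && "${testsuite}" != "local_monitoring_tests" ]]; then
    echo "yes"
  else
    echo "no"
  fi
}

requires_benchmarks "remote_monitoring_tests"   # prints "no"
requires_benchmarks "local_monitoring_tests"    # prints "no"
requires_benchmarks "autotune_tests"            # prints "yes"
```

Using `[[ ... && ... ]]` keeps both suite comparisons in a single test, avoiding the word-splitting pitfalls of chained single-bracket `[ ... ]` expressions.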
diff --git a/tests/test_plans/test_plan_rel_0.0.21.md b/tests/test_plans/test_plan_rel_0.0.21.md
new file mode 100644
index 000000000..202fb470b
--- /dev/null
+++ b/tests/test_plans/test_plan_rel_0.0.21.md
@@ -0,0 +1,146 @@
+# KRUIZE TEST PLAN RELEASE 0.0.21
+
+- [INTRODUCTION](#introduction)
+- [FEATURES TO BE TESTED](#features-to-be-tested)
+- [BUG FIXES TO BE TESTED](#bug-fixes-to-be-tested)
+- [TEST ENVIRONMENT](#test-environment)
+- [TEST DELIVERABLES](#test-deliverables)
+ - [New Test Cases Developed](#new-test-cases-developed)
+  - [Regression Testing](#regression-testing)
+- [SCALABILITY TESTING](#scalability-testing)
+- [RELEASE TESTING](#release-testing)
+- [TEST METRICS](#test-metrics)
+- [RISKS AND CONTINGENCIES](#risks-and-contingencies)
+- [APPROVALS](#approvals)
+
+-----
+
+## INTRODUCTION
+
+This document describes the test plan for the Kruize remote monitoring release 0.0.21.
+
+----
+
+## FEATURES TO BE TESTED
+
+* Kruize local changes
+
+Kruize local changes are included in this release, allowing a user to add datasources, import datasource metadata, create an experiment and generate recommendations
+ using the metric results from the specified datasource. Refer to this [doc](https://github.com/kruize/autotune/pull/1174/files#diff-a23fa581de2556a8ab7cec3efa3b03833fdfa86d42d96209cf691b8f288210f8) for further details.
+
+* Kruize Security vulnerability issues
+
+ Security vulnerabilities in the Kruize dependencies have been fixed through the following PRs:
+
+
+ * [1150](https://github.com/kruize/autotune/pull/1150)
+ * [1153](https://github.com/kruize/autotune/pull/1153)
+
+
+* Kruize logging using CloudWatch
+
+Kruize logs can now be sent to CloudWatch so that they can be viewed using tools like Kibana to debug issues.
+
+
+------
+
+## BUG FIXES TO BE TESTED
+
+* [1156](https://github.com/kruize/autotune/pull/1156) - Notification is not displayed when the CPU usage is less than a millicore
+* [1165](https://github.com/kruize/autotune/pull/1165) - Fix the missing validation for Update recommendation API
+
+---
+
+## TEST ENVIRONMENT
+
+* Minikube Cluster
+* Openshift Cluster
+
+---
+
+## TEST DELIVERABLES
+
+### New Test Cases Developed
+
+| # | ISSUE (NEW FEATURE) | TEST DESCRIPTION | TEST DELIVERABLES | RESULTS | COMMENTS |
+| --- |--------------------------------------------------------------------------------------------------------------------------------------| ---------------- | ----------------- | ----- | --- |
+| 1 | [Kruize local changes](https://github.com/kruize/autotune/issues/) | Test scenarios identified - [1134](https://github.com/kruize/autotune/issues/1134), [1129](https://github.com/kruize/autotune/issues/1129), [1160](https://github.com/kruize/autotune/issues/1160) | Kruize local is a PoC; tests will be implemented while productizing | PASSED on Openshift | Kruize local workflow tested manually; debugging a generate recommendations issue on minikube |
+| 2 | [Kruize CloudWatch logging](https://github.com/kruize/autotune/pull/1173) | Kruize logging to CloudWatch is tested by using a CloudWatch in AWS cluster manually | Manual test | PASSED | |
+| 3 | [Notifications are not displayed when the CPU usage is less than a millicore or zero](https://github.com/kruize/autotune/pull/1156) | Kruize Functional testsuite will be updated to post results with cpu usage of less than millicore or zero to validate these notifications | Functional tests included in the same PR | PASSED | |
+
+### Regression Testing
+
+| # | ISSUE (BUG/NEW FEATURE) | TEST CASE | RESULTS | COMMENTS |
+| --- |--------------------------------| ---------------- | -------- | --- |
+| 1 | Kruize remote monitoring tests | Functional test suite | PASSED | |
+| 2 | Kruize fault tolerant tests | Functional test suite | PASSED | |
+| 3 | Kruize stress tests | Functional test suite | PASSED | |
+| 4 | Kruize local monitoring demo | kruize demo | Tested it manually | Authentication failure on Openshift has been fixed; recommendations issue on minikube is being debugged |
+| 5 | Short Scalability test | 5k exps / 15 days | PASSED | |
+
+---
+
+## SCALABILITY TESTING
+
+Evaluate Kruize scalability on OCP with 5k experiments by uploading resource usage data for 15 days and updating recommendations.
+These changes do not have scalability implications, so a short scalability test will be run as part of the release testing.
+
+Short Scalability run
+- 5K exps / 15 days of results / 2 containers per exp
+- Kruize replicas - 10
+- OCP - Scalelab cluster
+
+Kruize Release | Exps / Results / Recos | Execution time | Latency (Max/ Avg) in seconds ||| Postgres DB size(MB) | Kruize Max CPU | Kruize Max Memory (GB)
+-- | -- | -- | -- | -- | -- | --| -- | --
+ | | | UpdateRecommendations | UpdateResults | LoadResultsByExpName | | |
+0.0.20.3_mvp | 5K / 72L / 3L | 3h 49 mins | 0.62 / 0.39 | 0.24 / 0.17 | 0.34 / 0.25 | 21302.32 | 4.8 | 40.6
+0.0.20.3_mvp (With Box plots) | 5K / 72L / 3L | 3h 50 mins | 0.61 / 0.39 | 0.25 / 0.18 | 0.34 / 0.25 | 21855.04 | 4.7 | 35.1
+0.0.21_mvp | 5K / 72L / 3L | 3h 50 mins | 0.62 / 0.39 | 0.25 / 0.17 | 0.34 / 0.25 | 21417.14 | 6.04 | 35.37
+0.0.21_mvp (With Box plots) | 5K / 72L / 3L | 3h 53 mins | 0.63 / 0.39 | 0.25 / 0.17 | 0.35 / 0.25 | 21868.5 | 4.4 | 40.71
+
+----
+## RELEASE TESTING
+
+As part of the release testing, the following tests will be executed:
+- [Kruize Remote monitoring Functional tests](/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md)
+- Kruize Local monitoring workflow - Tested manually
+- [Fault tolerant test](/tests/scripts/remote_monitoring_tests/fault_tolerant_tests.md)
+- [Stress test](/tests/scripts/remote_monitoring_tests/README.md)
+- [Scalability test (On openshift)](/tests/scripts/remote_monitoring_tests/scalability_test.md) - scalability test with 5000 exps / 15 days usage data
+- [Kruize remote monitoring demo (On minikube)](https://github.com/kruize/kruize-demos/blob/main/monitoring/remote_monitoring_demo/README.md)
+
+
+| # | TEST SUITE | EXPECTED RESULTS | ACTUAL RESULTS | COMMENTS |
+| --- | ---------- | ---------------- | -------------- | -------- |
+| 1 | Kruize Remote monitoring Functional testsuite | TOTAL - 356, PASSED - 313 / FAILED - 43 | TOTAL - 356, PASSED - 313 / FAILED - 43 | No new regressions seen, existing issues - [559](https://github.com/kruize/autotune/issues/559), [610](https://github.com/kruize/autotune/issues/610) |
+| 2 | Kruize Local monitoring workflow | PASSED | PASSED on Openshift, recommendations issue on minikube | PoC code, tested it manually |
+| 3 | Fault tolerant test | PASSED | PASSED | |
+| 4 | Stress test | PASSED | FAILED | [Intermittent failure](https://github.com/kruize/autotune/issues/1106) |
+| 5 | Scalability test (short run)| PASSED | PASSED | |
+| 6 | Kruize remote monitoring demo | PASSED | PASSED | |
+
+---
+
+## TEST METRICS
+
+### Test Completion Criteria
+
+* All must_fix defects identified for the release are fixed
+* New features work as expected and tests have been added to validate these
+* No new regressions in the functional tests
+* All non-functional tests work as expected without major issues
+* Documentation updates have been completed
+
+----
+
+## RISKS AND CONTINGENCIES
+
+* None
+
+----
+## APPROVALS
+
+Sign-off
+
+----
+