hidden topk var removal from Grafana dashboards #2879

Closed
1 task done
rahulguptajss opened this issue May 7, 2024 · 5 comments · Fixed by #2881, #2889 or #2900
Labels
24.05 feature New feature or request status/done

Comments

@rahulguptajss
Contributor

rahulguptajss commented May 7, 2024

TopK

  • Remove the remaining hidden topk variables, as mentioned here, and rewrite the panel queries that used them to use the @ modifier
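
For illustration only, here is a minimal sketch of the kind of rewrite this task describes. It assumes the panel query keeps the original selector and intersects it with a topk ranking that is evaluated once, at the end of the dashboard time range, using `@ end()`. The helper name `topk_at_end` is hypothetical, and the exact query shape used in the Harvest dashboards may differ. It also assumes a Prometheus version that supports the `@` modifier.

```python
def topk_at_end(metric: str, selector: str, window: str = "3h") -> str:
    """Build a panel query whose top-k ranking is pinned to the end of the range.

    Hypothetical helper for illustration; not taken from the Harvest code base.
    """
    series = f"{metric}{{{selector}}}"
    # `@ end()` evaluates the ranking at the end of the query range, so the
    # same top-k series are selected at every step of the graph. This is
    # what removes the need for a hidden dashboard variable.
    ranking = f"topk($TopResources, avg_over_time({series}[{window}] @ end()))"
    # `and` keeps only the series whose label sets also appear in the ranking.
    return f"{series} and {ranking}"


query = topk_at_end(
    "aggr_power",
    'datacenter=~"$Datacenter",cluster=~"$Cluster"',
)
print(query)
```

With this shape the panel no longer references a hidden variable such as the ones listed in this issue; the ranking lives in the panel query itself.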
@rahulguptajss
Contributor Author

rahulguptajss commented May 8, 2024

Below are the ones still pending after #2881. Most are now in external_service_op.json and the StorageGrid dashboards.

grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, avg by (key) (avg_over_time(external_service_op_request_latency{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_not_found_responses{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_request_failures{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_requests_sent{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_responses_received{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_successful_responses{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:          "query": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_timeouts{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, avg by (key) (avg_over_time(external_service_op_request_latency{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_not_found_responses{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_request_failures{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_requests_sent{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_responses_received{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_successful_responses{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/external_service_op.json:        "definition": "query_result(topk($TopResources, sum by (key) (avg_over_time(external_service_op_num_timeouts{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",svm=~\"$SVM\",service_name=~\"$ServiceName\",operation=~\"$Operation\"}[3h]))))",
grafana/dashboards/cmode/node.json:          "query": "query_result(topk($TopResources, avg_over_time(fcp_read_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(fcp_write_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(fcp_nvmf_read_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(fcp_nvmf_write_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])))",
grafana/dashboards/cmode/node.json:          "query": "query_result(topk($TopResources, avg_over_time(nic_rx_bytes{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(nic_tx_bytes{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])))",
grafana/dashboards/cmode/node.json:        "definition": "query_result(topk($TopResources, avg_over_time(fcp_read_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(fcp_write_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(fcp_nvmf_read_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(fcp_nvmf_write_data{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])))",
grafana/dashboards/cmode/node.json:        "definition": "query_result(topk($TopResources, avg_over_time(nic_rx_bytes{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])+avg_over_time(nic_tx_bytes{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\"}[3h])))",
grafana/dashboards/cmode/power.json:          "query": "query_result(topk($TopResources, avg_over_time(aggr_power{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\"}[${__range}])))",
grafana/dashboards/cmode/power.json:        "definition": "query_result(topk($TopResources, avg_over_time(aggr_power{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\"}[${__range}])))",
grafana/dashboards/cmode/snapmirror.json:          "query": "query_result(topk($TopResources, avg_over_time(snapmirror_last_transfer_size{source_cluster=~\"$SourceCluster\",source_volume=~\"$SourceVolume\",destination_volume=~\"$DestinationVolume\",source_vserver=~\"$SourceSVM\",destination_vserver=~\"$DestinationSVM\"}[3h])))",
grafana/dashboards/cmode/snapmirror.json:        "definition": "query_result(topk($TopResources, avg_over_time(snapmirror_last_transfer_size{source_cluster=~\"$SourceCluster\",source_volume=~\"$SourceVolume\",destination_volume=~\"$DestinationVolume\",source_vserver=~\"$SourceSVM\",destination_vserver=~\"$DestinationSVM\"}[3h])))",
grafana/dashboards/storagegrid/fabricpool.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_request_count{cluster=~\"$SGCluster\",policy=~\"$Policy\"}[${__range}])))",
grafana/dashboards/storagegrid/fabricpool.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_rx_bytes{cluster=~\"$SGCluster\",policy=~\"$Policy\"}[${__range}])))",
grafana/dashboards/storagegrid/fabricpool.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_tx_bytes{cluster=~\"$SGCluster\",policy=~\"$Policy\"}[${__range}])))",
grafana/dashboards/storagegrid/fabricpool.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_request_count{cluster=~\"$SGCluster\",policy=~\"$Policy\"}[${__range}])))",
grafana/dashboards/storagegrid/fabricpool.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_rx_bytes{cluster=~\"$SGCluster\",policy=~\"$Policy\"}[${__range}])))",
grafana/dashboards/storagegrid/fabricpool.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_tx_bytes{cluster=~\"$SGCluster\",policy=~\"$Policy\"}[${__range}])))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources, avg by(tenant,datacenter)(avg_over_time(storagegrid_tenant_usage_data_bytes{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources, avg by(tenant,datacenter)(avg_over_time(storagegrid_tenant_usage_data_bytes{cluster=~\"$Cluster\"}[3h])/avg_over_time(storagegrid_tenant_usage_quota_bytes{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_request_count{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_request_time{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_rx_bytes{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_tx_bytes{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources,(avg_over_time(storagegrid_node_cpu_utilization_percentage{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:          "query": "query_result(topk($TopResources,(avg_over_time(storagegrid_storage_utilization_data_bytes{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources, avg by(tenant,datacenter)(avg_over_time(storagegrid_tenant_usage_data_bytes{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources, avg by(tenant,datacenter)(avg_over_time(storagegrid_tenant_usage_data_bytes{cluster=~\"$Cluster\"}[3h])/avg_over_time(storagegrid_tenant_usage_quota_bytes{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_request_count{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_request_time{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_rx_bytes{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources, avg_over_time(storagegrid_private_load_balancer_storage_tx_bytes{cluster=~\"$Cluster\",policy=~\"$Policy\"}[3h])))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources,(avg_over_time(storagegrid_node_cpu_utilization_percentage{cluster=~\"$Cluster\"}[3h]))))",
grafana/dashboards/storagegrid/overview.json:        "definition": "query_result(topk($TopResources,(avg_over_time(storagegrid_storage_utilization_data_bytes{cluster=~\"$Cluster\"}[3h]))))",
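
A listing like the one above can also be produced with a short script rather than a text search. The sketch below assumes the standard Grafana dashboard JSON layout, where template variables live under `templating.list` and carry a `definition` string. It treats any variable whose definition starts with `query_result(topk(` as a pending hidden topk variable. The function names `pending_topk_vars` and `scan` are hypothetical.

```python
import json
from pathlib import Path


def pending_topk_vars(dashboard: dict) -> list[tuple[str, str]]:
    """Return (name, definition) for variables defined via query_result(topk(...))."""
    found = []
    for var in dashboard.get("templating", {}).get("list", []):
        definition = var.get("definition", "")
        if definition.startswith("query_result(topk("):
            found.append((var.get("name", ""), definition))
    return found


def scan(root: str) -> dict[str, list[tuple[str, str]]]:
    """Scan every *.json file under root and report dashboards with pending variables."""
    report = {}
    for path in sorted(Path(root).rglob("*.json")):
        hits = pending_topk_vars(json.loads(path.read_text()))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan("grafana/dashboards")` from the repository root would return a mapping from dashboard path to the variables that still need to be removed; an empty mapping means nothing is pending.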

rahulguptajss reopened this on May 9, 2024
rahulguptajss linked a pull request on May 13, 2024 that will close this issue
@rahulguptajss
Contributor Author

Only the StorageGrid (SG) dashboards will remain after PR #2889.

@cgrinds
Collaborator

cgrinds commented May 15, 2024

Was TopDiskBusy missed in the aggregate dashboard?
Everything else looks good.

"expr": "avg by (node, aggr) (aggr_disk_busy{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\",aggr=~\"$TopDiskBusy\"})",

and
"expr": "avg by (node, aggr) (aggr_disk_busy{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\",aggr=~\"$TopDiskBusy\"})",

@rahulguptajss
Contributor Author

Was TopDiskBusy missed in the aggregate dashboard? Everything else looks good.

"expr": "avg by (node, aggr) (aggr_disk_busy{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\",aggr=~\"$TopDiskBusy\"})",

and

"expr": "avg by (node, aggr) (aggr_disk_busy{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\",node=~\"$Node\",aggr=~\"$TopDiskBusy\"})",

I have removed this variable and added topk directly to the two panels that used it, as part of issue #2900.
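
A check for leftovers of this kind can be automated. The sketch below walks a dashboard's JSON and collects every `expr` string that still references a given variable, written either as `$Name` or `${Name}`. The function name `exprs_referencing` is hypothetical, and the sketch assumes panel queries are stored under the key `expr`, as in the snippets quoted above.

```python
import re


def exprs_referencing(dashboard, var_name: str) -> list[str]:
    """Collect every "expr" string that references $var_name or ${var_name}."""
    # \b stops "$TopDisk" from matching inside "$TopDiskBusy".
    pattern = re.compile(r"\$\{?" + re.escape(var_name) + r"\b")
    hits = []

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                if key == "expr" and isinstance(child, str):
                    if pattern.search(child):
                        hits.append(child)
                else:
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(dashboard)
    return hits
```

Running `exprs_referencing(dashboard, "TopDiskBusy")` on the loaded aggregate dashboard should return an empty list once the variable has been removed from every panel.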

@cgrinds
Collaborator

cgrinds commented May 16, 2024

Verified on 24.05.0 commit 6617960
All dashboards except for SG ones are fixed #2890
