expose pvc/pod level metrics (if not already exposed by kubelet and cadvisor) #78

Open
iyashu opened this issue May 21, 2021 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@iyashu
Contributor

iyashu commented May 21, 2021

We need Kubernetes persistent volume and filesystem-level metrics, along with Grafana dashboards and a few sane alerts, preferably in kube-prometheus mixin format. I believe many of them are already exposed by kubelet (or the embedded cAdvisor), but we need to check and expose them for the cgroup v2 hierarchy as well.

  1. Utilisation metrics (both inode and byte usage) along with total space available. These are already exposed by kubelet, and the kube-prometheus stack already provides Grafana dashboards and Prometheus alerts for them.
  2. Volume read and write throughput metrics, in terms of both IOPS and bytes per second. These seem to be exposed by cAdvisor, but are somehow not visible for the cgroup v2 hierarchy.
  3. Disk read and write IO latency. We need to check whether cAdvisor already exposes these for cgroup v2.
  4. Number of outstanding IO operations (preferably both those queued and those waiting on the block device).
  5. PV abnormality metrics caused by degradation of the underlying disk attached to the node, filesystem corruption, accidental volume deletion on the node, etc. See if we can leverage volume health monitoring for this.
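
As a rough illustration of item 1, the sketch below derives byte and inode utilisation ratios per PVC from the kubelet volume stats metrics. It is only a sketch under stated assumptions: the sample text, the `demo` namespace, the `data-0` claim name, and the helper names `parse` and `utilisation` are all made up for illustration, and the parser handles only simple labelled lines rather than the full Prometheus exposition format.

```python
import re

# Illustrative sample in Prometheus text exposition format. The metric names are
# the kubelet volume stats metrics; label values and numbers are made up.
SAMPLE = """\
kubelet_volume_stats_capacity_bytes{namespace="demo",persistentvolumeclaim="data-0"} 10737418240
kubelet_volume_stats_used_bytes{namespace="demo",persistentvolumeclaim="data-0"} 8589934592
kubelet_volume_stats_inodes{namespace="demo",persistentvolumeclaim="data-0"} 655360
kubelet_volume_stats_inodes_used{namespace="demo",persistentvolumeclaim="data-0"} 65536
"""

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Return {metric_name: {(namespace, pvc): value}} for labelled samples."""
    out = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and unlabelled samples
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        key = (labels.get("namespace"), labels.get("persistentvolumeclaim"))
        out.setdefault(name, {})[key] = float(value)
    return out


def utilisation(metrics, used_name, total_name):
    """Return used/total per PVC, skipping PVCs with a missing or zero total."""
    used = metrics.get(used_name, {})
    total = metrics.get(total_name, {})
    return {key: used[key] / total[key] for key in used if total.get(key)}


metrics = parse(SAMPLE)
bytes_ratio = utilisation(
    metrics, "kubelet_volume_stats_used_bytes", "kubelet_volume_stats_capacity_bytes"
)
inode_ratio = utilisation(
    metrics, "kubelet_volume_stats_inodes_used", "kubelet_volume_stats_inodes"
)
print(bytes_ratio, inode_ratio)
```

In a real cluster the same ratios would normally be computed in PromQL from the scraped series rather than by parsing text by hand; the Python version is only meant to make the arithmetic explicit.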

Additionally, we require the following metrics related to PVC failure and provisioning in order to generate appropriate alerts.

  1. PVCs pending for a long time. Explore whether we can leverage kube-state-metrics to expose this, or check whether the external provisioner already provides these metrics.
  2. Other plugin-level metrics (for both the controller and the node driver), such as client-go metrics and creation/expansion/deletion RPC rates, latencies, and failures.
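
For item 1, kube-state-metrics exposes `kube_persistentvolumeclaim_status_phase`, so one possible approach is to alert when a claim stays in the `Pending` phase beyond some threshold. The sketch below shows that logic over a list of observations. The `PhaseSample` type, the `long_pending` helper, and the sample data are hypothetical and not part of any exporter; in practice this would more likely be a Prometheus alert rule with a `for:` duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSample:
    """One observation of a PVC phase (hypothetical shape, for illustration)."""
    timestamp: float  # seconds since some reference point
    namespace: str
    pvc: str
    phase: str  # e.g. "Pending", "Bound", "Lost"


def long_pending(samples, now, threshold_seconds):
    """Return PVCs that have been continuously Pending for at least the threshold."""
    pending_since = {}
    for sample in sorted(samples, key=lambda s: s.timestamp):
        key = (sample.namespace, sample.pvc)
        if sample.phase == "Pending":
            # Remember only the first timestamp of the current Pending streak.
            pending_since.setdefault(key, sample.timestamp)
        else:
            # Any other phase ends the streak.
            pending_since.pop(key, None)
    return sorted(
        key for key, since in pending_since.items()
        if now - since >= threshold_seconds
    )


# Made-up observations: "a" never binds, "b" binds quickly, "c" is newly Pending.
samples = [
    PhaseSample(0, "demo", "a", "Pending"),
    PhaseSample(0, "demo", "b", "Pending"),
    PhaseSample(30, "demo", "b", "Bound"),
    PhaseSample(600, "demo", "a", "Pending"),
    PhaseSample(800, "demo", "c", "Pending"),
]
stuck = long_pending(samples, now=900, threshold_seconds=600)
print(stuck)
```

Tracking the start of the current streak, rather than the first time a claim was ever seen Pending, avoids flagging claims that were briefly Pending, bound, and later recreated.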

Environment:

  • Kubernetes version (use kubectl version): >= 1.19
  • OS (e.g. from /etc/os-release): Debian 10
@kmova
Member

kmova commented Jun 9, 2021

Most of the metrics are available via:

  • kube-state-metrics
  • cAdvisor
  • Node exporter (standard metrics, including kubelet mount point metrics)

Beyond these, the LVM node plugin will expose its own metrics (in addition to what is exposed by the sample LVM textfile exporter), with the required labels attached so that they can be correlated with the metrics exposed by the standard exporters enabled in the cluster.

Sample dashboard with workload using LVM Local PV showing the PV utilization and performance metrics

@iyashu
Contributor Author

iyashu commented Jun 9, 2021

Thanks @kmova. Let me know once the dashboard is ready and pushed somewhere. I would like to try it out in our playground clusters.

@dsharma-dc
Contributor

We need to verify the metrics. Previous comments mention that they are already available. @abhilashshetty04 could you please check this?

@dsharma-dc dsharma-dc added the enhancement New feature or request label Jun 4, 2024
@dsharma-dc dsharma-dc self-assigned this Jul 5, 2024
@dsharma-dc
Contributor

Not yet picked up for prioritisation.

@avishnu
Member

avishnu commented Sep 12, 2024

@abhilashshetty04 @w3aman please confirm whether the dashboard contains the needed metrics.

6 participants