simplified pod resource metrics (support for initContainers) #71
Comments
That being said, https://kubernetes.io/docs/reference/instrumentation/metrics/ says that […] Update: but this says, referring to […]

@rptaylor I've also confirmed that […]
Maybe, but I'd like to at least investigate the alternative (kube-scheduler metrics) first to see how feasible it is, especially since the KSM developers specifically recommend that approach and it would make our lives easier too. https://yuki-nakamura.com/2023/10/21/get-kube-schedulers-metrics-manually/
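For reference, here is a minimal sketch of pulling the scheduler's metrics by hand, along the lines of that blog post. It assumes the default secure port 10259 and the standard in-cluster ServiceAccount token path; the endpoint requires a token authorized for `get` on the `/metrics` non-resource URL.

```python
import requests

# Assumption: running on a control-plane node, with a ServiceAccount
# token authorized to read the /metrics endpoint.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
with open(TOKEN_PATH) as f:
    token = f.read().strip()

resp = requests.get(
    "https://127.0.0.1:10259/metrics",  # 10259 = kube-scheduler's default secure port
    headers={"Authorization": f"Bearer {token}"},
    verify=False,  # the scheduler's serving certificate is usually self-signed
)
resp.raise_for_status()

# Show only the pod resource metrics this issue is about.
for line in resp.text.splitlines():
    if line.startswith("kube_pod_resource_"):
        print(line)
```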
After some investigation, I've discovered that […]
@mwestphall That's great to hear, excellent investigation! I meant to reply earlier, but I got stuck trying to get the kube-scheduler metrics working in our cluster. I think we are planning to move towards the kube-prometheus-stack instead of the Bitnami chart, so hopefully we will be able to converge and build on your findings then. (We are also investigating the possibility of VictoriaMetrics, but that could involve more compatibility challenges.)
@rptaylor I'm about to head out for the holidays; let's plan on continuing this work at the start of the new year.
@rptaylor I'm back in the office now. In terms of next steps here, would it make sense to set up a test cluster that exposes […]
@mwestphall Happy New Year, and sorry for the delayed response. […]

What do you think, does that make sense? In the end it would be similar to what you suggested here: #71 (comment), but the result would be more controllable/configurable. But given the logic involved, and in particular that the effective resource requests are different from the declared resource requests, ultimately using the pod-based metrics from the scheduler is the only way to be correct in more complex situations such as initContainers.
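To make that difference concrete, here is a tiny worked example with made-up numbers for a hypothetical pod:

```python
# Hypothetical pod: one init container requesting 1.0 CPU,
# two app containers requesting 0.25 CPU each.
init_requests = [1.0]
app_requests = [0.25, 0.25]

# Summing the per-container metrics (what KSM's container-level
# metric gives you) yields the declared total:
declared_sum = sum(app_requests)                        # 0.5 CPU

# The scheduler reserves max(sum of app containers, largest init
# container), so the effective pod request is higher:
effective = max(sum(app_requests), max(init_requests))  # 1.0 CPU
```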
@rptaylor That sounds like a good plan; I will get started on that at the beginning of next week. With regards to reservations about getting scheduler metrics working, configuration on the Prometheus side should be easy enough so long as we properly document it. I think there are probably still a couple of concerns with needing to edit the scheduler bind address to make the relevant metrics endpoint accessible in the first place, since by my understanding this is disabled by default in most Kubernetes setups.
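A quick way to check that reachability concern is a plain TCP probe from off the node (a sketch; the control-plane address is a placeholder):

```python
import socket

CONTROL_PLANE_IP = "192.0.2.10"  # placeholder: a control-plane node address

# If kube-scheduler binds its secure port to 127.0.0.1 (a common
# default, e.g. in kubeadm setups), this connection from another host
# fails; after changing --bind-address to a routable address (and
# allowing it through any firewall) it should succeed.
try:
    socket.create_connection((CONTROL_PLANE_IP, 10259), timeout=3).close()
    print("scheduler metrics port is reachable")
except OSError as exc:
    print(f"scheduler metrics port is not reachable: {exc}")
```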
Original issue description:

Currently I believe initContainers will not be accounted for or seen, since there are separate metrics for them, like kube_pod_init_container_resource_requests: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/pod-metrics.md

It would be a hassle to write code that adds up the initContainers along with the regular containers and takes a max over them, to duplicate/emulate the logic that k8s follows to determine the final pod resource amounts. It would be much better to simply query the actual resource amount of the whole pod, which is also recommended under the kube_pod_container_resource_requests description: "It is recommended to use the kube_pod_resource_requests metric exposed by kube-scheduler instead, as it is more precise."
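For illustration, here is a rough sketch of what emulating that logic from the two KSM metric families would look like (the helper name and the sample format are hypothetical; it also ignores pod overhead and the newer restartable/sidecar init containers, which is part of why the scheduler-side metric is preferable):

```python
from collections import defaultdict

def effective_pod_requests(container_samples, init_container_samples):
    """Emulate the scheduler's effective per-pod request:
    max(sum of app container requests, max of init container requests).

    Both arguments are iterables of (namespace, pod, resource, value)
    tuples, taken from kube_pod_container_resource_requests and
    kube_pod_init_container_resource_requests respectively.
    """
    sums = defaultdict(float)
    for ns, pod, resource, value in container_samples:
        sums[(ns, pod, resource)] += value

    maxes = defaultdict(float)
    for ns, pod, resource, value in init_container_samples:
        key = (ns, pod, resource)
        maxes[key] = max(maxes[key], value)

    return {
        key: max(sums.get(key, 0.0), maxes.get(key, 0.0))
        for key in set(sums) | set(maxes)
    }
```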
Info about the kube-scheduler metrics: https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#kube-scheduler-metrics
I checked on one of our clusters and kube_pod_resource_requests was not available. I think those metrics may need to be enabled with the --show-hidden-metrics-for-version flag, or maybe Prometheus needs to be configured to scrape the scheduler's metrics endpoint. That would be an extra complication for deployment, but it's probably worth it to keep the code simpler and less bug-prone, especially if support for accounting for initContainers is needed.
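As a quick availability check, one can compare the two metrics through Prometheus's HTTP API (a sketch; the Prometheus URL is an assumption, e.g. a local port-forward):

```python
import requests

PROM_URL = "http://localhost:9090"  # assumption: e.g. a `kubectl port-forward` to Prometheus

def instant_query(expr):
    r = requests.get(f"{PROM_URL}/api/v1/query", params={"query": expr})
    r.raise_for_status()
    return r.json()["data"]["result"]

# Scheduler-side metric: only present if the kube-scheduler endpoint is scraped.
scheduler_series = instant_query('kube_pod_resource_requests{resource="cpu"}')
# KSM per-container metric: usually present out of the box.
ksm_series = instant_query('kube_pod_container_resource_requests{resource="cpu"}')

print(f"kube_pod_resource_requests series: {len(scheduler_series)}")
print(f"kube_pod_container_resource_requests series: {len(ksm_series)}")
```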