Expose redis queue length metrics from ZMON worker #376
We should scale the number of zmon-workers running in Kubernetes based on the redis queue length. Since we now have custom metrics available in our Kubernetes setup, we can do this by exposing a metric from the pods and scaling based on that.

If each zmon-worker could expose the current redis queue length in a JSON metrics endpoint, then we could use the Horizontal Pod Autoscaler configuration described here: https://github.com/zalando-incubator/kube-metrics-adapter#example to do the scaling. This would allow us to run with a baseline of 1 zmon-worker in each cluster and only scale up when needed.

The alternative to the JSON metrics endpoint would be to scale on a ZMON check, but it would not make sense to depend on ZMON in order to scale... ZMON. :)
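A minimal sketch of what such an endpoint could look like, assuming the `redis` Python client; the redis host, queue key, and port are placeholder assumptions, not zmon-worker's actual configuration:

```python
# Sketch: expose the current redis queue length as a JSON metrics endpoint.
# Host, queue key, and port below are assumptions, not zmon-worker config.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import redis  # pip install redis

REDIS = redis.StrictRedis(host="zmon-redis", port=6379)
QUEUE_KEY = "zmon:queue:default"  # assumed name of the scheduler's queue

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        # LLEN returns the number of elements in the redis list, i.e. the
        # number of scheduled checks currently waiting for a worker.
        body = json.dumps(
            {"redis": {"queue_length": REDIS.llen(QUEUE_KEY)}}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 9090), MetricsHandler).serve_forever()
```

With an endpoint like this, the HPA could read the value via kube-metrics-adapter's json-path collector, e.g. with pod annotations along the lines of `metric-config.pods.queue-length.json-path/json-key: "$.redis.queue_length"` plus matching `path` and `port` keys (the metric name here is a placeholder).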
Comments

The main problem with this is that we have a quite stable input from the ZMON scheduler into the queue. Once we have reached a queue length of 0, we must not scale down again, in order to keep the current worker throughput.
What about exposing another value than queue length? E.g. "scheduled checks per minute" or whatever makes sense for the workers; then you have a number that will not be 0. Just an idea: if zmon-scheduler exposes a count of scheduled events in Prometheus format, then we could use a Prometheus query as the metric source for scaling, e.g. events per minute.
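For example, assuming zmon-scheduler exported a counter named `zmon_scheduler_checks_scheduled_total` (a hypothetical metric name, not an existing scheduler metric), scheduled checks per minute could be derived with a Prometheus query like:

```
# rate() gives a per-second rate; multiply by 60 for checks per minute.
# The counter name is an assumption.
rate(zmon_scheduler_checks_scheduled_total[5m]) * 60
```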
Bump, let us start discussing this again.
Maybe queue length aggregated over a specified time frame is good enough. It could work like this: sample the queue length periodically and scale on the aggregate over that window, as sketched below. This should work without exposing zmon-scheduler stats, because the rate will be the same and will not fluctuate too much.
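A minimal sketch of such an aggregation as a Prometheus query, assuming the workers exported a gauge named `zmon_worker_redis_queue_length` (a hypothetical metric name):

```
# Averaging over 5 minutes smooths out short dips to 0, so a briefly
# empty queue does not immediately trigger a scale-down.
avg_over_time(zmon_worker_redis_queue_length[5m])
```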
@szuecs We wouldn't need the ZMON check, we can simply have an HPA based on a Prometheus query: https://github.com/zalando-incubator/kube-metrics-adapter#example-external-metric
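A sketch of what such an HPA could look like, following the external-metric example format from the kube-metrics-adapter README; the query, metric name, and target value are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: zmon-worker
  annotations:
    # The query name and the gauge it references are hypothetical; the
    # query reuses the aggregated queue length idea from above.
    metric-config.external.prometheus-query.prometheus/redis-queue-length: |
      scalar(avg_over_time(zmon_worker_redis_queue_length[5m]))
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zmon-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: prometheus-query
      metricSelector:
        matchLabels:
          query-name: redis-queue-length
      targetAverageValue: "100"
```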
True, but if you need more logic you can do this in a ZMON check.