Expose redis queue length metrics from ZMON worker #376
We should scale the number of zmon-workers running in Kubernetes based on the redis queue length. Since we now have custom metrics available in our Kubernetes setup, we can do this by exposing a metric from the pods and scaling based on that.

If each zmon-worker could expose the current redis queue length in a JSON metrics endpoint, then we could use the Horizontal Pod Autoscaler configuration described here: https://github.com/zalando-incubator/kube-metrics-adapter#example to do the scaling. This would allow us to run with a baseline of 1 zmon-worker in each cluster and only scale up when needed.

The alternative to the JSON metrics endpoint would be to scale on a ZMON check, but it would not make sense to depend on ZMON in order to scale... ZMON. :)
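A minimal sketch of what such an endpoint could look like, assuming the `redis` Python client; the redis host, queue key, and port are placeholder assumptions, not zmon-worker's actual configuration:

```python
# Sketch: expose the current redis queue length as a JSON metrics endpoint.
# Host, queue key, and port below are assumptions, not zmon-worker config.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import redis  # pip install redis

REDIS = redis.StrictRedis(host="zmon-redis", port=6379)
QUEUE_KEY = "zmon:queue:default"  # assumed name of the scheduler's queue

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        # LLEN returns the number of elements in the redis list, i.e. the
        # number of scheduled checks currently waiting for a worker.
        body = json.dumps(
            {"redis": {"queue_length": REDIS.llen(QUEUE_KEY)}}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 9090), MetricsHandler).serve_forever()
```

With an endpoint like this, the HPA could read the value via kube-metrics-adapter's json-path collector, e.g. with pod annotations along the lines of `metric-config.pods.queue-length.json-path/json-key: "$.redis.queue_length"` plus matching `path` and `port` keys (the metric name here is a placeholder).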
Comments

The main problem with this is that we have a quite stable input from the ZMON scheduler into the queue. Once we have reached a queue length of 0, we must not scale down again, in order to keep the current worker throughput.
What about exposing another value than queue length? E.g. "scheduled checks per minute" or whatever makes sense for the workers; then you have a number that will not be 0. Just an idea: if zmon-scheduler exposes a count of scheduled events in Prometheus format, then we could use a Prometheus query as the metric source for scaling, e.g. events per minute.
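For example, assuming zmon-scheduler exported a counter named `zmon_scheduler_checks_scheduled_total` (a hypothetical metric name, not an existing scheduler metric), scheduled checks per minute could be derived with a Prometheus query like:

```
# rate() gives a per-second rate; multiply by 60 for checks per minute.
# The counter name is an assumption.
rate(zmon_scheduler_checks_scheduled_total[5m]) * 60
```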
Bump, let us start discussing this again.
Maybe queue length aggregated over a specified time frame is good enough. It could work like this: sample the queue length periodically and scale on the aggregate over that window, as sketched below. This should work without exposing zmon-scheduler stats, because the rate will be the same and will not fluctuate too much.
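A minimal sketch of such an aggregation as a Prometheus query, assuming the workers exported a gauge named `zmon_worker_redis_queue_length` (a hypothetical metric name):

```
# Averaging over 5 minutes smooths out short dips to 0, so a briefly
# empty queue does not immediately trigger a scale-down.
avg_over_time(zmon_worker_redis_queue_length[5m])
```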
@szuecs We wouldn't need the ZMON check, we can simply have an HPA based on a Prometheus query: https://github.com/zalando-incubator/kube-metrics-adapter#example-external-metric
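A sketch of what such an HPA could look like, following the external-metric example format from the kube-metrics-adapter README; the query, metric name, and target value are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: zmon-worker
  annotations:
    # The query name and the gauge it references are hypothetical; the
    # query reuses the aggregated queue length idea from above.
    metric-config.external.prometheus-query.prometheus/redis-queue-length: |
      scalar(avg_over_time(zmon_worker_redis_queue_length[5m]))
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zmon-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: prometheus-query
      metricSelector:
        matchLabels:
          query-name: redis-queue-length
      targetAverageValue: "100"
```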
True, but if you need more logic you can do this in a ZMON check.