Change some reported metrics from Prometheus' Counter to Histogram
#5458
Replies: 7 comments 4 replies
-
Sorry for the late response and thanks for the suggestions! I have filed a
-
@mfuntowicz, we would be happy to have your contribution. We just ask you to first fill out the CCLA form found on the top level of the
-
Prometheus team member here. For latencies and such, histograms are recommended. Ideally, you would expose seconds, not microseconds (us), as we generally try to use SI base units. This makes correlation etc. a lot easier. Our values are floats anyway, so that shouldn't be an issue.
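To illustrate the unit suggestion above, here is a minimal sketch of converting a microsecond duration to seconds and finding the cumulative bucket it would fall into. The bucket bounds and helper names are hypothetical, chosen only for illustration; they are not Triton or Prometheus defaults.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds (illustrative, not Triton defaults).
BUCKETS_S = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, float("inf")]


def us_to_seconds(duration_us: int) -> float:
    """Convert a duration in microseconds to seconds (the SI base unit)."""
    return duration_us / 1_000_000


def bucket_label(duration_s: float) -> str:
    """Return the `le` label of the smallest bucket whose bound is >= the observation."""
    # bisect_left finds the first bound that is >= duration_s, which matches
    # the "less than or equal" semantics of a histogram bucket.
    bound = BUCKETS_S[bisect_left(BUCKETS_S, duration_s)]
    return "+Inf" if bound == float("inf") else str(bound)
```

For example, a 2500 us request becomes 0.0025 s and lands in the bucket labelled `0.005`.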
-
Hi @mfuntowicz @RichiH, I am looking into this now (sorry it has taken so long 🙂). I'd like your feedback on the real-world implications of these metrics being converted to histograms.
Any other feedback, thoughts, or ideas you have are welcome.
-
Having quantile information is very important for setting the right alerts and for estimating the bare-minimum resources needed to serve a given traffic profile for a model in production without a degraded experience. Quantiles such as p50, p90, p95, p99, and p99.99 are reasonable and would save a lot of cost. I also want to note that I haven't found a reasonable way to compute quantiles accurately from the counters using Prometheus queries. This is a crucial ask for using Triton in a production environment. Can you please prioritize it, @rmccorm4?
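The difference can be sketched in a few lines. With only a cumulative duration and a request count, the mean is the most one can derive; with cumulative buckets, a quantile can be estimated by linear interpolation inside the bucket that contains the target rank, which is roughly what PromQL's `histogram_quantile` does for classic histograms. The function names and sample numbers below are made up for illustration.

```python
import math


def mean_latency(duration_sum_s: float, request_count: int) -> float:
    """All that two cumulative counters can give: an average."""
    return duration_sum_s / request_count


def estimate_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q from (upper_bound_s, cumulative_count) pairs.

    `buckets` must be sorted by bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0  # lowest bucket is assumed to start at 0
    for upper_bound, cum_count in buckets:
        if cum_count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: fall back to the
                # highest finite bound, since there is nothing to interpolate.
                return lower_bound
            in_bucket = cum_count - lower_count
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cum_count
    return lower_bound


# Made-up sample: 100 requests, cumulative counts per bucket.
SAMPLE = [(0.01, 50), (0.05, 90), (0.1, 99), (math.inf, 100)]
```

The estimate is only as precise as the bucket layout: a p95 that falls in the 0.05 to 0.1 s bucket is interpolated within that range, so bucket bounds should be chosen around the latencies that matter for alerting.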
-
Hi @rmccorm4 - I just wanted to follow up on this; is there an update on when we can expect these metrics to be exposed as With
-
It makes sense to use histograms for tracking latency metrics in Triton as they provide more detailed insights beyond just cumulative values, especially for metrics like inference durations. Histograms allow you to track distributions and still capture cumulative sums, aligning with your current monitoring needs. If you're willing to contribute, adding histogram support for these latency metrics would be valuable. You can start by modifying the relevant metrics collection code to use histograms instead of counters for the specified latency metrics. Let me know if you need guidance on contributing!
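As a rough illustration of why a histogram keeps the cumulative information, the toy class below mimics the three series a Prometheus histogram exposes: cumulative `_bucket` counts, a `_sum`, and a `_count`. This is not Triton's metrics code; the class and the metric name `request_duration_seconds` are hypothetical.

```python
import math


class LatencyHistogram:
    """Toy model of a Prometheus-style histogram with cumulative buckets."""

    def __init__(self, bounds_s):
        # Always append the implicit +Inf bucket.
        self.bounds = sorted(bounds_s) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0   # behaves like a cumulative duration counter
        self.count = 0   # behaves like a cumulative request counter

    def observe(self, seconds: float) -> None:
        self.sum += seconds
        self.count += 1
        # Buckets are cumulative: every bucket whose bound is >= the
        # observation is incremented.
        for i, bound in enumerate(self.bounds):
            if seconds <= bound:
                self.bucket_counts[i] += 1

    def samples(self, name: str) -> dict:
        """Return the series this histogram would expose, keyed by sample name."""
        out = {}
        for bound, count in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(bound) else str(bound)
            out[f'{name}_bucket{{le="{le}"}}'] = count
        out[f"{name}_sum"] = self.sum
        out[f"{name}_count"] = self.count
        return out
```

After observing 0.004 s, 0.02 s, and 0.5 s with bounds `[0.01, 0.1]`, the bucket counts are 1, 2, and 3, while `_sum` and `_count` still grow monotonically, so dashboards built on a cumulative duration could keep working from `_sum`.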
-
Currently, Triton reports all tracked metrics as Counter, which are monotonically increasing. Still, for some of them, it might be interesting to report distribution(s) instead of a single cumulative value, especially for latencies:
When looking at those, the cumulative value only captures part of the latency information, whereas a Histogram would give a few more insights into what is going on with latency in Triton.
In fact, a Histogram would also report the cumulative value, as the Counter currently does. From the Prometheus docs:
If this makes sense, I'm happy to contribute these changes.
🤗