Time decay of t-digest #127
My feeling for this is that addition is more robust than subtraction. As such, we can build a history of digests for relatively small periods and occasionally combine them into larger intervals. This is the same sort of thing that has to be done with HyperLogLog and most other sketch data structures. They may support subtraction, but the result usually inherits relative accuracy from the operands, which is disastrous when the difference is small.
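A minimal sketch of that add-only windowing (the Digest interface here is hypothetical; real Go libraries such as github.com/ajwerner/tdigest differ in naming but provide equivalent add and merge operations):

package window

// Digest is an illustrative stand-in for a t-digest implementation.
type Digest interface {
	Add(value, weight float64)
	Merge(other Digest) // fold other's centroids into the receiver
}

// Windows keeps one digest per fine-grained interval (say, one minute) and
// periodically merges old minutes into coarser hour digests. Nothing is ever
// subtracted, so accuracy is only ever the accuracy of a merge.
type Windows struct {
	New     func() Digest
	minutes []Digest // most recent last
	hours   []Digest
}

// Record adds a sample to the current minute's digest.
func (w *Windows) Record(v float64) {
	if len(w.minutes) == 0 {
		w.minutes = append(w.minutes, w.New())
	}
	w.minutes[len(w.minutes)-1].Add(v, 1)
}

// Roll starts a new minute; once 60 closed minutes have accumulated they are
// merged into a single hour digest and the fine-grained copies are dropped.
func (w *Windows) Roll() {
	w.minutes = append(w.minutes, w.New())
	if len(w.minutes) > 60 {
		hour := w.New()
		for _, m := range w.minutes[:60] {
			hour.Merge(m)
		}
		w.hours = append(w.hours, hour)
		w.minutes = w.minutes[60:]
	}
}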
In the case of HA collectors, you can get perfect consistency if all events have an event time. You can also retire digests for time periods as soon as possible if you have a watermark for the stream of data, but simply overwriting a digest with late updates is probably just fine as well.

If events don't have event times, or you don't want to respect them, then you have to depend on processing time. This is bad juju because it means that the results of a program depend on when it is run, but it can be done. In that case, the different HA collectors will have slightly discrepant results.
|
I tend to agree with this point of view. The metrics world falls roughly into two camps, push based and pull based. Ideally the pull-based systems would have agreed to some amount of server-side buffering of recent data where the client dictates the timestamp. That at least would have made HA easier to implement without the complications discussed. Unfortunately Prometheus seems to be winning, and in that world the collector assigns the timestamp. The Prometheus push gateway is even worse: there the Prometheus server assigns the timestamp based on when the collector reads from the push gateway. Alas, event times in the Prometheus metrics world aren't really an option.
This has added memory overhead for each of the finer-granularity time windows. |
For some more context, I'd love to hack soon on adding first-class support for t-digests into prometheus (prometheus/prometheus#2682). Prometheus today allows you to report histogram buckets where each bucket is a timeseries and then provides query-language support to aggregate them. These things are often hundreds of buckets and give pretty low precision with a pre-configured range. t-digests could use a similar pattern such that, instead of a timeseries per bucket, you have two timeseries per centroid based on the index of the centroid, where the total number of timeseries is 2*delta. The high-index timeseries will be NaN most of the time, but prometheus does a good job with compression (I need to verify that it compresses contiguous NaNs well, but that's what I have in mind).

I'm worried about the aliasing and loss due to ticking at a fixed rate that's on the order of the collection rate. I'm worried about the overhead (primarily memory overhead) of keeping multiple t-digests at a finer granularity, merging them at read time, and rolling them off the end of the ring buffer. I suppose this could be made better by eliminating the write buffer of the merging digest for "sealed" t-digests, making this approach somewhat more attractive.

Something I've been playing with lately is just quantizing time into discrete ticks and, on each tick, decaying the existing centroid weights by a constant factor (a traditional EWMA over the weights). The formula for choosing the decay factor is driven by the tick rate and a user-provided timescale for the trailing window the digest should represent.
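A minimal sketch of that tick-based decay, assuming direct access to the centroid means and weights (real t-digest implementations expose centroids differently, so the Centroid type here is illustrative, and a real digest would be re-compressed after decaying); it also includes the decay threshold discussed just below:

package decay

import "math"

// Centroid is an illustrative (mean, weight) pair; real digest libraries
// represent centroids differently.
type Centroid struct {
	Mean   float64
	Weight float64
}

// TickDecay multiplies every centroid weight by a constant factor derived
// from the tick interval and a user-provided timescale, then drops centroids
// whose weight has decayed below minWeight.
func TickDecay(cs []Centroid, tick, timescale, minWeight float64) []Centroid {
	// Classic EWMA factor: after one full timescale of ticks, old data
	// contributes a fraction 1/e of its original weight.
	factor := math.Exp(-tick / timescale)
	out := cs[:0]
	for _, c := range cs {
		c.Weight *= factor
		if c.Weight >= minWeight {
			out = append(out, c)
		}
	}
	return out
}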
This approach seems to get close to what I want to see. It seems, particularly in the middle of the distribution, to track the trailing distribution on roughly the timescale provided. The problem, as far as I can tell, is that the decaying seems to flatten out and become fatter-tailed over time. I suspect the addition of some decay threshold to filter out centroids which have been sufficiently decayed will help, and I'm going to experiment with that; perhaps there is a natural choice for that value. Does this basic approach of quantizing time and using a traditional EWMA over weights seem reasonable? If yes, then there's the question of how sound the actual details of this approach are. |
Restarting. Yes. This approach to decay sounds pretty reasonable, but you will have very little idea about accuracy as things begin to decay. I suppose that you can still say that the invariants hold as much as they ever did, but I don't think you can say much more than that.

Is the cost of windowed digests all that high? If you have compression = 100, then sealed digests will only have about 100 floats to store, or fewer if there are not very many samples. This is on the order of a kilobyte or so. How many of these do you think you will need to keep in memory? Even keeping millions isn't that big of a deal. |
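For concreteness, a back-of-envelope version of that cost estimate (the only assumption beyond the numbers above is 8 bytes per float64):

package sizing

// Rough memory cost of keeping many sealed digests, following the numbers
// above: a sealed digest at compression = 100 stores on the order of 100
// floats.
const (
	floatsPerSealedDigest = 100
	bytesPerFloat         = 8 // float64
	bytesPerDigest        = floatsPerSealedDigest * bytesPerFloat // ~800 B, "a kilobyte or so"
	digestsPerGiB         = (1 << 30) / bytesPerDigest            // ~1.3 million digests per GiB
)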
On the topic of integration with Prometheus, I think that sounds like a great idea. How can I help? |
I don't have any answers, but maybe these thoughts can whet your appetite on the problem (or discourage, who knows). The first step to first-class support in prometheus will be to figure out how to structure the data. The basic unit of the prometheus data model, as I like to think about it, is a time series. A time series is a set of timestamp keys and floating point values. A time series is identified by a set of labels (and, importantly, a special __name__ label carrying the metric name). To make this more explicit, below find the representation of a histogram:
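(The lines below are an illustrative histogram in the Prometheus exposition format; the metric name and counts are made up, but the shape is the standard layout: one cumulative bucket series per le boundary plus a sum and a count series.)

http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320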
It might be (is) the case that this histogram is configured for even larger buckets, but because they have no values, they are omitted. These per-bucket series are what the query-language aggregation operates on at read time.

A significant barrier to the windowing approach is that prometheus has no mechanism to control retention per time series (prometheus/prometheus#1381). This makes windowing schemes feel less tractable. I want to explore windowing schemes more but haven't had much time. I started to think through them one day here: https://github.com/ajwerner/tdigest/blob/9165f388a6b2c88703b9630c9d15e59d2d212d7e/windowed/windowed.go#L12-L120 |
OK. That's interesting. Sad as well. Things like HDR histograms have constant bin boundaries, so this idea of having a different time series for each bin (or for each cumulative sum) is easy to hack into a simple time series database. I have done the same sort of thing with OpenTSDB and accumulated similar karmic debt. The centroids in a t-digest wander around, however. That makes this much harder. One possibility is to not use an additive measure, if Prometheus allows that. Does it? |
There's nothing at all that binds us to using an additive measure; that's an assumption of the data-type encoding. One could imagine doing something like the encoding below. That ends up being pretty good because it means highly compressed digests will have mostly zero values. Sure, the values in the low buckets won't compress well, but that seems fine. We aren't trying to do optimal storage here, just something that works. We can also leverage meta buckets to indicate things like start time and window sizes.

I think this encoding could get you a pretty long way. At query time you pull all of the buckets and merge where appropriate. It might be that there are multiple different readings corresponding to the same open window; that seems fine. At least, it seems fine if you could compact them away. In a more advanced time series database that allowed per-series retention and compactions, it really would be totally fine. Alas.

tdigest{bucket="1",type="count"} 1
tdigest{bucket="1",type="val"} 18431
tdigest{bucket="2",type="count"} 2
tdigest{bucket="2",type="val"} 22527
tdigest{type="start"} <timestamp>
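A rough sketch of the read path that encoding implies, assuming the val series holds each centroid's mean (the encoding above doesn't pin that down) and reusing the illustrative Centroid shape from earlier:

package decode

// Sample is one reading of the tdigest metric family above: a centroid index
// taken from the bucket label, a type ("count" or "val"), and the value.
type Sample struct {
	Bucket int
	Type   string
	Value  float64
}

// Centroid mirrors the illustrative type used earlier.
type Centroid struct {
	Mean   float64
	Weight float64
}

// Decode pairs up the count/val series per centroid index and drops empty
// slots, recovering the centroid list for one reading. Readings from several
// windows or HA collectors can then be merged digest-to-digest rather than
// subtracted.
func Decode(samples []Sample) []Centroid {
	counts := map[int]float64{}
	vals := map[int]float64{}
	maxIdx := 0
	for _, s := range samples {
		switch s.Type {
		case "count":
			counts[s.Bucket] = s.Value
		case "val":
			vals[s.Bucket] = s.Value
		}
		if s.Bucket > maxIdx {
			maxIdx = s.Bucket
		}
	}
	var out []Centroid
	for i := 1; i <= maxIdx; i++ {
		if counts[i] > 0 {
			out = append(out, Centroid{Mean: vals[i], Weight: counts[i]})
		}
	}
	return out
}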
|
When I said that you don't get an event time I meant event time is collection time. At query time you do, of course, get a timestamp. |
That's true with a constant configuration. It's also sort of terrible. In the system I currently work on, we hard-coded all of the buckets a long time ago, so they end up having quite high precision in the nanosecond range and only second-level precision in the seconds range. It was a bad choice for the system. We can never change it now. With the above proposed centroid-index scheme, one could change the compression factor and everything would adjust nicely. |
This proposed encoding looks totally usable.
|
Hey @tdunning,
This is a re-posting of the now closed #55. I'm interested in re-opening the exploration into how to properly decay a t-digest. The t-digest is attractive due to its high precision and relatively compact serialization with no range configuration. It is a perfect fit for approximate quantile estimation of large data sets. It seems to me that it could additionally be a good data structure to track activity on a server.
The common data structure people reach for in the context of server monitoring is the HDR histogram. The especially nice thing about the histogram in this setting is that it is robust to non-uniform sampling rates. A common monitoring architecture today (using something like https://prometheus.io/) will collect a snapshot of values from servers periodically, where the collecting server chooses the timestamp. Histograms can report cumulative values, and the previous timestamp's bucket counts can then be subtracted from the current ones to get a view of the data between the two.
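A minimal sketch of that bucket-subtraction trick for a fixed-boundary histogram (the helper name and the assumption that both scrapes report the same bucket boundaries are mine):

package buckets

// deltaCounts subtracts the previous scrape's cumulative bucket counts from
// the current scrape's, yielding per-bucket counts for just the interval
// between the two scrapes. This relies on fixed, additive bucket boundaries;
// t-digest centroids offer no comparable subtraction.
func deltaCounts(prev, curr []uint64) []uint64 {
	out := make([]uint64, len(curr))
	for i := range curr {
		out[i] = curr[i] - prev[i] // cumulative counts only ever grow
	}
	return out
}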
The t-digest doesn't offer this convenient subtraction mechanism, making it more difficult to use in this sort of setting. One solution people sometimes offer is to use collection as a trigger to reset the t-digest. This is problematic in the face of more than one collector (as is the recommendation for HA Prometheus).

An alternative to only reporting the cumulative distribution would be to sample a distribution which represents the previous trailing period. I've experimented with this a bit and it seems roughly reasonable, though with some unexpected behavior when few new points are recorded and the count decays away.

The approach in the blog post linked from #55 feels high in terms of overhead, and it is not completely obvious to me exactly how to map that idea onto the t-digest. Have you given this topic more thought? Do you have a hunch on good approaches?