-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decay TDigest #198
Comments
It's quite possible to build something kind of like a time decayed
t-digest, but there are some seriously difficult issues to overcome.
The first one is to decide what you actually mean by the invariant of the
scale function. Right now, we allow violation of the scale function
invariant in the case of centroids that have a single sample. This is
reasonable because you don't lose any accuracy in the estimation of
quantiles or the cumulative distribution in these cases. But if you allow
these single counts to decay and to partial counts, then you don't know
which additional samples should be allowed because everything has to
satisfy the invariant. In fact, a lot of the accuracy at the extreme tails
that the t digest gets is precisely because individual samples are
preserved near the tails. That means that this question of what do you do
with partial sample weighted centroids is particularly vexed.
Interestingly, the situation is very very different if you look at the log
histogram. In that case partial samples are no problem at all. The reason
for the difference is that the log histogram maintains the invariant in
terms of the original measurement space while the t digest maintains
invariants in the quantile space. This issue is subtle but very very thorny.
I am pretty firmly of the opinion that the right answer to this is to
simply store digests every sample. A common choice of sample is every
minute or every 5 minutes. If you care to you can maintain accumulative for
the last hour or so as well. This gives you complete flexibility in getting
a digest for any time period you want because you can merge multiple
digests together so easily.
…On Wed, Aug 31, 2022, 01:47 rnataf ***@***.***> wrote:
Hi, I am interested in a time decayed version of TDigest. I saw there was
a discussion about it in issue #127
<#127>. Has there been any
progress about this problem ?
—
Reply to this email directly, view it on GitHub
<#198>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAB5E6RTAWYMSAU72K3E7VTV34L35ANCNFSM6AAAAAAQBE7WKM>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hi Ted, thank you for your answer.
|
Yes. You have the first point correct. If a centroid has one sample, then it decays to 1/2, another sample, and it decays by a factor of 2/3, we have a weight of 1, but this represents two samples. Normally, we would use this specially for interpolation purposes to gain accuracy, but we can't do that any more. One fix for this would be to keep separate counter of number of samples and total weight for a centroid, but once you have at least one sample, then it can never decay to the point where it can be merged if it is in the tail and thus you force other samples to be merged. Regarding your second point, the most common desire for exponential decay of a digest is so that you can estimate the distribution over a particular (recent) time period. With short time range based digests, you don't have to estimate ... you can just combine the pertinent digests and you have exactly what you want. If you want some sort of weighting over a time period, you can do that too. When you want to do the quantile or CDF operation you translate into composite operations on each set of digests that have the same weight. |
Regarding your last point, I understand that what you suggest to do is to have one digest per time period. Then, I can decrease the weights of "old" digests when I no longer use them. Let's say now I'm in time period 2 and I decreased the weights of digests of time period 0 and 1. When I want to compute the quantile I should merge the digests of periods 0, 1 and 2 and then compute the quantile on the merged result. Did I understand you correctly ? |
Hi, I am interested in a time decayed version of TDigest. I saw there was a discussion about it in issue #127. Has there been any progress about this problem ?
The text was updated successfully, but these errors were encountered: