Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate ProgressBar ETA from most recent updates rather than for whole period #4120

Closed
wants to merge 44 commits into from

Conversation

davep
Copy link
Contributor

@davep davep commented Feb 5, 2024

There are two main changes in this PR:

  • The method used to calculate the ETA now looks at the most recent updates to the percentage and works off how long they took (underlying this is a class that helps estimate a time to completion based on a timed window, capped at a maximum number of samples to constrain memory usage, and with a final fallback position should the window become near-exhausted).
  • Added a reset_eta parameter to ProgressBar.update so that the dev can signal that the ProgressBar is being reused in some way so that the ETA calculation will always start fresh (fixes ProgressBar ETA goes adrift if a progress bar is "reused" #4096)

The PR also includes a fix to what seemed to be a bug with how ETAStatus stopped refreshing itself; in that before now it looks like it never did (there looks to have been some confusion over using its own timer, but trying to stop the updates via auto_refresh).

davep added 4 commits February 5, 2024 09:48
Rather than calculating just from when progress started, instead use a
recent set of samples; this means that the progress bar's ETA should better
reflect what's happening around now, rather than what's happened overall.

This have almost no impact on constant time progress; but should better
reflect (for example) the state of affairs when dealing with accelerating or
decelerating progress bars.
I had it in my head it was a reference to the ETA widget; it isn't; it's a
flag.
@davep davep linked an issue Feb 5, 2024 that may be closed by this pull request
@davep davep self-assigned this Feb 5, 2024
@davep davep added bug Something isn't working enhancement New feature or request labels Feb 5, 2024
davep added 22 commits February 5, 2024 11:57
It looks like the ETAStatus widget has never properly stopped its update,
because it was setting up a timer to do the refresh, but assigning None to
auto_refresh, which it wasn't actually using.
There's enough of a different in the calculation between Windows and other
operating systems that it's a problem which doesn't relate to this actual
test.
This should hopefully solve the Windows vs the World discrepancy during
snapshot testing.
I keep wanting to use distance_covered as if it's the total distance
covered; overall; period. It isn't. It's the distance covered by the window.
So here I rename it so I can't ever get confused about that again.
Swapping out the previous simple deque of times for the TimeToCompletion
class. Still working off the percentage; mainly because this should work
just as well as any other approach (the percentage is just a modified
version of the actual position and total after all), but also because it's
easier to plug it in this way and test.

From here on in I think I can make the tests work again because I should be
able to reach into the ETAStatus widget and plug a different class into
_samples that returns a constant desired value for the estimated time to
completion.
This may change again, especially if I add the sliding window of time.
In anticipation of adding a time window too.
If we've hit a point where all the samples we have to hand are too old,
given a time window to look at, fall back to everything we have to hand to
make a best-efforts estimate. This feels better than just shrugging.
davep added 13 commits February 8, 2024 14:27
One edge case that was bugging me was how the progress bar ETC calculation
would collapse if we ran out of samples in the time window (or, more
correctly, whittled it down to just one sample).

Rich's progress bar goes back to showing --:--:-- in this situation; which
is a fair estimation in that it's a shrug because there's nothing useful to
work off now. The problem here though is that people have often complained
that it has "stopped" or is "broken" in some way (it isn't, it's starved).

So, what I was wanting to do was to keep something going, that was ballpark
correct, looked active, and would become increasingly pessimistic. The
thinking here is that if someone sees a big number slowly getting bigger,
they'd blame their underlying process and not the design of the ProgressBar
itself.

Here then is an approach to that situation where we keep a reference to the
most-recently-dropped sample and, if we've only got the one sample left, we
bring the older one back into consideration.

This gives a nice "big number creeps up I think your download has stalled my
friend" show.
@davep davep changed the title WiP: Calculate ProgressBar ETA from most recent updates rather than for whole period Calculate ProgressBar ETA from most recent updates rather than for whole period Feb 12, 2024
@davep davep marked this pull request as ready for review February 12, 2024 09:55
Copy link
Member

@darrenburns darrenburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked a few questions.

Also, not enough walruses.

Comment on lines 22 to 25
def test_no_go_past_end() -> None:
"""It should not be possible to go past the destination value."""
with pytest.raises(ValueError):
TimeToCompletion(1).record(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I feel clamping would be safer here compared to raising an exception. If a TimeToCompletion(100) receives a value of 100.0001, do we want things to blow up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be convinced by that, although I wonder if given the lower-level nature of this code, it should be up to the caller to clamp its values?

Copy link
Collaborator

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be too much emphasis on keeping samples.

We only need to discard old samples when we calculate the speed.

"""
if samples := self._samples:
# Trim off any "too old" samples.
oldest_time = samples[-1].moment - self._time_window_size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the pruning really occur from the last sample, or should it occur from the current time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me that it works off the last-known sample rather than off "now", as what we're making here is a heuristic for figuring out a likely time to completion, and going off "last best" makes more sense to me than going off nothing.

Self.
"""
self._samples.append(sample)
return self._prune()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call prune on every sample?

Could we defer pruning to when we need the samples? i.e. if there are 1000 samples between refreshes, could we prune just once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could go either way; there could be 1,000 "refreshes" between samples too. It feels to me like, on balance, code that is showing an estimated time is more likely to get new estimates than it is to record new samples.

return self._elapsed

@property
def _speed_now(self) -> float:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the need for a "speed" and a "speed now".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible I could prune this back now, but at least for coding/testing purposes it's been a lot easier to reason about "what's the estimate for the data in the window, as of the last recording point" vs "what's the estimate as of now?"

@davep
Copy link
Contributor Author

davep commented Mar 11, 2024

Seems we're going to go with a very totally different PR, so closing this unreviewed.

@davep davep closed this Mar 11, 2024
@davep davep deleted the progress-progress branch March 11, 2024 08:39
@davep davep removed a link to an issue Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants