Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage on system time change #89

Open
jetomit opened this issue Jun 23, 2023 · 8 comments
Open

High CPU usage on system time change #89

jetomit opened this issue Jun 23, 2023 · 8 comments
Labels

Comments

@jetomit
Copy link

jetomit commented Jun 23, 2023

Since Guix upgraded to guile-fibers 1.3.1, shepherd hangs shortly after boot on systems without a RTC. I believe the problem comes from using get-internal-real-time in the guile-fibers timer wheel implementation. After NTP corrects the system time, this function returns a much larger value, and the CPU load (for one core) goes to 100%.

Profiling suggests the process spends the CPU time in timer-wheel-advance!, so I imagine it is trying to tick through a five-year time diff. I tried increasing the system time manually by N days, which causes shepherd to be unresponsive (e.g. to herd status) for about N×5 seconds. I observed similar behavior with the example from guile-fibers readme.

Replacing all instances of (get-internal-real-time) with (clock-gettime 1) in guile-fibers, and reconfiguring the system with the patched package, fixes this problem. I think using a monotonic clock makes sense, but there is probably a cleaner / more portable way to do it.

Thanks!

@civodul
Copy link
Collaborator

civodul commented Jul 16, 2023

Hi @jetomit!

Using CLOCK_MONOTONIC as you suggest seemed like the right choice to me so I started working on it. However, the API of (fibers timers) as well as schedule-task-at-time expect "internal time units"; changing timer-wheel to use CLOCK_MONOTONIC would affect those interfaces similarly, which is not acceptable.

Instead we should probably change timer-wheel-advance! to cope with large gaps.

@wingo, WDYT?

Thanks!

@civodul
Copy link
Collaborator

civodul commented Aug 21, 2023

@jetomit Here's a proposed workaround on the Guix side: https://issues.guix.gnu.org/64966

@jetomit
Copy link
Author

jetomit commented Aug 25, 2023

Here's a proposed workaround on the Guix side: https://issues.guix.gnu.org/64966

This would work for aarch64, but I also encounter this issue on armhf and x86_64 systems. This happens whenever system time is pushed forward by a significant amount (a day or more), either by ntpd or manually.

As I understand it, guile’s internal-time-units only depends on the platform and is the same for all clock types. The bigger problem with using CLOCK_MONOTONIC might be that it doesn’t count time the system is suspended, which would probably break stuff.

@emixa-d emixa-d added the bug label Sep 4, 2023
@civodul
Copy link
Collaborator

civodul commented Oct 23, 2023

Another report of shepherd spinning once system time has changed: https://issues.guix.gnu.org/66684

@civodul
Copy link
Collaborator

civodul commented May 22, 2024

@wingo Hello! Did you have a chance to look into that? I'd be happy to try and implement any suggestions you might have (I'd love to do that before Shepherd 1.0 is out).

@wingo
Copy link
Owner

wingo commented May 22, 2024

Took a look at it but it requires a bit of concentration to not introduce bugs :) Do have a look if you like!

@RJSent
Copy link

RJSent commented May 25, 2024

Just posting for the records another example "in the wild" of someone working around this issue. https://issues.guix.gnu.org/70892#3

Thanks for all your hard work! 😄

@jfrederickson
Copy link

Apologies if anything is inaccurate in this comment, but I had a look through the timer code to see if I could think of anything. I still don't fully grok the control flow, but from a cursory read it does look like timer-wheel-advance! iterates through all layers of the timer wheel even when "skipping ahead" across a large gap.

I wonder if a solution to this might be to skip iterating through the inner wheels if:

  1. The current slot in an outer wheel contains no events, and
  2. The current time is greater than the start time of the current slot plus the time duration of that slot

This feels like it should allow you to pretty rapidly "warp" through a large space of mostly unset timers (by climbing up the hierarchy and skipping slots at the lowest granularity possible), which is what you would expect to see if the clock has suddenly jumped years into the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants