fix: limit the rate of interruption checks #942
Conversation
With this PR, on macOS, I get:

RStudio:
Command line:
R GUI:

This is basically the same performance as with interruption checks disabled. Without this PR, I get:

RStudio:
Command line:
R GUI:
Using this PR, but not limiting the timing checks to every 4th call, gives me this timing:

This shows that, in general, the timing checks are non-negligible.
One problem with this timing approach is that it assumes that the interruption handler would be called at least once per second, but that may not always be the case.
If we do go with this approach (which is still to be decided; I'm not 100% comfortable with it), the best function to use on Windows is likely `QueryPerformanceCounter`.
I wonder why RStudio is so slow compared to the CLI or the R GUI. Maybe there's some inter-thread synchronization involved? R code probably runs on its own thread, but it needs to call back to the GUI thread to check whether there was an interruption, and that call may block until the next iteration of the GUI thread's event loop. Anyhow, I'm okay with this PR, and I don't think we could do any better than this given the constraints involved.
I'm inclined to propose throttling interruption checks to 10 ms or even slower: no one would notice the difference. For instance, the resolution of the coarse monotonic clock on Linux (Ubuntu 20.04, kernel 5.4) is 4 ms according to a response on SO, and if the kernel happens to round upwards to the nearest 4 ms, then the rate limiting won't have any effect at all. With 10 ms we are more likely to land in a regime that is coarser than the granularity of the coarse monotonic clock.
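For reference, the granularity in question can be queried directly with `clock_getres()` rather than relying on the SO answer. A minimal sketch (Linux-specific, since it uses `CLOCK_MONOTONIC_COARSE`):

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec res;
    /* clock_getres() reports the granularity of the given clock */
    if (clock_getres(CLOCK_MONOTONIC_COARSE, &res) == 0) {
        printf("CLOCK_MONOTONIC_COARSE resolution: %ld.%09ld s\n",
               (long) res.tv_sec, res.tv_nsec);
    } else {
        perror("clock_getres");
    }
    return 0;
}
```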
Also, maybe we should move this to the C core so that all interfaces benefit from it?
I don't think that's true (I actually saw that response on SO). The code measures from the last interruption check until now, not from the last call to the interruption handler until now. Right now the rate is set to 1 ms. That means that the check will happen at most once per millisecond; if the clock granularity is 4 ms, then it will happen only every 4 ms, and that's perfectly fine. My point is that we don't need to set a rate that is lower than the clock resolution. Nor can we, since different platforms might have very different resolutions. We can't realistically detect the resolution on each platform and adapt to it; the code has to work regardless of the resolution. What I am concerned about is:
I really don't want to do that. The interruption handler in Python and Mathematica is faster than the call to the fastest timer on macOS! This code would hurt those interfaces, not benefit them. Plus, consider how non-portable all this is, and how little we are able to test such timing issues on exotic hardware. The C core is very portable, and I don't want to sacrifice that. So, all in all, I am not quite happy with this, but it's the best I could come up with given R's very slow interruption mechanism.
If we want to take this upstream to fix RStudio, what's a good way to replicate it? I still propose thinning out calls to the interruption handler on a case-by-case basis: nothing is faster than a function that isn't called at all. Can we somehow do this for the problem here as well? The "thinning out" by a factor of four that is part of this PR: is it good for all cases? We started with 10, IIRC. Alternatively, could we somehow make the "thinning out" adaptive? Start by checking every x iterations, measure the timing, and adapt to check less (or more) frequently? (See the sketch below.)
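A rough sketch of what such adaptive thinning could look like; everything here (names, constants, the doubling/halving policy) is hypothetical and not part of this PR:

```c
#include <stdint.h>
#include <time.h>

#define TARGET_NS 1000000  /* aim for roughly one real check per millisecond */

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

int check_interrupt(void) {
    static unsigned counter = 0;
    static unsigned stride = 4;   /* look at the clock only every `stride` calls */
    static uint64_t last = 0;

    if (++counter < stride) {
        return 0;                 /* thinned out: no timer call, no check */
    }
    counter = 0;

    uint64_t t = now_ns();
    if (last != 0) {
        uint64_t elapsed = t - last;
        if (elapsed < TARGET_NS / 2 && stride < 65536) {
            stride *= 2;          /* checks come too fast: thin out more */
        } else if (elapsed > 2 * (uint64_t) TARGET_NS && stride > 1) {
            stride /= 2;          /* checks come too slowly: thin out less */
        }
    }
    last = t;

    /* ... perform the actual (expensive) host-language interrupt check ... */
    return 0;
}
```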
The best solution, with zero cost (in the ideal case) for the interrupt checks, would be to run the igraph computation in a separate thread. The main thread would only be responsible for interruption checks and for memory allocation, and would sleep the rest of the time. Let's not do timers, but keep this open for now as a reference. I might be able to raise this with Posit based on the original example in #942.
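For reference, a bare-bones sketch of that threaded idea; all names are illustrative, and real code would also have to route memory allocation back to the main thread, as described above:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static atomic_bool interrupted = false;
static atomic_bool done = false;

static void *worker(void *arg) {
    (void) arg;
    while (!atomic_load_explicit(&interrupted, memory_order_relaxed)) {
        /* ... one chunk of the igraph computation; break when finished ... */
        break;  /* placeholder so that the sketch terminates */
    }
    atomic_store(&done, true);
    return NULL;
}

int run_with_interrupt_thread(void) {
    pthread_t th;
    if (pthread_create(&th, NULL, worker, NULL) != 0) {
        return -1;
    }
    while (!atomic_load(&done)) {
        usleep(10000);  /* main thread sleeps 10 ms between checks */
        /* if (host_interrupt_pending())           -- hypothetical host call */
        /*     atomic_store(&interrupted, true); */
    }
    pthread_join(th, NULL);
    return 0;
}
```

The check becomes essentially free for the worker: polling a relaxed atomic flag is a single cache read, while the main thread pays for the expensive host-language check only a few times per second.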
Picking this up. In duckdb, interruption now works by temporarily installing a signal handler. Is this something that would work for igraph? Are we signal-safe?
Do you mean things like SIGABRT? Is that even available on Windows? I am not very familiar with this topic. Can you give me references to help me research/understand what "signal-safe" means? Any input here, @ntamas?
With "signal-safe", I mean that the library is left in a state that is suitable for further invocations. It works on Windows too.
Host languages usually have their own signal handlers for SIGINT; for instance, in Python, SIGINT generates a KeyboardInterrupt. And it causes countless problems in threaded contexts, where it cannot be predicted which thread the SIGINT will be delivered to. Anyhow, my assumption is that signal handling is the responsibility of the host language, and igraph, as a library, should simply ask the host language whether an interruption request has been delivered. Messing with SIGINT behind the back of the host language is probably just asking for trouble. How would this work in R's case? Whenever we enter an igraph function from R, we overwrite R's signal handler with ours, and then swap it back whenever we return to R (either from the function itself or because we are calling a callback that is implemented in R)?
I agree that messing with the signal handler is a special kind of fun. It works for duckdb: we install our own SIGINT handler, and this handler calls a function in duckdb that tells it to abort. I now see that my question about signal safety was ill-phrased: we don't need it, since we're catching SIGINT and just setting a flag. I'm not sure we need to reset the signal handler for callbacks; we'd just wait for the callback to finish and catch the interrupt back in igraph land.
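A minimal sketch of that flag-setting pattern; the names are illustrative, and this is not duckdb's or igraph's actual code:

```c
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t interrupt_flag = 0;

static void on_sigint(int sig) {
    (void) sig;
    interrupt_flag = 1;  /* async-signal-safe: only set a flag */
}

int run_guarded(void) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, &old);   /* swap in our handler, remember the host's */

    interrupt_flag = 0;
    while (!interrupt_flag) {
        /* ... one chunk of the computation; the flag check is just a load ... */
        break;  /* placeholder so that the sketch terminates */
    }

    sigaction(SIGINT, &old, NULL);  /* hand SIGINT back to the host language */
    return interrupt_flag ? 1 : 0;  /* report whether we were interrupted */
}
```

(`sigaction()` is POSIX; a Windows port would need `signal()` or a console control handler instead.)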
We could add this here. The upside is also that no active interrupt check is needed.
This is a suggestion for improving on (if not fixing) #940.
It uses a combination of limiting how often the handler does any work at all (currently only every 4th call) and limiting the actual checks to at most one per millisecond. Both are needed, as retrieving the time can itself be expensive.
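In rough outline, the mechanism looks like the sketch below. The names and the Linux clock are illustrative; the actual PR selects a platform-specific fast timer per OS:

```c
#include <stdint.h>
#include <time.h>

#define CALL_STRIDE 4        /* only every 4th call even looks at the clock */
#define MIN_GAP_NS  1000000  /* the real check runs at most once per ms */

int interrupt_handler(void) {
    static unsigned calls = 0;
    static uint64_t last_check_ns = 0;

    if (++calls % CALL_STRIDE != 0) {
        return 0;  /* skip both the timer call and the check */
    }

    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);  /* cheap but coarse (Linux) */
    uint64_t now = (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;

    /* measure from the last *performed* check, so coarse clock granularity
     * only makes the checks rarer, never more frequent */
    if (now - last_check_ns < MIN_GAP_NS) {
        return 0;
    }
    last_check_ns = now;

    /* ... perform the actual host-language interrupt check here ... */
    return 0;
}
```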
I tested this quite a bit on macOS, and I think the current parameters are fine. The timing check is not cheap (even though it uses the fastest available timing method I could find), but limiting it to every 4th call helps.
I included code for Linux and FreeBSD, but I could not test either. Testing is important: we need to make sure that the timing calls are not too expensive. Can you help with this for Linux, @Antonov548 @krlmlr? Here's what needs to be done: benchmark on Linux, and compare against a baseline that does `return IGRAPH_SUCCESS` at the beginning of the interrupt handler. Note that `CLOCK_MONOTONIC_COARSE` is platform-specific, and the Linux version is unrelated to the FreeBSD one; they just happen to have the same name. If we go ahead with this approach, we need to add a Windows version as well (perhaps @Antonov548 can try?).
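For the Windows version, a possible (untested) sketch following the `QueryPerformanceCounter` suggestion made earlier in the thread; the rate-limiting logic around it would stay unchanged:

```c
#include <stdint.h>
#include <windows.h>

static uint64_t now_ns_win(void) {
    static LARGE_INTEGER freq;  /* ticks per second; constant at runtime */
    LARGE_INTEGER ticks;
    if (freq.QuadPart == 0) {
        QueryPerformanceFrequency(&freq);
    }
    QueryPerformanceCounter(&ticks);
    /* scale ticks to nanoseconds; double avoids 64-bit overflow */
    return (uint64_t) ((double) ticks.QuadPart * 1e9 / (double) freq.QuadPart);
}
```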