-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use large threads for processing Boogie programs #710
Use large threads for processing Boogie programs #710
Conversation
I'm a fan of the general approach, and it's nice there's already a single choke point where we're already delegating to another thread anyway. My main concern is the dependency we're taking on a random solution that isn't cleanly distributed or necessarily maintained, but I could be convinced if this is literally the only existing solution out there (with a LOT more dedicated testing). |
@@ -0,0 +1,325 @@ | |||
// Code taken from https://github.com/ChadBurggraf/parallel-extensions-extras/blob/ec803e58eee28c698e44f55f49c5ad6671b1aa58/TaskSchedulers/WorkStealingTaskScheduler.cs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, is this really the best option out there for this functionality? No commonly used nuget package out there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, and a NuGet package of this wouldn't work since I had to edit to code to enable customizing the stack size. Doing so is apparently something that nobody does since there's no library code anywhere to do it :-D
Anyways, I replaced this code with something much simpler written by hand. I'm guessing it's less performant but I think that's fine - this is a temporary solution anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I just started looking at the comment trail on this PR and it took a while for things to make sense. It sounds like @keyboardDrummer was using a copied version of some publicly-available code but has now changed the approach to a simple threadpool class with a large stack size. I saw the new class and indeed it looks quite simple.
Is there any reason to gate the use of the new threadpool under a command-line flag? In other words, do you think it is possible for the new threadpool to be worse than the custom threadpool in some way. If so, might be better to use a command-line option, especially since the solution is temporary anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to gate the use of the new threadpool under a command-line flag? In other words, do you think it is possible for the new threadpool to be worse than the custom threadpool in some way. If so, might be better to use a command-line option, especially since the solution is temporary anyway.
Note that older Boogies, like 2.9.4, also spawned 16MB stack threads, as you can see here: https://github.com/boogie-org/boogie/blob/v2.9.4/Source/ExecutionEngine/ExecutionEngine.cs#L433. They spawned one thread for each Boogie implementation: https://github.com/boogie-org/boogie/blob/v2.9.4/Source/ExecutionEngine/ExecutionEngine.cs#L1018
which could be garbage collected after an implementation had finished running, so the memory usage was limited to 16MB * implementations_running_concurrently.
I've updated this PR so it also spends 16MB * implementations_running_concurrently
of memory, so the memory profile should be similar to Boogie 2.9.4.
Regarding running time performance, I think the behavior here will be better than 2.9.4 because threads are re-used between implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please compare the running time of the current Boogie and your new version on a few large-ish examples, potentially generated from Dafny. Another option could be to compare the running time for the entire Boogie regression testsuite. Just a sanity check. I expect your intuition that there wouldn't be any significant difference to be valid.
Regardless of the outcome here, let's also cut an explicit issue for the root cause of traversing ASTs recursively, which your dafny-lang/dafny#2124 work might be the deeper solution for. |
520bd52
to
1aab6f6
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks - I'm much more comfortable with this! Just a couple of comment suggestions.
private Task BlockUntilTaskIsAvailable() | ||
{ | ||
isWorkAvailable.Wait(); | ||
queue.TryDequeue(out var task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth a comment saying we don't need to check the return value or if task
is null because of the separate semaphore (too bad ConcurrentQueue
doesn't have a plain blocking Dequeue
method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the implementation to use AsyncQueue
, which has the behavior you were hoping for from ConcurrentQueue
.
@@ -26,10 +26,14 @@ public record ProcessedProgram(Program Program, Action<VCGen, Implementation, Ve | |||
|
|||
public class ExecutionEngine : IDisposable | |||
{ | |||
private static readonly CustomStackSizePoolTaskScheduler largeThreadScheduler = CustomStackSizePoolTaskScheduler.Create(16 * 1024 * 1024, Environment.ProcessorCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth a quick comment stating why we need such a deep stack, maybe point to #711
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Curious what the core underlying problem is---traversal of deep Boogie ASTs or traversal of deep VCExpr's? |
Traversal of deep VCExprs is what we've run into in practice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So little code! Love it! :)
Fixes Dafny issue dafny-lang/dafny#3803 and dafny-lang/dafny#3819
Testing