Adding local work requesting scheduler that is based on message passing internally #5845
Conversation
Force-pushed from b9472b7 to a98609d
First performance measurements show an overall improvement of up to 5-10%. Very promising!
Force-pushed from a98609d to 26bbc03
Performance test report: HPX Performance Comparison
Force-pushed from 2a7d2fc to 753df97
Force-pushed from a7f7496 to 511b157
I started looking at the performance of this scheduler a bit (in the single-NUMA-domain case). Directly replacing our current scheduler, it performs worse in the general case (at least on the algorithm benchmarks, which I had handy). In the case of a bulk execution of relatively uniform work, stealing is very limited, because our scheduling is already quite balanced anyway. This would explain the performance deficit: we are possibly taking on a larger overhead (a more complex scheduler) for a small benefit (a few non-cache-disrupting steals). There is also the fact that we can only poll for steal requests between thread executions, which introduces some latency in responding to them. I will still try to produce a best-case scenario for this scheduler, even if only as a proof of concept.

Edit: it seems to perform much better on few cores, which could suggest a large number of failed stealing attempts when we have many cores. We could experiment with sending steal requests to where there is actual work.
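For illustration, here is a minimal, single-threaded sketch of the work-requesting protocol being discussed: idle workers send explicit steal-request messages, and a victim answers them only when its scheduling loop regains control between two task executions (the source of the latency mentioned above). All names are illustrative; this is not HPX's scheduler code.

```cpp
#include <cstddef>
#include <cstdio>
#include <deque>
#include <functional>
#include <vector>

using task = std::function<void()>;

struct steal_request
{
    std::size_t requester;    // id of the idle worker asking for work
};

struct worker
{
    std::deque<task> queue;                // local work
    std::deque<steal_request> requests;    // incoming steal requests
    std::deque<task> handed_over;          // tasks transferred by a victim
};

// a victim answers pending requests with a "steal half" policy, but only
// when it is not currently executing a task
void answer_steal_requests(std::vector<worker>& w, std::size_t victim)
{
    while (!w[victim].requests.empty() && !w[victim].queue.empty())
    {
        steal_request r = w[victim].requests.front();
        w[victim].requests.pop_front();

        std::size_t n = (w[victim].queue.size() + 1) / 2;    // steal half
        for (std::size_t i = 0; i != n; ++i)
        {
            w[r.requester].handed_over.push_back(
                std::move(w[victim].queue.back()));
            w[victim].queue.pop_back();
        }
    }
}

int main()
{
    std::vector<worker> w(2);
    for (int i = 0; i != 4; ++i)
        w[0].queue.push_back([i] { std::printf("task %d\n", i); });

    // worker 1 is idle and sends a steal request to worker 0
    w[0].requests.push_back({1});

    // worker 0 only notices the request *between* task executions, which
    // is the response latency mentioned above
    task t = std::move(w[0].queue.front());
    w[0].queue.pop_front();
    t();
    answer_steal_requests(w, 0);

    std::printf("worker 1 received %zu tasks\n", w[1].handed_over.size());
    return 0;
}
```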
I did see improvements on tests that run a large number of separate tasks (like fibonacci). For uniform iterative parallelism the benefit would probably be small.
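For reference, the workload shape meant here is recursive task spawning along the lines of HPX's fibonacci examples. A minimal sketch (header names vary across HPX versions; this is not the benchmark itself):

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>

#include <cstdint>
#include <iostream>

// naive fibonacci: every level of the recursion spawns an independent HPX
// task, producing many small, irregular tasks for the scheduler to place
std::uint64_t fibonacci(std::uint64_t n)
{
    if (n < 2)
        return n;

    hpx::future<std::uint64_t> lhs = hpx::async(fibonacci, n - 1);
    std::uint64_t rhs = fibonacci(n - 2);    // run the other child inline
    return lhs.get() + rhs;
}

int main()
{
    std::cout << "fibonacci(20) = " << fibonacci(20) << "\n";
    return 0;
}
```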
Force-pushed from f910674 to a282657
@hkaiser You'll still have to add the fix in 1d_stencil_8.
I thought that was fixed by #6294.
@hkaiser No, I think it cannot be fixed that way (that's why I had gotten a bit confused there). It's a plain old out-of-scope situation. I think that makes sense, unless I have some misconception about the role of sliding_semaphore.
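A hedged sketch of the lifetime issue being described (an assumed, simplified shape of the 1d_stencil_8 pattern; exact headers and semaphore semantics vary across HPX versions): a sliding_semaphore on the stack limits how far the time-stepping loop may run ahead, and continuations signal it. Any continuation that captured `&sem` and is still in flight when the enclosing scope unwinds would signal a destroyed semaphore.

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>
#include <hpx/synchronization/sliding_semaphore.hpp>

#include <cstdint>

int main()
{
    std::int64_t const look_ahead = 4;

    {
        hpx::sliding_semaphore sem(look_ahead);

        hpx::future<void> step = hpx::make_ready_future();
        for (std::int64_t t = 0; t != 100; ++t)
        {
            // suspend if we are more than look_ahead steps ahead of the
            // last completed step
            sem.wait(t);

            step = step.then([](hpx::future<void>&&) { /* one time step */ })
                       .then([&sem, t](hpx::future<void>&&) {
                           // dangles if sem's scope has already unwound
                           sem.signal(t);
                       });
        }

        // without this wait, trailing continuations that captured &sem
        // could still be in flight when sem is destroyed at the end of
        // this scope; that is the out-of-scope situation described above
        step.get();
    }

    return 0;
}
```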
Force-pushed from 6b926b4 to 48b364b
@Pansysk75 I have applied the change to the use of the sliding_semaphore.
Seems like tasks get stuck in the stealing queue; re-applying this fix solves the issue. Did you do something else to try to solve that issue, or was this fix accidentally left behind?
Thanks a lot - not sure how that got lost. Much appreciated!
… internally

- Using uniform_int_distribution with proper bounds
- Removing queue index from thread_queues as it was unused
- flyby: remove commented-out options from .clang-format
- Renaming workstealing --> workrequesting
- Adding adaptive work stealing (steal half/steal one) - this makes this scheduler consistently (albeit only slightly) faster than the (default) local-priority scheduler
- Adding LIFO and FIFO variations of the local work-stealing scheduler
- flyby: fixing HPX_WITH_SWAP_CONTEXT_EMULATION
- flyby: minor changes to the fibonacci_local example
- Adding high- and low-priority queues
- flyby: cache_line_data now does not generate warnings/errors if padding is not needed
- Adding bound queues
- flyby: using cache_line_data for scheduler states
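On the cache_line_data item in the commit message above: padding per-worker scheduler state to cache-line boundaries avoids false sharing. A generic illustration in plain C++ (HPX's own utility is hpx::util::cache_line_data, whose exact interface may differ): without padding, adjacent per-worker counters share a cache line, and every write by one core invalidates that line in all other cores' caches.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>

// pad each element to a full cache line; 64 bytes is assumed here, the
// standard constant is std::hardware_destructive_interference_size
template <typename T>
struct alignas(64) cache_line
{
    T data_;
};

struct scheduler_state_counters
{
    static constexpr std::size_t num_workers = 8;

    // each worker owns one element; the padding keeps neighbouring
    // counters from bouncing the same cache line between cores
    cache_line<std::atomic<std::size_t>> stolen_tasks[num_workers];
};

int main()
{
    scheduler_state_counters s{};
    s.stolen_tasks[0].data_.fetch_add(1, std::memory_order_relaxed);
    std::printf("element stride: %zu bytes\n", sizeof(s.stolen_tasks[0]));
    return 0;
}
```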
Force-pushed from cb0e449 to 02e2b4b
Performance test report: HPX Performance Comparison
I think this is good to go now. Thanks again @Pansysk75!
bors merge
Build succeeded! The publicly hosted instance of bors-ng is deprecated and will go away soon. If you want to self-host your own instance, instructions are here. If you want to switch to GitHub's built-in merge queue, visit their help page.
This adds a new experimental work-requesting scheduler to the list of existing schedulers.
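For context on trying it out: HPX selects its scheduler at runtime via the --hpx:queuing command-line option (e.g. --hpx:queuing=local-priority-fifo for the existing default family). Assuming the new scheduler registers under a name along the lines of local-workrequesting-fifo (illustrative; the authoritative names are in this PR's changes), selecting it would look like:

```
./fibonacci_local --hpx:queuing=local-workrequesting-fifo
```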