Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for C++17 parallel execution? #16

Open
charles-randall opened this issue Apr 18, 2021 · 5 comments
Open

Support for C++17 parallel execution? #16

charles-randall opened this issue Apr 18, 2021 · 5 comments

Comments

@charles-randall
Copy link

As a drop-in replacement for std::sort, do you have plans to add support for parallel execution in pdqsort?

As I'm sure you know, sort can now do this,

std::sort(std::execution::par, lines.begin(), lines.end());

@orlp
Copy link
Owner

orlp commented Apr 20, 2021

I don't have plans currently. I would have to do some research on modern standard C++ parallel programming, and there are some tricky things in pdqsort if you want to parallelize. In particular it is assumed the left partition is recursed on first.

I currently don't have the time to do this.

@charles-randall
Copy link
Author

charles-randall commented Apr 20, 2021

Understood. Thanks for the response.

It may be helpful if you could to outline an approach that someone else could investigate and implement (like you've already pointed out with the left partition assumption). Even a just a few bullet points to get someone started.

As always YMMV, but just to give you an idea of what I'm seeing in my application, here's a simple comparison of the "sort phase" (a timer around std:sort or pdqsort) consisting of 25M text UUIDs,

~38 seconds - std::sort
~17 seconds - std::sort parallel 
~25 seconds - pdqsort      

This is with g++'s C++17 support and Intel's TBB implementation for parallel std::sort (the only thing that seems to work right now for g++; e.g., g++ --std=c++17 ..... -ltbb).

To be clear, pdqsort is far more CPU efficient because the parallel sort is at 100% CPU utilization on 4 cores for those 17 seconds whereas pdqsort is at 100% on only one core for 25 seconds (and pdqsort beats the single-threaded std::sort handily).

FYI

@stumarcus314
Copy link

Have you tried the parallel sort implemented by parasort.h? https://github.com/baserinia/parallel-sort
On line 36 of parasort.h, you could try to replace std::sort with pdqsort. Line 36 sorts a subset of elements within a single thread.

@alugowski
Copy link

alugowski commented Jan 25, 2024

You can parallelize pdqsort with poolSTL's pluggable_sort:

poolstl::pluggable_sort(poolstl::par, lines.begin(), lines.end(), pdqsort);

pluggable_sort performs the first few steps of quicksort until there are enough partitions to fill available cores then delegates each partition to the specified sequential method, in this case pdqsort. Works very well.

Sorting 100M integers (M1 chip, 6+2 cores):

6903 ms - std::sort
2313 ms - pdqsort
1323 ms - std::sort(std::execution::par) uses TBB's sort
1278 ms - poolstl::pluggable_sort(std::sort)
 697 ms - poolstl::pluggable_sort(pdqsort)

@GabTux
Copy link

GabTux commented Apr 13, 2024

I created parallel quicksort, which is highly inspired by pdqsort: https://github.com/GabTux/PPQSort

ppqsort::sort(ppqsort::execution::par, input.begin(), input.end());

Benchmarks seem promising; see the repository. IPS4o seems the fastest overall but it needs external dependencies (oneTBB). So, the PPQSort, cpp11sort, or poolSTL are great choices for a fast parallel algorithm without external dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants