Using thread pool #140

Draft: wants to merge 82 commits into base: development

Commits (82)
ce5bee7
Refactoring
joserochh Sep 16, 2022
8ffa547
Fixing compile time flag
joserochh Sep 21, 2022
0dba78f
Adding benchmarks and sleep mode
joserochh Sep 23, 2022
63e3820
Adding Unit tests
joserochh Sep 28, 2022
58c0863
Fixing debug logging
joserochh Sep 28, 2022
678c074
Finishing unit testing
joserochh Oct 4, 2022
4d5743e
Refactoring and cleanup
joserochh Oct 5, 2022
6c8f560
Changing uinnt to uint64_t
joserochh Oct 5, 2022
4c86875
Removing compare exchange
joserochh Oct 5, 2022
bce1453
Acommodating tests for machine with small number of threads
joserochh Oct 6, 2022
fbaec64
Reducing max tests to 80 percent
joserochh Oct 7, 2022
32e0d54
Acommodating tests
joserochh Oct 7, 2022
319dc06
Disabling Multithreading at build if prrocessor count is low
joserochh Oct 7, 2022
b2edd13
Fixing HEXL_MULTI_THREADING setup
joserochh Oct 7, 2022
ef58dff
Fixing Typo
joserochh Oct 7, 2022
5abf225
Fixing test to max concurrency
joserochh Oct 7, 2022
59f7292
Trying stopping threads before creating new thread object
joserochh Oct 7, 2022
b4052b1
Last chance before leaving test foor machines with more cores
joserochh Oct 7, 2022
6059ad8
tAccounting for small odd thread pools as tasks are added in pairs
joserochh Oct 7, 2022
0b6ebce
Fixing iterations tests for even number of threads
joserochh Oct 7, 2022
a1edbf9
Final Fix for today
joserochh Oct 7, 2022
05d70d2
Skippin benchmarks when no threads available
joserochh Oct 8, 2022
36c59a0
Fix
joserochh Oct 8, 2022
96918af
Removing conflicting tests
joserochh Oct 10, 2022
c058b3a
Reducing tests max thread usage 60%
joserochh Oct 10, 2022
a749008
Applying review comments
joserochh Oct 10, 2022
8d454b1
splitting test cases
joserochh Oct 10, 2022
381bef8
Adding libs to test util
joserochh Oct 10, 2022
ec167fc
Applying some of the review comments
joserochh Oct 11, 2022
bd8f0ac
Applying fixes
joserochh Oct 12, 2022
018b05b
Reverting use of charconv
joserochh Oct 12, 2022
666435a
Fixing use of HEXL_VLOG now that pool is on stack
joserochh Oct 12, 2022
ca05910
Moving thread function to an object
joserochh Oct 17, 2022
4a2e20c
Not possible to use for_each_n on ubuntu 18.04
joserochh Oct 17, 2022
22a0356
Removing debug print
joserochh Oct 17, 2022
a94e16c
Fix sleep mode race condition
joserochh Oct 17, 2022
fe41e28
Modifying test to run on at least two threads
joserochh Oct 18, 2022
73fd427
Adding three benchmarks
joserochh Oct 18, 2022
74952de
Benchmarking
joserochh Oct 19, 2022
e0f0841
Adding test and fixes
joserochh Oct 21, 2022
bf6fc13
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Oct 21, 2022
40af9fc
Adding extra test
joserochh Oct 22, 2022
67b6ac2
Replacing test for more exhausting test
joserochh Oct 24, 2022
ec7ab0f
Fixing debug mode
joserochh Oct 25, 2022
9390c52
Limiting nested test to 6 levels
joserochh Oct 25, 2022
dff4093
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Oct 25, 2022
bb79298
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Oct 25, 2022
2541ad0
Fix to NTT recursive calls arguments
joserochh Oct 26, 2022
b821868
Adding UNUSED
joserochh Oct 27, 2022
3de5728
Fix and adressing of PR's Review
joserochh Nov 2, 2022
fe42392
Adding extra test
joserochh Nov 3, 2022
88d2018
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Nov 3, 2022
6e9f2d1
Handling odd input sizes
joserochh Nov 8, 2022
66a4944
Extending wait time
joserochh Nov 8, 2022
39be79b
Extra parallelization on NTT
joserochh Nov 8, 2022
7cc012b
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Nov 8, 2022
0040292
Compliant to chunk ranges
joserochh Nov 8, 2022
16f8b41
Adding stress tests
joserochh Nov 15, 2022
e41c359
Fixing Paralell Inv NTT
joserochh Nov 15, 2022
b6f8f95
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Nov 15, 2022
dca45e3
Adding forgotten file
joserochh Nov 17, 2022
8cabf76
Refactor Parallel calls
joserochh Nov 17, 2022
70f3854
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Nov 17, 2022
f2b92b4
Refactor Parallel calls 2
joserochh Nov 22, 2022
20bccd4
Removing unneeded globals
joserochh Nov 24, 2022
617ec71
Removing old Global Vars references
joserochh Nov 24, 2022
a59752e
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Nov 29, 2022
52f977e
fixing sign and lock error
joserochh Nov 29, 2022
7083d8d
pre-commit
joserochh Nov 29, 2022
ada737a
Merge branch 'custom_thread_pool' into custom_thread_pool_benchmarks
joserochh Nov 29, 2022
2f6a6b1
Using new parallel depth function
joserochh Nov 29, 2022
4b4a724
Merge branch 'development' into custom_thread_pool_benchmarks
joserochh Dec 12, 2022
2cde236
Adding thread dependency to HEXL example
joserochh Dec 14, 2022
99c5d54
Fixing thread dependency for pkgconfig example
joserochh Dec 15, 2022
f0f00a6
Fixing windows CI error
joserochh Dec 15, 2022
d3786fe
some cleaning
joserochh Dec 15, 2022
2b2d714
More cleaning and correction
joserochh Dec 15, 2022
7b9dd56
Updated README for multithreading documentation
joserochh Dec 20, 2022
b0e7864
fix Readme
joserochh Dec 20, 2022
2f2422b
Reducing minnumber of threads on CMake
joserochh Dec 20, 2022
bedf1c1
Set minimun to 3
joserochh Dec 20, 2022
3214a9b
Fix on parallel depth variable
joserochh Dec 27, 2022
2 changes: 2 additions & 0 deletions .clang-format
@@ -5,3 +5,5 @@ BasedOnStyle: Google
Language: Cpp
DerivePointerAlignment: false
PointerAlignment: Left
# Let cpplint enforce header order
SortIncludes: true
25 changes: 23 additions & 2 deletions CMakeLists.txt
@@ -38,6 +38,8 @@ else()
set(HEXL_DEBUG OFF)
endif()



set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
@@ -55,17 +57,35 @@ set(CMAKE_INSTALL_RPATH "\$ORIGIN")
#------------------------------------------------------------------------------

option(HEXL_BENCHMARK "Enable benchmarking" ON)
- option(HEXL_COVERAGE "Enables coverage for unit tests" OFF)
option(HEXL_COVERAGE "Enable coverage for unit tests" OFF)
option(HEXL_DOCS "Enable documentation building" OFF)
option(HEXL_EXPERIMENTAL "Enable experimental features" OFF)
option(HEXL_SHARED_LIB "Generate a shared library" OFF)
- option(HEXL_TESTING "Enables unit-tests" ON)
option(HEXL_TESTING "Enable unit-tests" ON)
option(HEXL_TREAT_WARNING_AS_ERROR "Treat all compile-time warnings as errors" OFF)
option(HEXL_MULTI_THREADING "Enable multithreading" ON)

if (NOT HEXL_FPGA_COMPATIBILITY)
set(HEXL_FPGA_COMPATIBILITY "0" CACHE INTERNAL "Set FPGA compatibility mask" FORCE)
endif()


# Test the option's truth value so that TRUE, YES, or 1 also trigger the check.
if (HEXL_MULTI_THREADING)
  include(ProcessorCount)
  ProcessorCount(N)
  if(NOT N EQUAL 0)
    if(N GREATER 2) # Minimum for testing
      message(STATUS "ProcessorCount: ${N}")
    else()
      message(WARNING "Small Processor Count.")
      set(HEXL_MULTI_THREADING OFF)
    endif()
  else()
    message(WARNING "Not able to get Processor Count.")
    set(HEXL_MULTI_THREADING OFF)
  endif()
endif()

message(STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message(STATUS "CMAKE_C_COMPILER: ${CMAKE_C_COMPILER}")
message(STATUS "CMAKE_CXX_COMPILER: ${CMAKE_CXX_COMPILER}")
@@ -78,6 +98,7 @@ message(STATUS "HEXL_SHARED_LIB: ${HEXL_SHARED_LIB}")
message(STATUS "HEXL_TESTING: ${HEXL_TESTING}")
message(STATUS "HEXL_TREAT_WARNING_AS_ERROR: ${HEXL_TREAT_WARNING_AS_ERROR}")
message(STATUS "HEXL_FPGA_COMPATIBILITY: ${HEXL_FPGA_COMPATIBILITY}")
message(STATUS "HEXL_MULTI_THREADING: ${HEXL_MULTI_THREADING}")

hexl_check_compiler_version()
hexl_add_compiler_definition()
37 changes: 36 additions & 1 deletion README.md
@@ -127,6 +127,7 @@ For convenience, they are listed below:
| HEXL_DOCS | ON / OFF | OFF | Set to ON to enable building of documentation |
| HEXL_TESTING | ON / OFF | ON | Set to ON to enable building of unit-tests |
| HEXL_TREAT_WARNING_AS_ERROR | ON / OFF | OFF | Set to ON to treat all warnings as error |
| HEXL_MULTI_THREADING | ON / OFF | ON | Set to ON to enable multithreading |

### Compiling Intel HE Acceleration Library
To compile Intel HE Acceleration Library from source code, first clone the
@@ -262,7 +263,41 @@ documentation](https://github.com/amrayn/easyloggingpp#application-arguments)
for more details.

## Threading
- Intel HE Acceleration Library is single-threaded and thread-safe.
Intel HE Acceleration Library is multi-threaded and thread-safe.
`-DHEXL_MULTI_THREADING=OFF` can be used to disable multithreading.

**Note**: when the Intel HE Acceleration Library is used from a multi-threaded
application, only one top-level thread at a time has access to the thread
pool. Any other top-level thread bypasses the thread pool and runs its work
sequentially.
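
The bypass behavior described in the note can be illustrated with a minimal sketch. This is not the library's implementation: `pool_busy` and `RunChunks` are hypothetical names, and plain `std::thread` objects stand in for the pool workers. The first caller to claim the gate runs its chunks in parallel, and any caller that finds the gate taken runs the same chunks in order on its own thread.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical gate: true while some top-level thread is using the "pool".
std::atomic<bool> pool_busy{false};

// Runs work(0) .. work(n_chunks - 1). Returns true if the parallel path was
// taken, false if the call was bypassed and ran sequentially.
bool RunChunks(std::size_t n_chunks,
               const std::function<void(std::size_t)>& work) {
  bool expected = false;
  if (!pool_busy.compare_exchange_strong(expected, true)) {
    // Another top-level thread holds the gate: bypass and run in order.
    for (std::size_t i = 0; i < n_chunks; ++i) work(i);
    return false;
  }
  // This caller owns the gate. Short-lived threads stand in for pool
  // workers here; a real pool would reuse long-lived threads instead.
  std::vector<std::thread> workers;
  workers.reserve(n_chunks);
  for (std::size_t i = 0; i < n_chunks; ++i) workers.emplace_back(work, i);
  for (auto& worker : workers) worker.join();
  pool_busy.store(false);  // Release the gate for the next caller.
  return true;
}
```

Both paths produce the same results, so callers do not need to know which one was taken. The return value exists only to make the sketch observable.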

By default, the thread pool consists of 8 threads. A different number of
threads can be set with the `HEXL_NUM_THREADS` environment variable.

Example (in bash):
```bash
export HEXL_NUM_THREADS=<integer>
```

If the default value or `HEXL_NUM_THREADS` is larger than
`N = std::thread::hardware_concurrency()`, the thread pool consists of only
`N` threads. If CMake's `ProcessorCount` reports fewer than 3 processors,
multithreading is disabled automatically at configure time.
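
The sizing rule above can be sketched as follows. The helper names (`ParseThreadCount`, `ClampToHardware`, `ResolveNumThreads`) are hypothetical and not part of the library's API. Two behaviors are assumptions made for the sketch: a missing, malformed, or zero value falls back to the default, and the request is left unclamped when `std::thread::hardware_concurrency()` returns 0 (which the C++ standard permits when the value is not computable).

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <thread>

// Hypothetical helper: parses a positive decimal integer. Returns `fallback`
// for a null, empty, non-numeric, or zero value (an assumption of this sketch).
std::uint64_t ParseThreadCount(const char* text, std::uint64_t fallback) {
  if (text == nullptr || *text == '\0') return fallback;
  for (const char* p = text; *p != '\0'; ++p) {
    if (!std::isdigit(static_cast<unsigned char>(*p))) return fallback;
  }
  const unsigned long long value = std::strtoull(text, nullptr, 10);
  return value == 0 ? fallback : static_cast<std::uint64_t>(value);
}

// Caps the request at the hardware thread count. A count of 0 means
// "unknown", in which case this sketch leaves the request unchanged.
std::uint64_t ClampToHardware(std::uint64_t requested, std::uint64_t hardware) {
  if (hardware == 0) return requested;
  return std::min(requested, hardware);
}

// Combines both steps, using the default of 8 threads stated in the README.
std::uint64_t ResolveNumThreads() {
  const std::uint64_t requested =
      ParseThreadCount(std::getenv("HEXL_NUM_THREADS"), 8);
  return ClampToHardware(requested, std::thread::hardware_concurrency());
}
```

With `HEXL_NUM_THREADS=16` on a machine that reports 4 hardware threads, `ResolveNumThreads()` in this sketch yields 4.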

### NTT's recursive parallelization
By default, the NTT runs two levels of parallel recursion. A different number
of levels can be set with the `HEXL_NTT_PARALLEL_DEPTH` environment variable.

Example (in bash):
```bash
export HEXL_NTT_PARALLEL_DEPTH=<integer>
```

Regardless of the parallel depth value, recursive parallelization is limited
by the total number of threads in the thread pool.

**Note**: NTT's recursive parallelization is controlled separately from
loop parallelization because the two methods do not scale the same way.
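
To illustrate what a parallel depth means for a recursive algorithm, here is a minimal sketch that uses a recursive sum in place of the NTT. It is not the library's code: `RecursiveSum` and the cutoff of 1024 elements are hypothetical, and `std::async` stands in for the thread pool. The top `parallel_depth` levels of the recursion run their two halves concurrently, and every deeper level runs sequentially.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Sums v[begin, end) by recursive halving. Only the first `parallel_depth`
// levels hand one half to another thread; deeper levels stay on the caller.
std::uint64_t RecursiveSum(const std::vector<std::uint64_t>& v,
                           std::size_t begin, std::size_t end,
                           int parallel_depth) {
  if (end - begin <= 1024) {  // Small enough: sum directly.
    return std::accumulate(v.begin() + static_cast<std::ptrdiff_t>(begin),
                           v.begin() + static_cast<std::ptrdiff_t>(end),
                           std::uint64_t{0});
  }
  const std::size_t mid = begin + (end - begin) / 2;
  if (parallel_depth > 0) {
    // Parallel level: left half on a new thread, right half on this one.
    auto left = std::async(std::launch::async, RecursiveSum, std::cref(v),
                           begin, mid, parallel_depth - 1);
    const std::uint64_t right = RecursiveSum(v, mid, end, parallel_depth - 1);
    return left.get() + right;
  }
  // Sequential level: both halves on the calling thread.
  return RecursiveSum(v, begin, mid, 0) + RecursiveSum(v, mid, end, 0);
}
```

In this sketch a depth of 2 runs up to four branches concurrently, which is why a cap tied to the number of available threads is useful: each extra level doubles the number of concurrent branches.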

# Community Adoption

1 change: 1 addition & 0 deletions benchmark/CMakeLists.txt
@@ -10,6 +10,7 @@ set(SRC main.cpp
bench-eltwise-mult-mod.cpp
bench-eltwise-sub-mod.cpp
bench-eltwise-reduce-mod.cpp
bench-thread-pool.cpp
)

if (HEXL_EXPERIMENTAL)
24 changes: 20 additions & 4 deletions benchmark/bench-eltwise-add-mod.cpp
@@ -36,7 +36,11 @@ BENCHMARK(BM_EltwiseVectorVectorAddModNative)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});

//=================================================================

@@ -61,7 +65,11 @@ BENCHMARK(BM_EltwiseVectorVectorAddModAVX512)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});
#endif

//=================================================================
@@ -85,7 +93,11 @@ BENCHMARK(BM_EltwiseVectorScalarAddModNative)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});

//=================================================================

@@ -110,7 +122,11 @@ BENCHMARK(BM_EltwiseVectorScalarAddModAVX512)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});
#endif

} // namespace hexl
12 changes: 10 additions & 2 deletions benchmark/bench-eltwise-cmp-add.cpp
@@ -37,7 +37,11 @@ BENCHMARK(BM_EltwiseCmpAddNative)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});

//=================================================================

@@ -61,7 +65,11 @@ BENCHMARK(BM_EltwiseCmpAddAVX512)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});
#endif

} // namespace hexl
12 changes: 10 additions & 2 deletions benchmark/bench-eltwise-cmp-sub-mod.cpp
@@ -36,7 +36,11 @@ BENCHMARK(BM_EltwiseCmpSubModNative)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});

//=================================================================

@@ -59,7 +63,11 @@ BENCHMARK(BM_EltwiseCmpSubModAVX512_64)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});
#endif

//=================================================================
9 changes: 6 additions & 3 deletions benchmark/bench-eltwise-fma-mod.cpp
@@ -37,7 +37,8 @@ static void BM_EltwiseFMAModAddNative(benchmark::State& state) { // NOLINT

BENCHMARK(BM_EltwiseFMAModAddNative)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {false, true}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{false, true}});

//=================================================================

@@ -62,7 +63,8 @@ static void BM_EltwiseFMAModAVX512DQ(benchmark::State& state) { // NOLINT

BENCHMARK(BM_EltwiseFMAModAVX512DQ)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {false, true}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{false, true}});
#endif

//=================================================================
@@ -87,7 +89,8 @@ static void BM_EltwiseFMAModAVX512IFMA(benchmark::State& state) { // NOLINT

BENCHMARK(BM_EltwiseFMAModAVX512IFMA)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {false, true}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{false, true}});

#endif

22 changes: 16 additions & 6 deletions benchmark/bench-eltwise-mult-mod.cpp
@@ -38,7 +38,9 @@ static void BM_EltwiseMultMod(benchmark::State& state) { // NOLINT

BENCHMARK(BM_EltwiseMultMod)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {48, 60}, {1, 2, 4}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{48, 60},
{1, 2, 4}});

//=================================================================

@@ -61,7 +63,11 @@ BENCHMARK(BM_EltwiseMultModNative)
->Unit(benchmark::kMicrosecond)
->Args({1024})
->Args({4096})
- ->Args({16384});
->Args({16384})
->Args({32768})
->Args({65536})
->Args({131072})
->Args({262144});

//=================================================================

@@ -97,7 +103,8 @@ static void BM_EltwiseMultModAVX512Float(benchmark::State& state) { // NOLINT

BENCHMARK(BM_EltwiseMultModAVX512Float)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {1, 2, 4}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{1, 2, 4}});
#endif

//=================================================================
@@ -134,7 +141,8 @@ static void BM_EltwiseMultModAVX512DQInt(benchmark::State& state) { // NOLINT

BENCHMARK(BM_EltwiseMultModAVX512DQInt)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {1, 2, 4}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{1, 2, 4}});
#endif

#ifdef HEXL_HAS_AVX512IFMA
@@ -172,7 +180,8 @@ static void BM_EltwiseMultModAVX512IFMAInt(

BENCHMARK(BM_EltwiseMultModAVX512IFMAInt)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {1, 2, 4}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{1, 2, 4}});
#endif

//=================================================================
@@ -216,7 +225,8 @@ static void BM_EltwiseMultModMontAVX512IFMAIntEConv(

BENCHMARK(BM_EltwiseMultModMontAVX512IFMAIntEConv)
->Unit(benchmark::kMicrosecond)
- ->ArgsProduct({{1024, 4096, 16384}, {1, 2, 4}});
->ArgsProduct({{1024, 4096, 16384, 32768, 65536, 131072, 262144},
{1, 2, 4}});

#endif
