Hardware: HP laptop, Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz, 16 GB RAM
Software: Windows 10, MinGW GCC 11.3.0, MSVC 2022
Time unit: milliseconds (unless explicitly specified)
Unless otherwise specified, the compiler is GCC.
The hardware used for the benchmarks was mid-range to low-end at the time of benchmarking (December 2023).
Iterations | Queue size | Event count | Event Types | Listener count | Time of single threading | Time of multi threading |
---|---|---|---|---|---|---|
100k | 100 | 10M | 100 | 100 | 289 | 939 |
100k | 1000 | 100M | 100 | 100 | 2822 | 9328 |
100k | 1000 | 100M | 1000 | 1000 | 2923 | 9502 |
Given `eventpp::EventQueue<size_t, void (size_t), Policies>`, where `Policies` is either the single-threading or the multi-threading policy, the benchmark adds `Listener count` listeners to the queue; each listener is an empty lambda. Then the benchmark starts timing and loops `Iterations` times. In each loop, it puts `Queue size` events into the queue, then processes the event queue.
There are `Event types` different event types. `Event count` is `Iterations * Queue size`.
The EventQueue is always processed in a single thread. "Single threading" and "multi threading" in the table refer only to the threading policy the queue is instantiated with.
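The loop described above can be sketched in self-contained C++ as follows. `MiniEventQueue`, `NoLock` and `runQueueBenchmark` are hypothetical stand-ins (this is not `eventpp::EventQueue`), and unlike the real benchmark the listeners count their calls instead of being empty, so the result can be checked. `NoLock` plays the role of the single-threading policy and `std::mutex` that of the multi-threading one.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single-threading policy: a lock that does nothing.
struct NoLock {
    void lock() {}
    void unlock() {}
};

// Hypothetical, heavily simplified queue that only mimics the benchmark's shape.
// It is NOT eventpp::EventQueue.
template <typename Mutex>
class MiniEventQueue {
public:
    using Listener = std::function<void(std::size_t)>;

    void appendListener(std::size_t event, Listener listener) {
        std::lock_guard<Mutex> guard(mutex_);
        listeners_[event].push_back(std::move(listener));
    }

    void enqueue(std::size_t event) {
        std::lock_guard<Mutex> guard(mutex_);
        pending_.push_back(event);
    }

    // Take the pending batch under the lock, then dispatch it. Listeners are
    // assumed to be registered before processing starts, as in the benchmark.
    void process() {
        std::vector<std::size_t> batch;
        {
            std::lock_guard<Mutex> guard(mutex_);
            batch.swap(pending_);
        }
        for (std::size_t event : batch) {
            auto it = listeners_.find(event);
            if (it == listeners_.end()) {
                continue;
            }
            for (auto& listener : it->second) {
                listener(event);
            }
        }
    }

private:
    Mutex mutex_;
    std::unordered_map<std::size_t, std::vector<Listener>> listeners_;
    std::vector<std::size_t> pending_;
};

struct QueueBenchResult {
    std::size_t eventCount;     // Iterations * Queue size
    std::size_t listenerCalls;  // total listener invocations
    long long elapsedMs;
};

template <typename Mutex>
QueueBenchResult runQueueBenchmark(std::size_t iterations, std::size_t queueSize,
                                   std::size_t eventTypes, std::size_t listenerCount) {
    MiniEventQueue<Mutex> queue;
    std::size_t calls = 0;
    // Assumption: listeners are spread round-robin over the event types.
    for (std::size_t i = 0; i < listenerCount; ++i) {
        queue.appendListener(i % eventTypes, [&calls](std::size_t) { ++calls; });
    }

    const auto start = std::chrono::steady_clock::now();
    std::size_t eventCount = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        for (std::size_t k = 0; k < queueSize; ++k) {
            queue.enqueue(eventCount % eventTypes);
            ++eventCount;
        }
        queue.process();
    }
    const auto end = std::chrono::steady_clock::now();
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    return {eventCount, calls, static_cast<long long>(ms)};
}
```

For example, `runQueueBenchmark<NoLock>(1000, 100, 100, 100)` processes 100,000 events; the timings it reports are not comparable to the table above.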
Mutex | Enqueue threads | Process threads | Event count | Event Types | Listener count | Time |
---|---|---|---|---|---|---|
std::mutex | 1 | 1 | 10M | 100 | 100 | 1824 |
SpinLock | 1 | 1 | 10M | 100 | 100 | 1303 |
std::mutex | 1 | 3 | 10M | 100 | 100 | 2989 |
SpinLock | 1 | 3 | 10M | 100 | 100 | 3186 |
std::mutex | 2 | 2 | 10M | 100 | 100 | 3151 |
SpinLock | 2 | 2 | 10M | 100 | 100 | 3049 |
std::mutex | 4 | 4 | 10M | 100 | 100 | 1657 |
SpinLock | 4 | 4 | 10M | 100 | 100 | 1659 |
std::mutex | 16 | 16 | 10M | 100 | 100 | 708 |
SpinLock | 16 | 16 | 10M | 100 | 100 | 1891 |
There are `Enqueue threads` threads enqueuing events to the queue and `Process threads` threads processing the events. The total event count is `Event count`. `Mutex` is the mutex type used to protect the data.
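A rough sketch of this producer/consumer setup, assuming a shared `std::deque` guarded by a lock of type `Mutex`. The function `runThreadedQueue` is hypothetical and is not how eventpp implements its queue; it only shows where the lock is contended by both enqueue and process threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: producers push totalEvents events into a shared deque
// guarded by Mutex; consumers pop and "process" them until all are handled.
// Precondition: enqueueThreads > 0 and processThreads > 0.
template <typename Mutex>
std::size_t runThreadedQueue(std::size_t enqueueThreads, std::size_t processThreads,
                             std::size_t totalEvents) {
    Mutex mutex;
    std::deque<std::size_t> queue;
    std::atomic<std::size_t> processed{0};
    std::vector<std::thread> threads;

    for (std::size_t t = 0; t < enqueueThreads; ++t) {
        // Split the events evenly; thread 0 also takes the remainder.
        const std::size_t count = totalEvents / enqueueThreads
            + (t == 0 ? totalEvents % enqueueThreads : 0);
        threads.emplace_back([&mutex, &queue, count] {
            for (std::size_t i = 0; i < count; ++i) {
                std::lock_guard<Mutex> guard(mutex);
                queue.push_back(i);
            }
        });
    }

    for (std::size_t t = 0; t < processThreads; ++t) {
        threads.emplace_back([&mutex, &queue, &processed, totalEvents] {
            while (processed.load() < totalEvents) {
                bool gotEvent = false;
                {
                    std::lock_guard<Mutex> guard(mutex);
                    if (!queue.empty()) {
                        queue.pop_front();
                        gotEvent = true;
                    }
                }
                if (gotEvent) {
                    // An (empty) listener would be invoked here, outside the lock.
                    processed.fetch_add(1);
                } else {
                    std::this_thread::yield();  // nothing queued yet
                }
            }
        });
    }

    for (auto& th : threads) {
        th.join();
    }
    return processed.load();
}
```

Because every enqueue and every dequeue takes the same lock, adding threads increases contention on it, which is where the choice of `Mutex` starts to matter.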
The multi-threaded version is slower than the earlier single-threaded benchmark because locking the mutex costs time.
When there are few threads (around the number of CPU cores, which is 4 here), `eventpp::SpinLock` performs about as well as or better than `std::mutex`; for example, with 1 enqueue thread and 1 process thread it takes 1303 ms versus 1824 ms. But when there are many more threads than CPU cores (here 16 enqueue threads and 16 process threads), `eventpp::SpinLock` performs much worse than `std::mutex` (1891 ms versus 708 ms).
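To illustrate why a spinlock degrades when threads outnumber cores, here is a minimal test-and-set spinlock. `SimpleSpinLock` and `countWithLock` are hypothetical illustrations, not the source of `eventpp::SpinLock`. Since the lock provides `lock()` and `unlock()`, it can be used with `std::lock_guard` exactly like `std::mutex`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock, for illustration only.
class SimpleSpinLock {
public:
    void lock() {
        // Busy-wait until this thread is the one that sets the flag. Waiters
        // never sleep: if the holder is descheduled (more threads than cores),
        // the waiters burn CPU time until the holder runs again.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads under the given lock type.
template <typename Lock>
long countWithLock(int threadCount, int incrementsPerThread) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&lock, &counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

A `std::mutex` can put a waiting thread to sleep and let the holder run, which is one plausible reason it scales better in the 16 + 16 thread case.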
The benchmark loops 100K times. In each loop it appends 1000 empty callbacks to a CallbackList, then removes all 1000 of them, so there are 100M append/remove operations in total.
The total benchmarked time is about 16000 milliseconds, that is, roughly 6000 append/remove operations per millisecond (100M / 16000 ms ≈ 6250).
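The append/remove pattern can be sketched with a hypothetical `MiniCallbackList` whose `append()` returns a handle (here a `std::list` iterator) that `remove()` later consumes. This only mirrors the shape of the benchmark and is not `eventpp::CallbackList`'s implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Hypothetical minimal callback list with handle-based removal.
class MiniCallbackList {
public:
    using Callback = std::function<void()>;
    using Handle = std::list<Callback>::iterator;

    Handle append(Callback callback) {
        return callbacks_.insert(callbacks_.end(), std::move(callback));
    }
    void remove(Handle handle) { callbacks_.erase(handle); }
    std::size_t size() const { return callbacks_.size(); }

private:
    std::list<Callback> callbacks_;
};

// Each iteration appends `batch` empty callbacks, then removes all of them.
// Returns the number of append/remove pairs performed.
std::size_t appendRemoveLoop(std::size_t iterations, std::size_t batch,
                             MiniCallbackList& list) {
    std::vector<MiniCallbackList::Handle> handles;
    handles.reserve(batch);
    std::size_t operations = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        handles.clear();
        for (std::size_t k = 0; k < batch; ++k) {
            handles.push_back(list.append([] {}));
        }
        for (auto handle : handles) {
            list.remove(handle);
            ++operations;
        }
    }
    return operations;
}
```

With `iterations = 100000` and `batch = 1000` the loop performs 100M append/remove pairs, matching the count above.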
Iterations: 100,000,000
Function | Compiler | Native invoking | CallbackList single threading | CallbackList multi threading |
---|---|---|---|---|
Inline global function | MSVC | 139 | 1267 | 3058 |
Inline global function | GCC | 141 | 1149 | 2563 |
Non-inline global function | MSVC | 143 | 1273 | 3047 |
Non-inline global function | GCC | 132 | 1218 | 2583 |
Function object | MSVC | 139 | 1198 | 2993 |
Function object | GCC | 141 | 1107 | 2633 |
Member virtual function | MSVC | 159 | 1221 | 3076 |
Member virtual function | GCC | 140 | 1231 | 2691 |
Member non-virtual function | MSVC | 140 | 1266 | 3054 |
Member non-virtual function | GCC | 140 | 1193 | 2701 |
Member non-inline virtual function | MSVC | 158 | 1223 | 3103 |
Member non-inline virtual function | GCC | 133 | 1231 | 2676 |
Member non-inline non-virtual function | MSVC | 134 | 1266 | 3028 |
Member non-inline non-virtual function | GCC | 134 | 1205 | 2652 |
All functions | MSVC | 91 | 903 | 2214 |
All functions | GCC | 89 | 858 | 1852 |
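A sketch of how "native invoking" can be timed against invoking the same function through a list of `std::function`, which is roughly the extra indirection a callback list adds. `addToCounter`, `InvokeList` and `compareInvoking` are hypothetical names; this is not the benchmark's actual harness and will not reproduce the numbers in the table.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

volatile int counter = 0;

// Plain assignment (not +=) avoids C++20's deprecation of compound
// assignment on volatile variables.
void addToCounter(int a, const int b) { counter = counter + a + b; }

// Hypothetical stand-in for a single-threaded callback list: invoking it
// calls every stored std::function in order, without any locking.
struct InvokeList {
    std::vector<std::function<void(int, int)>> callbacks;
    void operator()(int a, int b) const {
        for (const auto& callback : callbacks) {
            callback(a, b);
        }
    }
};

// Calls `call(1, 2)` `iterations` times and returns the elapsed milliseconds.
template <typename Call>
long long timeInvocations(long long iterations, Call&& call) {
    const auto start = std::chrono::steady_clock::now();
    for (long long i = 0; i < iterations; ++i) {
        call(1, 2);
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

struct InvokeTiming {
    long long nativeMs;  // direct function call
    long long listMs;    // same function through InvokeList
};

InvokeTiming compareInvoking(long long iterations) {
    InvokeList list;
    list.callbacks.push_back(&addToCounter);
    InvokeTiming timing{};
    timing.nativeMs = timeInvocations(iterations, [](int a, int b) { addToCounter(a, b); });
    timing.listMs = timeInvocations(iterations, [&list](int a, int b) { list(a, b); });
    return timing;
}
```

The `volatile` counter keeps the compiler from discarding the calls, the same trick the testing functions use.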
Testing functions:

```cpp
// NON_INLINE stops the compiler from inlining a function, so the
// "non-inline" variants are always invoked through a real call.
#if defined(_MSC_VER)
#define NON_INLINE __declspec(noinline)
#else
// gcc
#define NON_INLINE __attribute__((noinline))
#endif

// volatile keeps the compiler from optimizing the calls away.
volatile int globalValue = 0;

void globalFunction(int a, const int b)
{
    globalValue += a + b;
}

NON_INLINE void nonInlineGlobalFunction(int a, const int b)
{
    globalValue += a + b;
}

struct FunctionObject
{
    // Invoked as a function object.
    void operator() (int a, const int b)
    {
        globalValue += a + b;
    }

    virtual void virFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    void nonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE virtual void nonInlineVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE void nonInlineNonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }
};

#undef NON_INLINE
```
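Different kinds of callables (free function, function object, virtual and non-virtual member functions) have different types, so a callback list has to store them behind one uniform type. The sketch below shows this with `std::function`, using trimmed re-declarations of the testing functions so it compiles on its own; `makeAllCallbacks` and `invokeAll` are hypothetical helpers, not part of eventpp.

```cpp
#include <cassert>
#include <functional>
#include <vector>

volatile int globalValue = 0;

void globalFunction(int a, const int b) { globalValue = globalValue + a + b; }

struct FunctionObject {
    virtual ~FunctionObject() = default;
    void operator()(int a, const int b) { globalValue = globalValue + a + b; }
    virtual void virFunc(int a, const int b) { globalValue = globalValue + a + b; }
    void nonVirFunc(int a, const int b) { globalValue = globalValue + a + b; }
};

using Callback = std::function<void(int, int)>;

// Wraps each kind of callable in the same std::function type so they can
// share one list. `obj` must outlive the returned callbacks.
std::vector<Callback> makeAllCallbacks(FunctionObject& obj) {
    return {
        globalFunction,                                   // free function
        FunctionObject(),                                 // function object (copied)
        [&obj](int a, int b) { obj.virFunc(a, b); },      // virtual member
        std::bind(&FunctionObject::nonVirFunc, &obj,      // non-virtual member
                  std::placeholders::_1, std::placeholders::_2),
    };
}

void invokeAll(const std::vector<Callback>& callbacks, int a, int b) {
    for (const auto& callback : callbacks) {
        callback(a, b);
    }
}
```

Calling `invokeAll(callbacks, 1, 2)` on the four callbacks adds `1 + 2` four times to `globalValue`.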