Optimize PFCOUNT, PFMERGE command by SIMD acceleration #1293
base: unstable
Signed-off-by: Xuyang Wang <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1293      +/-   ##
============================================
+ Coverage     70.69%   70.70%   +0.01%
============================================
  Files           114      115       +1
  Lines         63161    63233      +72
============================================
+ Hits          44650    44710      +60
- Misses        18511    18523      +12
How was this change tested, particularly to confirm that the non-AVX2 and AVX2 implementations produce the same results?
The algorithms are verified by comparing the results of the scalar code and the SIMD code on random input.
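For readers who want to reproduce this kind of check, here is a minimal sketch of a differential test. It is not the harness from the experiment repo: `dense_to_raw_bitwise`, `dense_to_raw_window`, `xorshift32` and `count_mismatching_rounds` are hypothetical names, and the windowed decoder merely stands in for whatever optimized routine is under test. The sketch assumes the default dense layout (16384 registers of 6 bits, packed least significant bit first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REGS = 16384, BITS = 6, DENSE_BYTES = REGS * BITS / 8 };

typedef void (*dense_to_raw_fn)(const uint8_t *dense, uint8_t *raw);

/* Slow reference: rebuild every register one bit at a time, LSB first. */
static void dense_to_raw_bitwise(const uint8_t *dense, uint8_t *raw) {
    for (size_t reg = 0; reg < REGS; reg++) {
        uint8_t v = 0;
        for (size_t k = 0; k < BITS; k++) {
            size_t bit = reg * BITS + k;
            v |= (uint8_t)(((dense[bit / 8] >> (bit % 8)) & 1u) << k);
        }
        raw[reg] = v;
    }
}

/* Stand-in for the routine under test: read a two-byte window per register. */
static void dense_to_raw_window(const uint8_t *dense, uint8_t *raw) {
    for (size_t reg = 0; reg < REGS; reg++) {
        size_t byte = reg * BITS / 8;
        unsigned shift = (unsigned)(reg * BITS % 8);
        unsigned lo = dense[byte];
        /* The last register ends exactly at the buffer end; do not read past it. */
        unsigned hi = (byte + 1 < DENSE_BYTES) ? dense[byte + 1] : 0u;
        raw[reg] = (uint8_t)(((lo >> shift) | (hi << (8 - shift))) & 63u);
    }
}

/* Small deterministic PRNG so that failures are reproducible from the seed. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Returns the number of rounds in which the two decoders disagree. */
static int count_mismatching_rounds(dense_to_raw_fn a, dense_to_raw_fn b,
                                    uint32_t seed, int rounds) {
    static uint8_t dense[DENSE_BYTES], out_a[REGS], out_b[REGS];
    uint32_t state = seed ? seed : 1u; /* xorshift must not start at 0 */
    int bad = 0;
    assert(rounds > 0);
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < DENSE_BYTES; i++) dense[i] = (uint8_t)xorshift32(&state);
        a(dense, out_a);
        b(dense, out_b);
        if (memcmp(out_a, out_b, REGS) != 0) bad++;
    }
    return bad;
}
```

Calling `count_mismatching_rounds(dense_to_raw_bitwise, dense_to_raw_window, 42, 8)` should return 0 when both decoders agree. Swapping in an AVX2 routine for the second argument would give the scalar versus SIMD comparison described above.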
This is so cool! 😎
I think we need to test this in our repo in some way. The binary representation can't change, because things like replicas will not understand it, so we should verify the binary representation. A HyperLogLog key is actually a string, so we can use the GET command to get the binary representation. In a Tcl test case, we can use GET and compare the reply to binary data we have stored earlier. Alternatively, we can use DUMP. Can you add it?
@lipzhu You're the performance expert. Do you want to review this?
This PR optimizes the performance of HyperLogLog commands (PFCOUNT, PFMERGE) by adding AVX2 fast paths.
Two AVX2 functions are added for conversion between the raw representation and the dense representation. They are 15 to 30 times faster than the scalar implementation. Note that the sparse representation is not accelerated.
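To make the two conversions concrete, here is a scalar sketch of what they compute. It is not the PR's code: `dense_to_raw_scalar` and `raw_to_dense_scalar` are hypothetical names. It assumes the dense layout packs 6-bit registers least significant bit first, so every 3 bytes hold 4 registers, and that the raw representation uses one byte per register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HLL_REGISTERS 16384 /* default register count named in the PR */
#define HLL_BITS 6          /* default bits per register named in the PR */
#define HLL_DENSE_SIZE (HLL_REGISTERS * HLL_BITS / 8) /* 12288 bytes */

/* Unpack: every 3 dense bytes become 4 raw registers of one byte each. */
static void dense_to_raw_scalar(const uint8_t *dense, uint8_t *raw) {
    for (size_t i = 0; i < HLL_REGISTERS / 4; i++) {
        const uint8_t *b = dense + 3 * i;
        uint8_t *r = raw + 4 * i;
        r[0] = (uint8_t)(b[0] & 63);
        r[1] = (uint8_t)(((b[0] >> 6) | (b[1] << 2)) & 63);
        r[2] = (uint8_t)(((b[1] >> 4) | (b[2] << 4)) & 63);
        r[3] = (uint8_t)(b[2] >> 2);
    }
}

/* Pack: every 4 raw registers (each below 64) become 3 dense bytes. */
static void raw_to_dense_scalar(const uint8_t *raw, uint8_t *dense) {
    for (size_t i = 0; i < HLL_REGISTERS / 4; i++) {
        const uint8_t *r = raw + 4 * i;
        uint8_t *b = dense + 3 * i;
        assert((r[0] | r[1] | r[2] | r[3]) < 64); /* values must fit in 6 bits */
        b[0] = (uint8_t)(r[0] | (r[1] << 6));
        b[1] = (uint8_t)((r[1] >> 2) | (r[2] << 4));
        b[2] = (uint8_t)((r[2] >> 4) | (r[3] << 2));
    }
}
```

Packing and then unpacking should return the original registers. A SIMD version performs the same shuffling on many registers per instruction, which is where a speedup of this kind would come from.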
AVX2 fast paths are enabled when the CPU supports AVX2 (checked at runtime) and the HyperLogLog configuration is the default (HLL_REGISTERS == 16384 && HLL_BITS == 6).
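One way such a gate could look is sketched below. This is an assumption-laden illustration, not the PR's implementation: `cpu_has_avx2`, `hll_select_path` and the `hll_path` enum are hypothetical names, and the runtime check relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets.

```c
#include <assert.h>
#include <stdint.h>

#define HLL_REGISTERS 16384
#define HLL_BITS 6

typedef enum { HLL_PATH_SCALAR = 0, HLL_PATH_AVX2 = 1 } hll_path;

/* Runtime CPU check; always reports 0 on compilers or targets without the builtins. */
static int cpu_has_avx2(void) {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Pick the fast path only when both conditions from the description hold. */
static hll_path hll_select_path(void) {
    static int cached = -1; /* -1 means "not probed yet" */
    if (cached < 0) cached = cpu_has_avx2();
    assert(cached == 0 || cached == 1);
    if (cached && HLL_REGISTERS == 16384 && HLL_BITS == 6) return HLL_PATH_AVX2;
    return HLL_PATH_SCALAR;
}
```

The configuration half of the condition is a compile-time constant, so the compiler can fold it away, and the CPU probe is cached after the first call. A real fast path would additionally need the AVX2 function itself to be compiled with AVX2 enabled.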
When merging 3 dense hll structures, the benchmark shows a 12x speedup compared to the scalar version.
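For context, merging HyperLogLog sketches amounts to taking the per-register maximum across all inputs. The scalar sketch below shows that operation on one-byte-per-register arrays; `hll_merge_max_scalar` and `hll_merge_many` are hypothetical names, not functions from the PR, and the dense unpacking step is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold one sketch's registers into a running per-register maximum. */
static void hll_merge_max_scalar(uint8_t *max, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] > max[i]) max[i] = src[i];
    }
}

/* Merge `count` register arrays of length `n` into `out`. */
static void hll_merge_many(uint8_t *out, const uint8_t *const *srcs,
                           size_t count, size_t n) {
    assert(out != NULL && srcs != NULL);
    for (size_t i = 0; i < n; i++) out[i] = 0; /* 0 is the identity for max */
    for (size_t s = 0; s < count; s++) hll_merge_max_scalar(out, srcs[s], n);
}
```

With three inputs, as in the benchmark above, each output register ends up holding the largest of the three corresponding input registers.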
Experiment repo: https://github.com/Nugine/redis-hyperloglog
Benchmark script: https://github.com/Nugine/redis-hyperloglog/blob/main/scripts/memtier.sh
Algorithm: https://github.com/Nugine/redis-hyperloglog/blob/main/cpp/bench.cpp