First of all, Penryn lacks the rdtscp instruction. It can use rdtsc instead. Otherwise, the benchmark fails with a bad instruction error. Even with that worked around, the benchmark still seems to be nonfunctional. :(
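As a rough illustration of the fallback described above, the sketch below checks CPUID for RDTSCP support (extended leaf 0x80000001, EDX bit 27) and uses rdtsc when it is missing. This is not HighwayHash's actual timer code: `cpu_has_rdtscp` and `read_ticks` are hypothetical names, the intrinsics assume GCC or Clang, and non-x86 targets simply fall back to `clock()`.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <x86intrin.h>  /* __rdtsc, __rdtscp */
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

/* Hypothetical helper: returns 1 if CPUID reports RDTSCP support
   (extended leaf 0x80000001, EDX bit 27), otherwise 0.
   The result is cached so CPUID runs only once. */
static int cpu_has_rdtscp(void) {
    static int cached = -1;
    if (cached < 0) {
#if SKETCH_X86
        unsigned int eax, ebx, ecx, edx;
        if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
            cached = (int)((edx >> 27) & 1u);
        else
            cached = 0;  /* extended leaf not available */
#else
        cached = 0;      /* not an x86 target */
#endif
    }
    return cached;
}

/* Hypothetical helper: reads a tick counter, preferring rdtscp and
   falling back to rdtsc on CPUs such as Penryn that lack it. */
static uint64_t read_ticks(void) {
#if SKETCH_X86
    if (cpu_has_rdtscp()) {
        unsigned int aux;  /* receives IA32_TSC_AUX; unused here */
        return __rdtscp(&aux);
    }
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}
```

Note that rdtsc is not ordered with respect to surrounding instructions the way rdtscp partly is, so a real benchmark would likely pair the fallback with a fence such as lfence or cpuid.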
In addition, HighwayHash64 seems excessively slow on my (admittedly old) chip compared to other hashes.
xxhsum benchmark (100 KB)
gcc 8.2.0
gcc-8 -O2 -march=native
MacBook (13-inch, Mid 2009)/Macbook5,2
2.13 GHz Intel Core 2 Duo (Penryn, SSE4.1, P7450)
macOS 10.13.6 with High Sierra Patcher
4 GB RAM
Note that the Core 2 Duo has a slow multiplier, which takes twice as many cycles as it does on newer Intel chips. It is the main slowdown for the xxHash family, as replacing the multiplies with xors gets it to the upper 5700s (it is ineffective as a hash, though). It also doesn't seem to have fast 64x2 vectors. GCC appears to do the operations with two 32-bit lanes, which is another slowdown.
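To show where the multiplier cost comes from, here is a minimal sketch of the per-lane round used by 32-bit xxHash, as I understand it from the published algorithm. Each 4-byte input word costs two 32-bit multiplies, so multiply latency sits directly on the critical path. The function names are illustrative and not taken from the xxhsum source.

```c
#include <stdint.h>

/* Prime constants from the XXH32 algorithm. */
#define PRIME32_1 0x9E3779B1u
#define PRIME32_2 0x85EBCA77u

/* Rotate a 32-bit value left by r bits (0 < r < 32). */
static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* One XXH32-style round on a single accumulator lane:
   multiply-accumulate, rotate, multiply. The two multiplies
   are the operations the slow multiplier penalizes. */
static uint32_t xxh32_round(uint32_t acc, uint32_t input) {
    acc += input * PRIME32_2;
    acc  = rotl32(acc, 13);
    acc *= PRIME32_1;
    return acc;
}
```

Swapping the multiplies for xors removes that latency, but also removes most of the mixing, which matches the observation that the result is no longer usable as a hash.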
I mostly want to bring this to attention, because I definitely was disappointed after the effort to make it compile.
@easyaspi314 sorry to hear about the disappointing result. I'm surprised pmuludq was twice as slow on Conroe (assuming that is the Core 2 Duo in question?). According to uops.info, it's 3 cycle latency, as on Nehalem.
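For readers unfamiliar with pmuludq (the `_mm_mul_epu32` intrinsic), the following scalar model sketches what the instruction computes: in each 64-bit lane, the low 32 bits of both operands are multiplied as unsigned integers to give a full 64-bit product. The `u64x2` type and `mul_epu32_model` name are hypothetical and used only for illustration; this is portable C, not the SSE2 intrinsic itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding two 64-bit lanes. */
typedef struct {
    uint64_t lane[2];
} u64x2;

/* Scalar model of pmuludq: per lane, multiply the low 32 bits of a
   by the low 32 bits of b and keep the full 64-bit unsigned product.
   The upper 32 bits of each input lane are ignored. */
static u64x2 mul_epu32_model(u64x2 a, u64x2 b) {
    u64x2 r;
    for (int i = 0; i < 2; i++) {
        uint64_t lo_a = a.lane[i] & 0xFFFFFFFFu;
        uint64_t lo_b = b.lane[i] & 0xFFFFFFFFu;
        r.lane[i] = lo_a * lo_b;
    }
    return r;
}
```

For example, lanes holding 3 and 5 in their low halves produce 15 regardless of what the upper halves contain.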
Compiler codegen is indeed a concern, we've seen Clang do a better job with intrinsics.
Note that HighwayHash is intended as a MAC (larger state and no trivially reversible operations), hence it is not comparable to other faster hashes.