Supposing your hash values are uniformly distributed "random" values in the range of 0..2**64-1, you can skip the modulo and instead use random-number scaling techniques to distribute the result (the bucket number) evenly across the table. I don't think there's any reason to stick with modulo, is there?
The quickest-and-dirtiest of these would be:
size_t bucket(uint64_t hash) {
    // K is the number of buckets (the table size); uint128_t means a 128-bit
    // unsigned type such as GCC/Clang's __uint128_t.
    return uint64_t(uint128_t(hash) * K >> 64);
}
But if the incoming hashes are not already uniform then try this one weird trick:
size_t bucket(uint64_t hash) {
    // crc32() stands in for the hardware CRC instruction (see the note below);
    // its 32-bit result is then scaled up into the K buckets.
    return uint64_t(crc32(hash)) * K >> 32;
}
(This only works when K is less than 2**32, and you'll have to find the right intrinsic for the target to get the hardware-accelerated CRC instruction.)
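For concreteness, on x86 the hardware instruction is exposed as the SSE4.2 intrinsic _mm_crc32_u64; here's a rough sketch using it, with K passed as a parameter and the helper name my own invention -- other targets need their own intrinsic (e.g. __crc32cd from <arm_acle.h> on ARM):
#include <stddef.h>
#include <stdint.h>
#include <nmmintrin.h>   // SSE4.2 intrinsics; compile with -msse4.2

size_t bucket_crc(uint64_t hash, uint64_t K) {
    // Fold the 64-bit hash down to 32 well-mixed bits with the CRC32-C instruction.
    uint32_t mixed = uint32_t(_mm_crc32_u64(0, hash));
    // Scale the 32-bit value up into K buckets (requires K < 2**32).
    return size_t(uint64_t(mixed) * K >> 32);
}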
This one is more thorough:
static inline uint64_t murmurmix64(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

size_t bucket(uint64_t hash) {
    hash = murmurmix64(hash);
    return uint64_t(uint128_t(hash) * K >> 64);
}
And that combination is probably fine, covers up to 64-bit ranges, and is probably still faster than a mod by a constant.
Plus it probably makes no difference if K is not constant, and so you can avoid calling via a function pointer and just load the table size into a register and perform the multiply by that inline.
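A minimal sketch of that non-constant case, assuming the table size is simply passed in as a run-time parameter (the name nbuckets is mine, and a compiler with 128-bit integer support is assumed):
#include <stddef.h>
#include <stdint.h>

size_t bucket(uint64_t hash, uint64_t nbuckets) {
    // Same multiply-shift as above, but nbuckets is loaded into a register at
    // run time instead of being the compile-time constant K.
    return size_t(__uint128_t(hash) * nbuckets >> 64);
}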
This returns a bucket number for the current table, and if things go wrong and you feel like trying a different hash -- even for a table of a different length -- just call it again! It also updates the input hash so that the unused entropy from the original hash is properly distributed across the next range.
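From that description and the constraints below, this sounds like a multiply-shift where the high 64 bits of the product are the bucket and the low 64 bits are written back as the leftover entropy. A rough sketch of that idea -- the signature and names are my assumption, not the original code:
#include <stddef.h>
#include <stdint.h>

size_t bucket(uint64_t *hash, uint64_t nbuckets) {
    __uint128_t product = __uint128_t(*hash) * nbuckets;
    // Low 64 bits: the unused entropy, rescaled back to 0..2**64-1, ready for
    // the next draw.
    *hash = uint64_t(product);
    // High 64 bits: a bucket number uniform in 0..nbuckets-1.
    return size_t(product >> 64);
}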
Constraints are:
- Original input hashes must be uniformly distributed across all 2**64 possible values.
- The product of the nbuckets values for all calls on the same hash must be less than 2**64.
- If all values of nbuckets are odd then the function will continue to hallucinate viable bucket choices even after it runs out of entropy (after the product exceeds 2**64).
If you don't trust the caller to provide uniformly distributed hashes (e.g., where the STL will just pass raw ints, or whatever), then you can precondition it with one of the mixing operations in the original post here.