cache line optimized CountMin4 #6

Open
ben-manes opened this issue Sep 26, 2022 · 0 comments
The original frequency sketch in Caffeine distributed the counters uniformly across the table. This gave the greatest opportunity to avoid hash collisions, which would otherwise increase the error rate of the popularity estimates. However, it also meant that each counter was at an unpredictable location and very likely to cause a hardware cache miss. This was somewhat mitigated by the maintenance routine performing work in batches, so temporal locality for popular items may have helped.

The improved sketch uniformly selects a 64-byte block in which all of an item's counters reside. This is the size of an L1 cache line, so only one memory access is needed. The counters are selected from 16-byte segments using a secondary hash. In my benchmarks this improves the frequency estimation and increment throughput by up to 2.5x. The eviction throughput is 17% faster, but the impact on get/put is more difficult to observe since the maintenance is non-blocking (observable on a low-core CI machine, less so elsewhere).

Java does not yet support SIMD instructions (the Vector API is in preview), but those could be used for an additional speedup. A .NET developer ported an earlier prototype in bitfaster/BitFaster.Caching#271, where the operation times were cut in half.

This is just for fun, so feel free to close if not interested. It is unlikely that users will observe a speedup, but it was a fun optimization that I forgot to use.

ben-manes referenced this issue Apr 9, 2023
While the unrolling prevents inlining (due to the syntactic complexity),
it still results in a slight overall speedup.

goos: linux
goarch: amd64
pkg: github.com/dgryski/go-tinylfu
cpu: AMD Ryzen 9 3950X 16-Core Processor
                  │  old.bench  │              new.bench              │
                  │   sec/op    │   sec/op     vs base                │
CMAddSaturated-32   5.182n ± 1%   4.562n ± 1%  -11.95% (n=50)
CMEstimate-32       5.478n ± 1%   4.566n ± 1%  -16.64% (n=50)
CMReset-32          4.031µ ± 1%   3.964µ ± 1%   -1.66% (p=0.000 n=50)
Get-32              21.73n ± 0%   20.31n ± 1%   -6.53% (n=50)
geomean             39.71n        35.99n        -9.37%