
Common benchmark methodology #92

Open
fhoering opened this issue Dec 12, 2024 · 0 comments

Criteo ran benchmarks on the key-value service.

As a result, it seems there is a serious risk that TEE technology and/or privacy constraints would significantly increase ad techs' infrastructure cost.

Here are some examples of potential drivers of infrastructure cost:

  • TEE ML inference
  • Inter process communication
  • ROMA side effect implementation
  • TEE encryption

We agreed in the WICG call of 04/12/2024 that it is important to be able to assess new features in a common way and to compare the results later. This would make it possible to track improvements and/or regressions over time.

Ideally Chrome would:

  • define the hardware (some AWS/GCP instance type or both) and the required server-side setup (log level, TEST_MODE, B&A vs KV server)
  • provide a repo that contains the benchmarks (empty bidding script, simple getValue() lookup, large getValue() lookup, etc.)
  • provide some script or config that deploys and executes everything in a standardized way (see the sketch below)
  • define the way metrics are measured or calculated

This would allow the community to comment on the benchmarks and the protocol and to make proposals. As it is, everyone benchmarks on their own, and the results are chaotic and hard to compare.
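To illustrate the kind of standardized script meant above, here is a minimal sketch of a wrapper that deploys one benchmark scenario and runs a fixed load ramp. All file names (run_benchmark.sh, deploy_kv_server.sh, run_gatling.sh, summarize_results.sh), scenario names, and parameters are hypothetical placeholders, not existing tools:

#!/usr/bin/env bash
# Hypothetical benchmark harness sketch: deploy one scenario, then run a fixed load ramp.
# SCENARIO, deploy_kv_server.sh, run_gatling.sh and summarize_results.sh are placeholders.
set -euo pipefail

SCENARIO="${1:?usage: run_benchmark.sh <scenario>}"   # e.g. empty_udf, simple_getvalue, large_getvalue
INSTANCE_TYPE="c6i.2xlarge"                           # reference hardware pinned by the methodology
RESULTS_DIR="results/${SCENARIO}/$(date +%Y%m%d-%H%M%S)"

mkdir -p "${RESULTS_DIR}"

# 1. Deploy the KV server with the scenario's UDF and delta files (placeholder script).
./deploy_kv_server.sh --scenario "${SCENARIO}" --instance-type "${INSTANCE_TYPE}"

# 2. Ramp the load from 100 QPS up to 100k QPS and record one result file per step.
for qps in 100 500 1000 2000 5000 10000 50000 100000; do
  ./run_gatling.sh --target-qps "${qps}" --duration 120s \
    --output "${RESULTS_DIR}/qps_${qps}.csv"
done

# 3. Summarize latency percentiles and error rates per step (placeholder script).
./summarize_results.sh "${RESULTS_DIR}"

The point is not this particular script, but that the ramp levels, step duration, and output format are fixed up front so that numbers produced by different ad techs can be compared directly.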

As a starting point, here is the methodology we used in our benchmarks.

Setup

We deployed one instance on a remote server and injected Gatling load tests with a fixed set of ~1000 requests (replayed continuously), ramping from 100 QPS up to 100k QPS until the server started to fail. We are interested in latency because we need to reply within milliseconds, but we are mainly interested in QPS because it is a direct measure of how much hardware we need to pay for to handle the daily incoming RTB requests.

In the graph below one can see that at around ~5000 QPS the latency starts to increase significantly and the server starts to fail, so this is the figure we report in our final results. We then compare different server-side code deployments using the same methodology.

(Figure: latency vs. injected QPS; latency degrades sharply around ~5000 QPS.)
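To make the reported number reproducible, the failure point can be derived mechanically from the per-step results rather than read off the graph. A sketch, assuming each run produces a summary.csv with one header line followed by one line per step in the form qps,p99_ms,error_rate, ordered by increasing QPS (this layout is an assumption, not Gatling's native output):

#!/usr/bin/env bash
# Hypothetical sketch: report the highest QPS step that still meets the latency/error budget.
# Assumes summary.csv rows are ordered by increasing QPS: qps,p99_ms,error_rate.
set -euo pipefail

SUMMARY="${1:?usage: find_knee.sh <summary.csv>}"

awk -F, '
  NR > 1 && $2 <= 10 && $3 <= 0.001 { best = $1 }   # p99 under 10 ms, error rate under 0.1 %
  END { print "max sustainable QPS:", (best ? best : "none") }
' "${SUMMARY}"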

Hardware configuration

We used a KV service with 16 hyperthreaded cores and 16 GB of memory on our own on-premise container management platform. It would be preferable to agree on a reference configuration in the cloud with comparable AWS and GCP deployments (e.g. AWS c6i.2xlarge).

We deactivated all logs and set the number of ROMA workers to the number of available cores (=16). Storage is accessed on local disk to avoid an additional S3 dependency.

Here is an example CLI invocation showing how the server was launched:

./server --delta_directory=/tmp/deltas --realtime_directory=/tmp/realtime --port=$PORT1 --route_v1_to_v2=true \
--logging_verbosity_level=0 --stderrthreshold=0 --udf_update_timeout=120s --udf_timeout=120s --udf_num_workers=16
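To back the "CPU usage" and "Memory" metrics listed below, the resource usage of this process can be sampled during each load step, for example with pidstat from the sysstat package. A sketch, reusing the ${RESULTS_DIR} and ${qps} variables from the hypothetical harness above:

# Sample per-second CPU and memory usage of the running KV server during one load step.
# Requires the sysstat package; 120 one-second samples match the hypothetical step duration.
SERVER_PID=$(pgrep -f "delta_directory=/tmp/deltas" | head -n1)
pidstat -u -r -h -p "${SERVER_PID}" 1 120 > "${RESULTS_DIR}/pidstat_qps_${qps}.txt"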

Metrics

  • QPS (queries per second) => should be as high as possible
  • Latency in ms => should stay under 10 ms
  • CPU usage => we are mostly CPU bound; it lets us verify that, at the failure point, all CPUs have actually been used efficiently
  • Memory => it mostly depends on the runtime overhead but also on the payload being evaluated; in our case, since we executed a dummy payload, it did not make a significant difference
  • Request errors => we mostly operate at a level where this number should be negligible
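As a sketch of how these metrics could be derived in a standardized way, assume each load step produces a per-request log requests.csv with one line per request in the form epoch_ms,latency_ms,status (this format is an assumption; it is not what Gatling writes natively):

#!/usr/bin/env bash
# Hypothetical sketch: derive QPS, latency percentiles and error rate from a per-request log.
# Assumes requests.csv is chronologically ordered, one line per request: epoch_ms,latency_ms,status.
set -euo pipefail

LOG="${1:?usage: summarize.sh <requests.csv>}"

TOTAL=$(wc -l < "${LOG}")
ERRORS=$(awk -F, '$3 != "OK"' "${LOG}" | wc -l)
DURATION_S=$(awk -F, 'NR==1{first=$1} {last=$1} END{print (last-first)/1000}' "${LOG}")

# Achieved QPS and error rate over the whole step.
awk -v total="${TOTAL}" -v errors="${ERRORS}" -v dur="${DURATION_S}" \
  'BEGIN { printf "qps=%.0f error_rate=%.4f\n", total/dur, errors/total }'

# Latency percentiles (p50 / p99) from the sorted latency column.
sort -t, -k2,2n "${LOG}" | awk -F, -v total="${TOTAL}" '
  NR == int(total * 0.50) { p50 = $2 }
  NR == int(total * 0.99) { p99 = $2 }
  END { printf "p50_ms=%s p99_ms=%s\n", p50, p99 }'

Reporting p99 rather than the mean fits the latency target above: a server can look fine on average while already failing a meaningful share of requests near the knee point.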