Note: These are not official definitions. They describe and explain the terms as used in the context of working together on https://github.com/mlcommons/mobile_app_open, so that all contributors share the same understanding of each term.
Term | Description |
---|---|
accelerator | Computer hardware that speeds up inference. |
throughput | Number of inference queries per second. |
latency | Duration of one inference query. |
accuracy | The accuracy of a benchmark. The unit of measurement depends on the type of benchmark; for example, mAP is used for object detection tasks and the F1 score for language processing tasks. |
backend | Vendor-specific code for running benchmarks. |
performance mode | In performance mode, the app measures only throughput. |
accuracy mode | In accuracy mode, the app measures both throughput and accuracy. |
LoadGen (Load Generator) | LoadGen is a library that provides a reliable way to measure performance. It allows the app to run tests using different scenarios: single-stream, multistream, server, and offline (not all of them are actually used in this app). It collects information for logging, debugging, and postprocessing the data. It records queries and responses from the system under test, and at the end of the run it reports statistics, summarizes the results, and determines whether the run was valid. You can read more about LoadGen in the paper in which it was originally introduced: https://arxiv.org/pdf/1911.02549.pdf or take a look at the code at https://github.com/mlcommons/inference/tree/master/loadgen |
single stream scenario | The single-stream scenario represents one inference-query stream with a query sample size of 1, reflecting the many client applications where responsiveness is critical. An example is offline voice transcription on Google’s Pixel 4 smartphone. To measure performance, we inject a single query into the inference system; when the query is complete, we record the completion time and inject the next query. The metric is the query stream’s 90th-percentile latency. Source: https://arxiv.org/pdf/1911.02549.pdf Note: In the MLPerf Mobile app we convert the latency to throughput to display it as the score. |
offline scenario | The offline scenario represents batch-processing applications where all data is immediately available and latency is unconstrained. An example is identifying the people and locations in a photo album. For this scenario, we send a single query that includes all sample-data IDs to be processed, and the system is free to process the input data in any order. Similar to the multistream scenario, neighboring samples in the query are contiguous in memory. The metric for the offline scenario is throughput measured in samples per second. Source: https://arxiv.org/pdf/1911.02549.pdf |
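The latency-to-throughput conversion mentioned for the single-stream scenario can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names and the nearest-rank percentile method are assumptions made for the example.

```python
import math

def latency_percentile(latencies_ms, pct=90):
    """Return the pct-th percentile latency (ms) using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    # Nearest-rank: ceil(pct/100 * n) gives the 1-based rank of the percentile.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

def throughput_score(latencies_ms, pct=90):
    """Convert the percentile latency (in ms) to a queries-per-second score."""
    return 1000.0 / latency_percentile(latencies_ms, pct)

# Hypothetical per-query latencies in milliseconds:
latencies = [10.0, 12.5, 11.0, 25.0, 9.5]
print(latency_percentile(latencies))  # 90th-percentile latency: 25.0 ms
print(throughput_score(latencies))    # score: 40.0 queries/second
```

Note that the score rewards the slowest queries least: a single long-tail query dominates the 90th-percentile latency and therefore the displayed throughput.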
Helpful links:
- Mobile Inference Results: https://mlcommons.org/benchmarks/inference-mobile/
- MLPerf Inference Rules: https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc