# Glossary

Note: These are not official definitions. They describe how the terms are used in the context of working together on https://github.com/mlcommons/mobile_app_open, so that all contributors share the same understanding of each term.

| Term | Description |
| --- | --- |
| accelerator | Computer hardware that speeds up inference. |
| throughput | The number of inference queries processed per second. |
| latency | The duration of a single inference query. |
| accuracy | The accuracy of a benchmark. The unit of measurement depends on the type of benchmark: for example, mAP is used for object-detection tasks and the F1 score for language-processing tasks. |
| backend | Vendor-specific code for running a benchmark. |
| performance mode | In performance mode, the app measures only the throughput. |
| accuracy mode | In accuracy mode, the app measures both the throughput and the accuracy. |
| LoadGen (Load Generator) | LoadGen is a library that provides a reliable way to measure performance. It lets the app run tests using different scenarios: single-stream, multistream, server, and offline (not all of them are actually used in this app). It collects information for logging, debugging, and post-processing, records queries and responses from the system under test, and at the end of a run it reports statistics, summarizes the results, and determines whether the run was valid. You can read more about LoadGen in the paper that originally introduced it (https://arxiv.org/pdf/1911.02549.pdf) or take a look at the code at https://github.com/mlcommons/inference/tree/master/loadgen. |
| single-stream scenario | The single-stream scenario represents one inference-query stream with a query sample size of 1, reflecting the many client applications where responsiveness is critical. An example is offline voice transcription on Google's Pixel 4 smartphone. To measure performance, a single query is injected into the inference system; when the query completes, the completion time is recorded and the next query is injected. The metric is the query stream's 90th-percentile latency. Source: https://arxiv.org/pdf/1911.02549.pdf. Note: in the MLPerf Mobile app, the latency is converted to throughput so it can be displayed as a score. |
| offline scenario | The offline scenario represents batch-processing applications where all data is immediately available and latency is unconstrained. An example is identifying the people and locations in a photo album. For this scenario, a single query that includes all sample-data IDs to be processed is sent, and the system is free to process the input data in any order. As in the multistream scenario, neighboring samples in the query are contiguous in memory. The metric for the offline scenario is throughput, measured in samples per second. Source: https://arxiv.org/pdf/1911.02549.pdf |
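The single-stream measurement loop and the latency-to-throughput conversion mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, not the app's actual code: `run_query` is a hypothetical placeholder for one inference call on the system under test, and the real measurement is performed by LoadGen.

```python
import statistics
import time

def measure_single_stream(run_query, num_queries=50):
    """Issue queries one at a time, back to back, recording each latency.

    `run_query` is a hypothetical stand-in for a single inference call on
    the system under test; it is not a real LoadGen API.
    """
    latencies = []
    for _ in range(num_queries):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)  # seconds per query
    return latencies

def single_stream_score(latencies):
    """Convert the 90th-percentile latency into a throughput-style score (QPS)."""
    p90 = statistics.quantiles(latencies, n=10)[-1]  # last cut point = 90th percentile
    return 1.0 / p90
```

For example, if the 90th-percentile latency is 10 ms (0.01 s), the displayed score would be 1 / 0.01 = 100 queries per second.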

Helpful links: