The latest version of LLMPerf brings a set of updates that make benchmarking LLM inference more detailed and more customizable. These updates include:
- Expanded metrics with quantile distributions (P25–P99): results are reported as percentiles rather than a single average, giving a fuller picture of each metric's distribution.
- Customizable benchmarking parameters: tailor settings such as prompt and output token lengths, request counts, and concurrency to a specific use case.
- Introduction of a load test and a correctness test: the load test measures performance under concurrent request traffic, while the correctness test checks that responses are accurate (a conceptual sketch follows this list).
- Broad compatibility: supports a range of products including Anyscale Endpoints, OpenAI, Anthropic, together.ai, Fireworks.ai, Perplexity, Hugging Face, Lepton AI, and the various APIs supported by the LiteLLM project.
- Easy addition of new LLM providers via the LLMClient API (see the sketch after this list).
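
To make the load/correctness distinction concrete, the snippet below is a conceptual sketch of one way a correctness check can work: spell a number out in words, ask the model to convert it back to digits, and compare. It is not the project's actual implementation; the prompt wording, the `query_llm` callable, and the helper names are illustrative assumptions.

```python
# Conceptual sketch of a correctness check, not the actual LLMPerf code:
# spell a number out as words, ask the model for the digits, and compare.
import random
import re
from typing import Callable

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_out(n: int) -> str:
    """Minimal stand-in for a number-to-words helper (e.g. the num2words package)."""
    return " ".join(DIGIT_WORDS[int(d)] for d in str(n))


def correctness_rate(query_llm: Callable[[str], str], num_trials: int = 20) -> float:
    """Fraction of trials in which the model returned the expected digits."""
    correct = 0
    for _ in range(num_trials):
        expected = random.randint(1_000, 999_999)
        prompt = (
            "Convert these digits, written as words, back into a number and "
            f"reply with the digits only: {spell_out(expected)}"
        )
        reply = query_llm(prompt)
        digits = re.sub(r"[^0-9]", "", reply)
        if digits == str(expected):
            correct += 1
    return correct / num_trials
```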
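
For the last bullet, adding a provider boils down to implementing a small client class that performs one request and reports per-request metrics. The sketch below is self-contained and hypothetical: the names (`LLMClient`, `llm_request`, `RequestConfig`, `MyProviderClient`) mirror the pattern described above but are not copied from the repository, so the actual base-class interface should be checked there.

```python
# Hypothetical sketch of a pluggable LLM client; names and fields are
# illustrative and may not match the actual LLMPerf interface.
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

import requests


@dataclass
class RequestConfig:
    """Assumed shape of a single benchmark request."""
    model: str
    prompt: str
    sampling_params: Dict[str, Any] = field(default_factory=dict)


class LLMClient(ABC):
    """Assumed base class: one method that performs a single LLM request."""

    @abstractmethod
    def llm_request(self, config: RequestConfig) -> Tuple[Dict[str, Any], str]:
        """Return (per-request metrics, generated text)."""


class MyProviderClient(LLMClient):
    """Example client for a hypothetical OpenAI-style completion endpoint."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key

    def llm_request(self, config: RequestConfig) -> Tuple[Dict[str, Any], str]:
        start = time.monotonic()
        resp = requests.post(
            f"{self.base_url}/v1/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": config.model,
                  "prompt": config.prompt,
                  **config.sampling_params},
            timeout=600,
        )
        resp.raise_for_status()
        text = resp.json()["choices"][0]["text"]
        metrics = {
            "end_to_end_latency_s": time.monotonic() - start,
            "num_output_chars": len(text),
        }
        return metrics, text
```

A benchmark driver can then instantiate such a client and fan requests out to it, aggregating the returned per-request metrics into the percentile summaries mentioned above.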
The old LLMPerf code base can be found in the llmperf-legacy repo.