dbarbuzzi triggered nightly on refs/heads/fix-nightly-benchmarks · neuralmagic/nm-vllm@8724164

# :warning: **Performance Alert** :warning:

Possible performance regression was detected for benchmark **'bigger_is_better'**. Benchmark result of this commit is worse than the previous benchmark result exceeding threshold `1.10`.

| Benchmark suite | Current: 8724164ef7985e9f21d69926ad039af0026343ee | Previous: df1f1a00d1fb111ef035ac385fafa38b5ed34488 | Ratio |
|-|-|-|-|
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - NousResearch/Llama-2-7b-chat-hf\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"300,1\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `162.87363002805006` tokens/s | `210.03681305793805` tokens/s | `1.29` |
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - teknium/OpenHermes-2.5-Mistral-7B\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"300,1\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `144.6488418619174` tokens/s | `204.53062386991687` tokens/s | `1.41` |
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - teknium/OpenHermes-2.5-Mistral-7B\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"150,0.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `83.51937308982416` tokens/s | `115.03923920393221` tokens/s | `1.38` |
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"150,0.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `81.2149336918379` tokens/s | `121.99375753076185` tokens/s | `1.50` |
| `{"name": "output_throughput", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"150,0.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `58.57792575288278` tokens/s | `99.23863938310227` tokens/s | `1.69` |
| `{"name": "output_throughput", "description": "VLLM Serving - Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50\nmax-model-len - 4096\nsparsity - sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"750,2.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `337.2733847194903` tokens/s | `539.7524824536448` tokens/s | `1.60` |
| `{"name": "output_throughput", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"1500,5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `558.5572659111468` tokens/s | `823.5844999041368` tokens/s | `1.47` |
| `{"name": "output_throughput", "description": "VLLM Serving - Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50\nmax-model-len - 4096\nsparsity - sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"300,1\",\n \"datase

BUILD-TEST / TEST-SOLO / TEST

This self-hosted runner is currently using runner version 2.314.1. This version is out of date. Please update to the latest version 2.316.1

BUILD-TEST / TEST-MULTI / TEST

This self-hosted runner is currently using runner version 2.314.1. This version is out of date. Please update to the latest version 2.316.1

BUILD-TEST / BENCHMARK / BENCHMARK

This self-hosted runner is currently using runner version 2.314.1. This version is out of date. Please update to the latest version 2.316.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 210.03681305793805 and current value is 162.87363002805006. It is 1.2895691771698437x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 204.53062386991687 and current value is 144.6488418619174. It is 1.4139803764565426x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 115.03923920393221 and current value is 83.51937308982416. It is 1.37739586574972x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 121.99375753076185 and current value is 81.2149336918379. It is 1.5021099197550933x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 99.23863938310227 and current value is 58.57792575288278. It is 1.6941303077502443x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 539.7524824536448 and current value is 337.2733847194903. It is 1.6003411680484543x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 823.5844999041368 and current value is 558.5572659111468. It is 1.4744853395840518x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 212.27272149892232 and current value is 133.66477585821556. It is 1.5880976879360487x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 509.00145060690625 and current value is 389.0296308435383. It is 1.308387357290064x worse than previous exceeding a ratio threshold 1.1

BUILD-TEST / BENCHMARK / BENCHMARK_REPORT

Performance alert! Previous value was 118.04199658885149 and current value is 94.64801975693179. It is 1.2471681593761648x worse than previous exceeding a ratio threshold 1.1
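
The reported ratios are consistent with dividing the previous value by the current value for a bigger-is-better metric, then comparing the result against the `1.1` threshold. The following is a minimal sketch of that check, not the reporting tool's actual code; the function name `regression_ratio` and the `samples` list are hypothetical, and the throughput values are copied from three of the alerts above.

```python
def regression_ratio(previous: float, current: float) -> float:
    """Ratio for a bigger-is-better metric (assumed to be previous / current).

    Values above 1 mean the current run produced lower throughput.
    """
    return previous / current


THRESHOLD = 1.10  # ratio threshold quoted in the alerts

# (previous, current) output throughput in tokens/s, taken from the alerts above
samples = [
    (210.03681305793805, 162.87363002805006),
    (204.53062386991687, 144.6488418619174),
    (118.04199658885149, 94.64801975693179),
]

# Compute the ratio for each pair and flag those that exceed the threshold
results = [(p, c, regression_ratio(p, c)) for p, c in samples]
for p, c, r in results:
    flag = "ALERT" if r > THRESHOLD else "ok"
    print(f"{p:.2f} -> {c:.2f} tokens/s  ratio={r:.4f}  {flag}")
```

Under this assumption, a ratio of about 1.29 corresponds to the current run delivering roughly 78% of the previous throughput (1 / 1.29).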

Artifacts

Produced during runtime

Name	Size
3.10.12-nm-vllm-0.2.0.tar.gz Expired	527 KB
9008903436-aws-avx2-32G-a10g-24G Expired	124 KB
cc-vllm-html-aws-avx2-192G-4-a10g-96G Expired	2.05 MB
cc-vllm-html-aws-avx2-32G-a10g-24G Expired	2.05 MB
gh_action_benchmark_jsons-9008903436-aws-avx2-32G-a10g-24G Expired	28.7 KB
nm_vllm-0.2.0-cp310-cp310-manylinux_2_17_x86_64.whl Expired	75.9 MB

dbarbuzzi triggered nightly on refs/heads/fix-nightly-benchmarks #108
