This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

dbarbuzzi triggered nightly on refs/heads/fix-nightly-benchmarks #108

Manually triggered May 8, 2024 21:39
Status Failure
Total duration 6h 14m 30s
Artifacts 6

nightly.yml

on: workflow_dispatch
BUILD-TEST / BENCHMARK / BENCHMARK (5h 51m)
BUILD-TEST / TEST-SOLO / TEST (5h 6m)
BUILD-TEST / TEST-MULTI / TEST (5h 18m)
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT (27s)
BUILD-TEST / PUBLISH / PUBLISH (6s)

Annotations

1 error and 13 warnings
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
# :warning: **Performance Alert** :warning:

Possible performance regression was detected for benchmark **'bigger_is_better'**.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold `1.10`.

| Benchmark suite | Current: 8724164ef7985e9f21d69926ad039af0026343ee | Previous: df1f1a00d1fb111ef035ac385fafa38b5ed34488 | Ratio |
|-|-|-|-|
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - NousResearch/Llama-2-7b-chat-hf\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"300,1\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `162.87363002805006` tokens/s | `210.03681305793805` tokens/s | `1.29` |
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - teknium/OpenHermes-2.5-Mistral-7B\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"300,1\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `144.6488418619174` tokens/s | `204.53062386991687` tokens/s | `1.41` |
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - teknium/OpenHermes-2.5-Mistral-7B\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"150,0.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `83.51937308982416` tokens/s | `115.03923920393221` tokens/s | `1.38` |
| `{"name": "output_throughput", "description": "VLLM Serving - Dense\nmodel - TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"150,0.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `81.2149336918379` tokens/s | `121.99375753076185` tokens/s | `1.50` |
| `{"name": "output_throughput", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"150,0.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `58.57792575288278` tokens/s | `99.23863938310227` tokens/s | `1.69` |
| `{"name": "output_throughput", "description": "VLLM Serving - Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50\nmax-model-len - 4096\nsparsity - sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"750,2.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `337.2733847194903` tokens/s | `539.7524824536448` tokens/s | `1.60` |
| `{"name": "output_throughput", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"1500,5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.2.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.2.1+cu121"}` | `558.5572659111468` tokens/s | `823.5844999041368` tokens/s | `1.47` |
| `{"name": "output_throughput", "description": "VLLM Serving - Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50\nmax-model-len - 4096\nsparsity - sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"300,1\",\n \"datase
BUILD-TEST / TEST-SOLO / TEST
This self-hosted runner is currently using runner version 2.314.1. This version is out of date. Please update to the latest version 2.316.1
BUILD-TEST / TEST-MULTI / TEST
This self-hosted runner is currently using runner version 2.314.1. This version is out of date. Please update to the latest version 2.316.1
BUILD-TEST / BENCHMARK / BENCHMARK
This self-hosted runner is currently using runner version 2.314.1. This version is out of date. Please update to the latest version 2.316.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 210.03681305793805 and current value is 162.87363002805006. It is 1.2895691771698437x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 204.53062386991687 and current value is 144.6488418619174. It is 1.4139803764565426x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 115.03923920393221 and current value is 83.51937308982416. It is 1.37739586574972x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 121.99375753076185 and current value is 81.2149336918379. It is 1.5021099197550933x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 99.23863938310227 and current value is 58.57792575288278. It is 1.6941303077502443x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 539.7524824536448 and current value is 337.2733847194903. It is 1.6003411680484543x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 823.5844999041368 and current value is 558.5572659111468. It is 1.4744853395840518x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 212.27272149892232 and current value is 133.66477585821556. It is 1.5880976879360487x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 509.00145060690625 and current value is 389.0296308435383. It is 1.308387357290064x worse than previous exceeding a ratio threshold 1.1
BUILD-TEST / BENCHMARK / BENCHMARK_REPORT
Performance alert! Previous value was 118.04199658885149 and current value is 94.64801975693179. It is 1.2471681593761648x worse than previous exceeding a ratio threshold 1.1
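
The ratios in the alerts above can be reproduced by hand. The following is a minimal sketch, not the reporting action's actual code: it assumes that for a `bigger_is_better` benchmark the ratio is the previous value divided by the current value, and that an alert fires when the ratio is strictly greater than the threshold (the strictness is an assumption). The names `ALERT_RE`, `parse_alert`, and `regression_ratio` are hypothetical helpers introduced only for this illustration.

```python
import re

# Matches the "Previous value was X and current value is Y." part of an
# alert line as it appears in this run's annotations.
ALERT_RE = re.compile(
    r"Previous value was (?P<previous>\d+(?:\.\d+)?) "
    r"and current value is (?P<current>\d+(?:\.\d+)?)"
)


def parse_alert(line: str) -> tuple[float, float]:
    """Return (previous, current) extracted from one alert line."""
    match = ALERT_RE.search(line)
    if match is None:
        raise ValueError(f"not a performance alert line: {line!r}")
    return float(match["previous"]), float(match["current"])


def regression_ratio(previous: float, current: float,
                     bigger_is_better: bool = True) -> float:
    """How many times worse the current result is than the previous one.

    Assumption: for throughput-style metrics (bigger is better) a drop
    shows up as previous / current > 1; otherwise the ratio is inverted.
    """
    if bigger_is_better:
        return previous / current
    return current / previous


# First alert from this run (Llama-2-7b-chat-hf, nr-qps-pair_ "300,1").
line = (
    "Performance alert! Previous value was 210.03681305793805 and current "
    "value is 162.87363002805006. It is 1.2895691771698437x worse than "
    "previous exceeding a ratio threshold 1.1"
)
previous, current = parse_alert(line)
ratio = regression_ratio(previous, current)
threshold = 1.1  # ratio threshold quoted in the alert text
is_regression = ratio > threshold  # strict comparison is an assumption

print(f"ratio={ratio:.2f} regression={is_regression}")
```

Under these assumptions the first alert yields a ratio of about 1.29, in line with the `1.29` reported for that benchmark in the Performance Alert annotation.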

Artifacts

Produced during runtime
| Name | Size | Status |
|-|-|-|
| 3.10.12-nm-vllm-0.2.0.tar.gz | 527 KB | Expired |
| 9008903436-aws-avx2-32G-a10g-24G | 124 KB | Expired |
| cc-vllm-html-aws-avx2-192G-4-a10g-96G | 2.05 MB | Expired |
| cc-vllm-html-aws-avx2-32G-a10g-24G | 2.05 MB | Expired |
| gh_action_benchmark_jsons-9008903436-aws-avx2-32G-a10g-24G | 28.7 KB | Expired |
| nm_vllm-0.2.0-cp310-cp310-manylinux_2_17_x86_64.whl | 75.9 MB | Expired |