
[Bug] Performance benchmarks for FaqGen / DocSum are calculated incorrectly (we see negative numbers and very high values) #189

Open
amikoai opened this issue Nov 5, 2024 · 1 comment

amikoai commented Nov 5, 2024

We are adapting OPEA applications for the AMD platform and have run into an issue: the eval tests report negative values for Input Tokens per Second and for Input Tokens, while Tokens per Second is implausibly high (~25K). We use the TGI LLM engine.
Our process:
From evals/benchmark/ we modify benchmark.yaml (screenshot attached):

[Screenshot: benchmark.yaml settings, 2024-11-05 17:44]

Main settings (see the sketch below):
- examples is set to ["faqgen"].
- deployment_type should be set to "docker", since we deploy via Docker.
- service_port is the backend port where the service API is exposed; it can be found in the Docker Compose output when the service starts. We currently use 18881 for the GPU deployment and 19888 for the CPU deployment.
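For reference, here is a minimal sketch of how those fields would look in benchmark.yaml. Since the screenshot is not reproduced here, the surrounding section name (test_suite_config) and the service_ip value are assumptions, not a copy of our file:

```yaml
test_suite_config:            # assumed section name; check your benchmark.yaml
  examples: ["faqgen"]        # run only the FaqGen benchmark
  deployment_type: "docker"   # deployment via Docker, not Kubernetes
  service_ip: "127.0.0.1"     # illustrative; the host running the backend
  service_port: 18881         # 18881 for the GPU deployment, 19888 for CPU
```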

To start the test, use the command:
`python benchmark.py`

And here are the results from the benchmark tests, with the strange numbers:
[Screenshot: benchmark results]

Can you please check that it works fine on your side?

@joshuayao joshuayao added the bug Something isn't working label Nov 8, 2024
@joshuayao joshuayao changed the title Performance benchmarks for FaqGen / DocSum are calculated incorrectly (we see negative numbers and very high values) [Bug] Performance benchmarks for FaqGen / DocSum are calculated incorrectly (we see negative numbers and very high values) Dec 6, 2024
@wangkl2 wangkl2 self-assigned this Dec 9, 2024
wangkl2 (Collaborator) commented Dec 23, 2024

@amikoai Thanks for reporting this issue. I was able to reproduce the faqgen benchmark issue and have fixed it in commit 5d717e8. Please try again with the main branch and let us know if it works on your end.
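For anyone hitting the same symptom before picking up the fix: a negative tokens-per-second figure generally means the start/end timestamps (or the token counts) feeding the throughput formula are swapped or left unset. Below is a minimal sketch of that failure mode with hypothetical names (tokens_per_second and the simple tokens/elapsed formula are assumptions for illustration, not the actual GenAIEval code; the real fix is in the commit above):

```python
import time

def tokens_per_second(num_tokens: int, start: float, end: float) -> float:
    """Throughput = tokens / elapsed seconds, with sanity checks."""
    elapsed = end - start
    if elapsed <= 0 or num_tokens < 0:
        # Swapped timestamps or an uninitialized token count would
        # otherwise produce negative or absurdly large rates, like
        # the values reported in this issue.
        raise ValueError(f"bad measurement: tokens={num_tokens}, elapsed={elapsed:.6f}s")
    return num_tokens / elapsed

start = time.perf_counter()
time.sleep(0.25)                    # stand-in for streaming tokens from TGI
end = time.perf_counter()
print(f"{tokens_per_second(128, start, end):.1f} tokens/s")  # ~512 tokens/s

# The bug pattern: passing the timestamps in the wrong order flips the sign.
try:
    tokens_per_second(128, end, start)
except ValueError as exc:
    print(exc)
```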
