Multimodal #66

anmarques · 2024-11-04T19:14:04Z

This PR adds support for benchmarking multimodal models.

It mostly extends existing infrastructure to add support to requests containing images. For emulated requests it downloads images from an illustrated version from Pride and Prejudice and randomly selects from them.

The load_images logic is currently limited to download from url. It should be extended to HF datasets or local files in the future.

I tested by running the following command:

guidellm --data="prompt_tokens=128,generated_tokens=128,images=1" --data-type emulated --model microsoft/Phi-3.5-vision-instruct --target "http://localhost:8000/v1" --max-seconds 20

On 2xA5000 I had to set max_concurrenty=4 to run this command due to memory limitations.

… url

anmarques added 8 commits November 4, 2024 19:07

Add class to describe image samples and loading logic for images from…

bb9bc0c

… url

Add class to describe image samples and loading logic for images from…

59002b5

… url

Add url used to download images from for emulated requests

cb1f244

Add support to images in requests

24e6527

quality fixes

3946709

Quality fixes

7d93b02

Quality fixes

a441dad

Quality fixes

570670b

anmarques requested a review from markurtz November 5, 2024 01:11

Add new dependencies

984da28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multimodal #66

Multimodal #66

anmarques commented Nov 4, 2024 •

edited

Loading

Multimodal #66

Are you sure you want to change the base?

Multimodal #66

Conversation

anmarques commented Nov 4, 2024 • edited Loading

anmarques commented Nov 4, 2024 •

edited

Loading