
[Usage]: How can I use LLMEngine to perform distributed inference for multimodal large models, such as Qwen-VL? #12305

frederichen01 opened this issue Jan 22, 2025 · 1 comment

@frederichen01

Your current environment

For text-only LLM inference, my code is:

```python
from vllm import LLMEngine

# engine_args, tokenizer, template, item, and sampling_params are defined elsewhere
engine = LLMEngine.from_engine_args(engine_args)
prompt_token_ids = tokenizer.encode(
    template.format(input_text=item["prompt"].strip()),
    max_length=8192,
    truncation=True,
)
engine.add_request(
    request_id=str(request_id),  # any unique string id for this request
    prompt={"prompt_token_ids": prompt_token_ids},
    params=sampling_params,
)
```

Can you provide an equivalent example for a multimodal LLM (MLLM)?

How would you like to use vllm

I want to run inference with Qwen-VL. I don't know how to integrate it with vLLM.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
frederichen01 added the usage (How to use vllm) label on Jan 22, 2025
@Hyunnicolou

Please read the vLLM tutorial on multimodal inputs: https://docs.vllm.ai/en/latest/serving/multimodal_inputs.html
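For a concrete starting point, here is a minimal sketch of that pattern with `LLMEngine`, assuming a Qwen2-VL checkpoint (`Qwen/Qwen2-VL-7B-Instruct`), a local image file `demo.jpg`, and 2 GPUs; all three are placeholders, and the vision placeholder tokens in the prompt are specific to Qwen2-VL, so check the model card and the docs page above for your model's exact format:

```python
from PIL import Image
from vllm import EngineArgs, LLMEngine, SamplingParams

engine_args = EngineArgs(
    model="Qwen/Qwen2-VL-7B-Instruct",  # assumed model; any vLLM-supported VLM works
    tensor_parallel_size=2,             # shard the model across 2 GPUs for distributed inference
    max_model_len=8192,
)
engine = LLMEngine.from_engine_args(engine_args)
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Qwen2-VL marks the image position with vision placeholder tokens;
# other models use different placeholders (see the docs page above).
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
image = Image.open("demo.jpg").convert("RGB")  # hypothetical local image

# Pass the raw prompt text plus the image under multi_modal_data;
# vLLM tokenizes the text and expands the image placeholders itself.
engine.add_request(
    request_id="0",
    prompt={"prompt": prompt, "multi_modal_data": {"image": image}},
    params=sampling_params,
)

# Drive the engine loop until the request finishes.
while engine.has_unfinished_requests():
    for request_output in engine.step():
        if request_output.finished:
            print(request_output.outputs[0].text)
```

The key difference from your text-only snippet is that you pass the prompt string together with `multi_modal_data` rather than pre-tokenized `prompt_token_ids`, since the engine needs to insert the image features at the placeholder positions.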
