
QServe supports Qwen2.5 with TensorRT-LLM #50

Open
limertang opened this issue Jan 2, 2025 · 3 comments

Comments

@limertang

Hi,
Currently, TensorRT-LLM supports QServe only for Llama. Will you enable it for other models, such as Qwen2.5?

Thank you!

@ys-2020
Contributor

ys-2020 commented Jan 17, 2025

Hi. We haven't implemented this feature yet. We may consider implementing it in the future. Thank you for your interest in QServe!

@nzarif

nzarif commented Jan 23, 2025

Hi,
I want to upvote this feature request. It would also be great if you could provide instructions on how to generate the quantized Qwen2.5 checkpoint to be used by trtllm-build.
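For reference, the conversion flow for a DeepCompressor/QServe checkpoint on the LLaMA path looks roughly like the sketch below. This is a minimal sketch only: the paths are placeholders, and the QServe-related flags (--quant_ckpt_path, --use_qserve, --per_group) follow the TensorRT-LLM LLaMA example and are assumptions for a Qwen model, which is exactly the gap this issue is about.

```python
# Minimal sketch of the LLaMA-style QServe conversion flow, adapted for a
# Qwen2.5 checkpoint quantized with DeepCompressor. All paths are placeholders;
# the QServe-related flags (--quant_ckpt_path, --use_qserve, --per_group) are
# taken from the TensorRT-LLM LLaMA example and are assumptions for Qwen.
import subprocess

HF_MODEL_DIR = "Qwen2.5-7B-Instruct"         # original Hugging Face weights
DEEPCOMPRESSOR_CKPT = "qwen2.5-7b-w4a8-kv4"  # DeepCompressor (QServe) output
TRTLLM_CKPT_DIR = "trtllm_ckpt/qwen2.5-7b"   # converted TensorRT-LLM checkpoint
ENGINE_DIR = "trt_engines/qwen2.5-7b"        # built engine

# Step 1: convert the DeepCompressor-quantized weights into a TensorRT-LLM
# checkpoint. In v0.16 this path is only wired up in the LLaMA example, so a
# Qwen build needs the kind of code modification described later in this thread.
subprocess.run([
    "python", "examples/llama/convert_checkpoint.py",
    "--model_dir", HF_MODEL_DIR,
    "--quant_ckpt_path", DEEPCOMPRESSOR_CKPT,  # assumed flag: quantized weights
    "--use_qserve",                            # assumed flag: W4A8 QServe format
    "--per_group",                             # assumed flag: per-group scales
    "--dtype", "float16",
    "--output_dir", TRTLLM_CKPT_DIR,
], check=True)

# Step 2: build the engine from the converted checkpoint (standard
# trtllm-build flags).
subprocess.run([
    "trtllm-build",
    "--checkpoint_dir", TRTLLM_CKPT_DIR,
    "--output_dir", ENGINE_DIR,
    "--gemm_plugin", "auto",
], check=True)
```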

@limertang
Author

TensorRT-LLM v0.16, released by NVIDIA, does not support converting Qwen models quantized by DeepCompressor into a checkpoint. I modified the code so that it can successfully convert a DeepCompressor-quantized Qwen model to a checkpoint, and trtllm-build also runs successfully. However:

  1. The inference results for Qwen1.5-7B are correct, but the inference results for Qwen2 and Qwen2.5 are wrong. I don't know the reason; could you please help me fix it?
  2. When I run benchmarks with qserve_benchmark.py or qserve_e2e_generation.py, it throws an error: Unsupported model type: QWen2ForCausalLM (see the sketch after this list).
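
The "Unsupported model type: QWen2ForCausalLM" error usually comes from an architecture-name dispatch table in the serving code that only lists the Llama classes. The sketch below shows what such a registry typically looks like and where a Qwen2 entry would go; the class and function names here are hypothetical and are not QServe's actual internals.

```python
# Hypothetical sketch of an architecture-name -> model-class dispatch table,
# the usual source of "Unsupported model type: ..." errors. The names
# (LlamaForCausalLM, Qwen2ForCausalLM, load_model) are placeholders and not
# QServe's actual classes.

class LlamaForCausalLM:
    def __init__(self, config):
        self.config = config

class Qwen2ForCausalLM:
    # Qwen2/Qwen2.5 reuse most of the Llama block structure but add bias terms
    # on the QKV projections (and tie embeddings on the small variants), so
    # they generally need their own class or extra config handling.
    def __init__(self, config):
        self.config = config

_MODEL_REGISTRY = {
    "LlamaForCausalLM": LlamaForCausalLM,
    # Registering the architecture name only removes the dispatch error; the
    # weight loader and quantized kernels must also understand the Qwen2
    # layout for the inference results to be correct.
    "QWen2ForCausalLM": Qwen2ForCausalLM,
    "Qwen2ForCausalLM": Qwen2ForCausalLM,
}

def load_model(architecture: str, config: dict):
    try:
        model_cls = _MODEL_REGISTRY[architecture]
    except KeyError:
        raise ValueError(f"Unsupported model type: {architecture}") from None
    return model_cls(config)
```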
