
QServe supports Qwen2.5 with TensorRT-LLM #50

Open
limertang opened this issue Jan 2, 2025 · 3 comments

Comments

@limertang

Hi,
Currently, TensorRT-LLM supports QServe only for Llama. Will you enable it for other models, such as Qwen2.5?

Thank you!

@ys-2020
Contributor

ys-2020 commented Jan 17, 2025

Hi. We haven't implemented this feature yet. We may consider implementing it in the future. Thank you for your interest in QServe!

@nzarif

nzarif commented Jan 23, 2025

Hi,
I want to upvote this feature request. It would also be great if you could provide instructions on how to generate the quantized Qwen2.5 checkpoint to be used by trtllm-build.
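For reference, the conversion flow for a DeepCompressor/QServe checkpoint on the LLaMA path looks roughly like the sketch below. This is a minimal sketch only: the paths are placeholders, and the QServe-related flags (--quant_ckpt_path, --use_qserve, --per_group) follow the TensorRT-LLM LLaMA example and are assumptions for a Qwen model, which is exactly the gap this issue is about.

```python
# Minimal sketch of the LLaMA-style QServe conversion flow, adapted for a
# Qwen2.5 checkpoint quantized with DeepCompressor. All paths are placeholders;
# the QServe-related flags (--quant_ckpt_path, --use_qserve, --per_group) are
# taken from the TensorRT-LLM LLaMA example and are assumptions for Qwen.
import subprocess

HF_MODEL_DIR = "Qwen2.5-7B-Instruct"         # original Hugging Face weights
DEEPCOMPRESSOR_CKPT = "qwen2.5-7b-w4a8-kv4"  # DeepCompressor (QServe) output
TRTLLM_CKPT_DIR = "trtllm_ckpt/qwen2.5-7b"   # converted TensorRT-LLM checkpoint
ENGINE_DIR = "trt_engines/qwen2.5-7b"        # built engine

# Step 1: convert the DeepCompressor-quantized weights into a TensorRT-LLM
# checkpoint. In v0.16 this path is only wired up in the LLaMA example, so a
# Qwen build needs the kind of code modification described later in this thread.
subprocess.run([
    "python", "examples/llama/convert_checkpoint.py",
    "--model_dir", HF_MODEL_DIR,
    "--quant_ckpt_path", DEEPCOMPRESSOR_CKPT,  # assumed flag: quantized weights
    "--use_qserve",                            # assumed flag: W4A8 QServe format
    "--per_group",                             # assumed flag: per-group scales
    "--dtype", "float16",
    "--output_dir", TRTLLM_CKPT_DIR,
], check=True)

# Step 2: build the engine from the converted checkpoint (standard
# trtllm-build flags).
subprocess.run([
    "trtllm-build",
    "--checkpoint_dir", TRTLLM_CKPT_DIR,
    "--output_dir", ENGINE_DIR,
    "--gemm_plugin", "auto",
], check=True)
```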

@limertang
Author

TensorRT-LLM v0.16, released by NVIDIA, does not support converting Qwen models quantized by DeepCompressor into a checkpoint. I modified the code so that it can successfully convert a DeepCompressor-quantized Qwen model to a checkpoint, and trtllm-build also runs successfully. However:

  1. The inference results for Qwen1.5-7B are correct, but the inference results for Qwen2 and Qwen2.5 are wrong. I don't know the reason; could you please help me fix it?
  2. When I run benchmarks with qserve_benchmark.py or qserve_e2e_generation.py, it throws an error: Unsupported model type: QWen2ForCausalLM (see the sketch after this list).
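
The "Unsupported model type: QWen2ForCausalLM" error usually comes from an architecture-name dispatch table in the serving code that only lists the Llama classes. The sketch below shows what such a registry typically looks like and where a Qwen2 entry would go; the class and function names here are hypothetical and are not QServe's actual internals.

```python
# Hypothetical sketch of an architecture-name -> model-class dispatch table,
# the usual source of "Unsupported model type: ..." errors. The names
# (LlamaForCausalLM, Qwen2ForCausalLM, load_model) are placeholders and not
# QServe's actual classes.

class LlamaForCausalLM:
    def __init__(self, config):
        self.config = config

class Qwen2ForCausalLM:
    # Qwen2/Qwen2.5 reuse most of the Llama block structure but add bias terms
    # on the QKV projections (and tie embeddings on the small variants), so
    # they generally need their own class or extra config handling.
    def __init__(self, config):
        self.config = config

_MODEL_REGISTRY = {
    "LlamaForCausalLM": LlamaForCausalLM,
    # Registering the architecture name only removes the dispatch error; the
    # weight loader and quantized kernels must also understand the Qwen2
    # layout for the inference results to be correct.
    "QWen2ForCausalLM": Qwen2ForCausalLM,
    "Qwen2ForCausalLM": Qwen2ForCausalLM,
}

def load_model(architecture: str, config: dict):
    try:
        model_cls = _MODEL_REGISTRY[architecture]
    except KeyError:
        raise ValueError(f"Unsupported model type: {architecture}") from None
    return model_cls(config)
```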
