Hi,
I want to upvote this feature request. It would also be great if you could provide instructions on how to generate a quantized checkpoint for Qwen2.5 that can be used by trtllm-build.
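For context, the rough pipeline I have in mind is modeled on the Llama QServe example. This is only a sketch: the paths and model id are placeholders, and the --use_qserve / --quant_ckpt_path flags are assumptions carried over from the Llama convert_checkpoint.py, so please confirm the exact flags for Qwen.

```python
# Sketch of the QServe pipeline, adapted for Qwen2.5. All paths are placeholders;
# the convert flags (--use_qserve, --quant_ckpt_path) are assumptions taken from the
# Llama example and may not exist yet in examples/qwen/convert_checkpoint.py --
# which is exactly what this issue asks for.
import subprocess

HF_MODEL = "Qwen/Qwen2.5-7B-Instruct"          # hypothetical model id
QUANT_CKPT = "/workspace/qserve_qwen2.5_w4a8"  # output of the deepcompressor QoQ/QServe recipe
TRTLLM_CKPT = "/workspace/trtllm_ckpt_qwen2.5"
ENGINE_DIR = "/workspace/engine_qwen2.5"

# 1. Quantize with deepcompressor (QServe/QoQ W4A8 recipe) to produce QUANT_CKPT;
#    the exact invocation depends on the deepcompressor release, so it is omitted here.

# 2. Convert the quantized weights into a TensorRT-LLM checkpoint.
subprocess.run([
    "python", "examples/qwen/convert_checkpoint.py",
    "--model_dir", HF_MODEL,
    "--quant_ckpt_path", QUANT_CKPT,   # assumption: mirrors the Llama QServe example
    "--use_qserve",                    # assumption: mirrors the Llama QServe example
    "--dtype", "float16",
    "--output_dir", TRTLLM_CKPT,
], check=True)

# 3. Build the engine from the converted checkpoint.
subprocess.run([
    "trtllm-build",
    "--checkpoint_dir", TRTLLM_CKPT,
    "--output_dir", ENGINE_DIR,
    "--gemm_plugin", "auto",
], check=True)
```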
TensorRT-LLM v0.16 released by NVIDIA does not support converting Qwen models quantized by deepcompressor into a checkpoint, but I modified the code so that the conversion of deepcompressor-quantized Qwen succeeds, and trtllm-build also completes successfully on the result.
The inference results for Qwen1.5-7B are correct, but the results for Qwen2 and Qwen2.5 are wrong. I don't know the reason; could you please help me fix it?
When I test the benchmark with qserve_benchmark.py or qserve_e2e_generation.py, it throws an error: Unsupported model type: QWen2ForCausalLM
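For reference, this kind of error usually comes from a registry in the serving code that maps the Hugging Face architecture string to an internal model class, and that registry currently only lists Llama. The sketch below is purely hypothetical (the names are made up for illustration and are not the actual QServe source); the real fix would be adding a Qwen2 entry plus the matching weight-loading logic.

```python
# Hypothetical illustration of where an "Unsupported model type" error typically
# originates. MODEL_REGISTRY and the class name string are made-up names, not the
# actual QServe code.
MODEL_REGISTRY = {
    "LlamaForCausalLM": "LlamaForCausalLMQServe",
}

def resolve_model_class(architecture: str) -> str:
    # Look up the HF `architectures` string; anything unregistered fails here.
    try:
        return MODEL_REGISTRY[architecture]
    except KeyError:
        raise ValueError(f"Unsupported model type: {architecture}")

# Until a Qwen2 entry (and the layers behind it) exists, this raises exactly the
# error seen above:
# resolve_model_class("QWen2ForCausalLM")
```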
Hi,
Currently, TensorRT-LLM supports QServe only for Llama. Will you enable it for other models, such as Qwen2.5?
Thank you!