
Build Qwen2-72B-Instruct model by INT4-AWQ quantization failed #2445

Open · wangpeilin opened this issue Nov 14, 2024 · 0 comments
Labels: bug (Something isn't working)

System Info

Ubuntu 20.04
NVIDIA A100
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 and 24.07
TensorRT-LLM v0.14.0 and v0.11.0

Who can help?

@Tracin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Start the container:
     docker run -itd --name xxx --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v /share/datasets:/share/datasets nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
  2. Check out the code (version 0.14.0):
     git clone https://github.com/NVIDIA/TensorRT-LLM.git
     git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
  3. Quantize the checkpoint (the relevant weight dimension is shown in the sketch after this list):
     cd TensorRT-LLM/examples
     python3 ./quantization/quantize.py \
       --model_dir /path/Qwen_Qwen2-72B-Instruct/ \
       --output_dir /path/Qwen_Qwen2-72B-Instruct_int4_awq_4gpu \
       --dtype bfloat16 \
       --qformat int4_awq \
       --awq_block_size 128 \
       --calib_size 32 \
       --tp_size 4
  4. Build the engines:
     trtllm-build \
       --checkpoint_dir /path/Qwen_Qwen2-72B-Instruct_int4_awq_4gpu/ \
       --output_dir triton_model_repo/Qwen_Qwen2-72B-Instruct_int4_awq/tensorrt_llm/1/ \
       --gemm_plugin auto
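
As a quick sanity check on the dimension involved, here is a minimal sketch (it assumes the Hugging Face config at the model path is readable and that `transformers` is installed; the path is the same placeholder used in the steps above):

```python
from transformers import AutoConfig

# Read the model config from the same placeholder path as the repro steps.
cfg = AutoConfig.from_pretrained("/path/Qwen_Qwen2-72B-Instruct/")
tp_size = 4

# Under tensor parallelism, each rank gets intermediate_size / tp_size
# columns of the FFN weights.
print(cfg.intermediate_size)            # 29568 for Qwen2-72B
print(cfg.intermediate_size / tp_size)  # 7392.0 columns per rank
```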

Expected behavior

The model converts successfully to a quantized checkpoint and TensorRT engines.

actual behavior

When I set tp_size=4 and awq_block_size=128 or 64, quantize.py fails with error1: "Weight shape is not divisible for block size for block quantization."
When I set tp_size=4 and awq_block_size=32 or 16, quantize.py (step 3) succeeds, but trtllm-build fails with error2.

error1
[screenshot: quantize.py traceback, "Weight shape is not divisible for block size for block quantization."]

error2
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: Number of bytes for rows and cols must be a multiple of 32. However, num_rows_bytes = 4096 and num_col_bytes = 3696. (/workspace/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_preprocessors.cpp:279)

additional notes

This issue seems to be due to the weight shapes of the Qwen2-72B model; I built INT4-AWQ quantized engines for Qwen1.5-72B and Llama-3-70B successfully with the same settings.
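
To make the weight-shape hypothesis concrete, here is a minimal divisibility check (it assumes Qwen2-72B's intermediate_size of 29568 from its Hugging Face config; the numbers are illustrative arithmetic, not taken from TensorRT-LLM internals):

```python
intermediate_size = 29568                # Qwen2-72B FFN width (per its HF config)
tp_size = 4
per_rank = intermediate_size // tp_size  # 7392 columns per TP rank

# error1: block quantization needs the per-rank shard width to be a
# multiple of awq_block_size.
for block in (128, 64, 32, 16):
    print(block, per_rank % block == 0)
# 128 -> False (7392 / 128 = 57.75), 64 -> False (7392 / 64 = 115.5)
# 32  -> True,  16 -> True, matching where quantize.py succeeds

# error2: INT4 packs two weights per byte, so the packed row width is
# 7392 // 2 = 3696 bytes, matching num_col_bytes in the assertion,
# and 3696 is not a multiple of 32.
print(per_rank // 2, (per_rank // 2) % 32)  # 3696 16
```

By contrast, per their Hugging Face configs, Qwen1.5-72B (intermediate_size 24576) and Llama-3-70B (28672) shard evenly at tp_size=4: 24576 / 4 = 6144 and 28672 / 4 = 7168, both multiples of 128, which would explain why those builds succeed.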
