Qwen2.5-instruct-AWQ Quantization Int4 cannot launch from latest docker containers with #2505

Open
1 of 3 tasks
zhyuchao123 opened this issue Oct 31, 2024 · 0 comments

zhyuchao123 commented Oct 31, 2024

System Info

cuda12.4
system: ubuntu 22.04.5 LTS
GPU:A10 (23G) * 4
images: xprobe/xinference:latest
image ID: 7a1223f1d698
vllm: 0.6.0
vllm-flash-attn: 2.6.1

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

version 0.16.1

pip packages' information:

accelerate 0.34.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aioprometheus 23.12.0
aiosignal 1.3.1
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
altair 5.4.1
annotated-types 0.7.0
anthropic 0.37.1
antlr4-python3-runtime 4.9.3
anyio 4.4.0
argcomplete 3.5.1
async-timeout 4.0.3
attrdict 2.0.1
attrs 24.2.0
audioread 3.0.1
auto_gptq 0.7.1
autoawq 0.2.5
autoawq_kernels 0.0.6
av 13.1.0
bcrypt 4.2.0
beautifulsoup4 4.12.3
bitsandbytes 0.44.1
black 24.10.0
boto3 1.28.64
botocore 1.31.85
cdifflib 1.2.6
certifi 2019.11.28
cffi 1.17.1
chardet 3.0.4
charset-normalizer 3.3.2
chattts 0.2.0
click 8.1.7
cloudpickle 3.0.0
colorama 0.4.6
coloredlogs 15.0.1
conformer 0.3.2
contourpy 1.3.0
controlnet-aux 0.0.7
crcmod 1.7
cryptography 43.0.3
cycler 0.12.1
Cython 3.0.11
datamodel-code-generator 0.26.2
datasets 2.21.0
dbus-python 1.2.16
decorator 5.1.1
DeepCache 0.1.1
diffusers 0.31.0
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
distro-info 0.23+ubuntu1.1
dnspython 2.7.0
ecdsa 0.19.0
editdistance 0.8.1
einops 0.8.0
einx 0.3.0
email_validator 2.2.0
encodec 0.1.1
eva-decord 0.6.1
exceptiongroup 1.2.2
fastapi 0.112.2
ffmpy 0.4.0
filelock 3.15.4
FlagEmbedding 1.2.11
flashinfer 0.1.6+cu124torch2.4
flatbuffers 24.3.25
fonttools 4.54.1
frozendict 2.4.6
frozenlist 1.4.1
fsspec 2024.6.1
funasr 1.1.12
fvcore 0.1.5.post20221221
gdown 5.2.0
gekko 1.2.1
genson 1.3.0
gguf 0.9.1
gradio 4.26.0
gradio_client 0.15.1
h11 0.14.0
hf_transfer 0.1.8
hiredis 3.0.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.2
huggingface-hub 0.24.6
humanfriendly 10.0
hydra-core 1.3.2
HyperPyYAML 1.2.2
idna 2.8
imageio 2.36.0
imageio-ffmpeg 0.5.1
importlib_metadata 8.4.0
importlib_resources 6.4.5
inflect 5.6.2
interegular 0.3.3
iopath 0.1.10
isort 5.13.2
jaconv 0.4.0
jamo 0.4.1
jieba 0.42.1
Jinja2 3.1.4
jiter 0.5.0
jj-pytorchvideo 0.1.5
jmespath 0.10.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kaldiio 2.18.0
kiwisolver 1.4.7
lark 1.2.2
lazy_loader 0.4
libnacl 2.1.0
librosa 0.10.2.post1
lightning 2.4.0
lightning-utilities 0.11.8
litellm 1.50.4
llama_cpp_python 0.2.90
llvmlite 0.43.0
lm-format-enforcer 0.10.6
loguru 0.7.2
loralib 0.1.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdurl 0.1.2
mistral_common 1.3.4
modelscope 1.17.1
mpmath 1.3.0
msgpack 1.0.8
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
narwhals 1.10.0
natsort 8.4.0
nemo_text_processing 1.0.2
nest-asyncio 1.6.0
networkx 3.3
numba 0.60.0
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.68
nvidia-nvtx-cu12 12.1.105
omegaconf 2.3.0
onnxruntime-gpu 1.16.0
openai 1.52.2
opencv-contrib-python-headless 4.10.0.84
opencv-python 4.10.0.84
optimum 1.23.2
orjson 3.10.10
ormsgpack 1.6.0
oss2 2.19.0
outlines 0.0.46
packaging 24.1
pandas 2.2.2
parameterized 0.9.0
partial-json-parser 0.2.1.1.post4
passlib 1.7.4
pathspec 0.12.1
peft 0.13.2
pillow 10.4.0
pip 24.2
platformdirs 4.3.6
plumbum 1.9.0
pooch 1.8.2
portalocker 2.10.1
prometheus_client 0.20.0
prometheus-fastapi-instrumentator 7.0.0
protobuf 5.28.0
psutil 6.0.0
py-cpuinfo 9.0.0
pyairports 2.1.1
pyarrow 17.0.0
pyasn1 0.6.1
pybase16384 0.3.7
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.8.2
pydantic_core 2.20.1
pydub 0.25.1
Pygments 2.18.0
PyGObject 3.36.0
pynini 2.1.5
pynndescent 0.5.13
pyparsing 3.2.0
PySocks 1.7.1
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-jose 3.3.0
python-multipart 0.0.12
pytorch-lightning 2.4.0
pytorch-wpe 0.0.1
pytz 2024.1
PyYAML 6.0.2
pyzmq 26.2.0
quantile-python 1.1
qwen-vl-utils 0.0.8
ray 2.35.0
redis 5.2.0
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
requests-unixsocket 0.2.0
rich 13.9.3
rouge 1.0.1
rpds-py 0.20.0
rpyc 6.0.1
rsa 4.9
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.12
ruff 0.7.1
s3transfer 0.7.0
sacremoses 0.1.1
safetensors 0.4.4
scikit-image 0.24.0
scikit-learn 1.5.2
scipy 1.14.1
semantic-version 2.10.0
sentence-transformers 3.2.1
sentencepiece 0.2.0
setuptools 75.2.0
sglang 0.3.4.post1
shellingham 1.5.4
six 1.14.0
sniffio 1.3.1
soundfile 0.12.1
soupsieve 2.6
soxr 0.5.0.post1
sse-starlette 2.1.3
starlette 0.38.4
sympy 1.13.2
tabulate 0.9.0
tblib 3.0.0
tensorboardX 2.6.2.2
tensorizer 2.9.0
termcolor 2.5.0
threadpoolctl 3.5.0
tifffile 2024.9.20
tiktoken 0.7.0
timm 1.0.11
tokenizers 0.20.1
toml 0.10.2
tomli 2.0.2
tomlkit 0.12.0
torch 2.4.0
torch-complex 0.4.4
torchaudio 2.4.0
torchmetrics 1.5.1
torchvision 0.19.0
tqdm 4.66.5
transformers 4.45.2
transformers-stream-generator 0.0.5
triton 3.0.0
typer 0.11.1
typing_extensions 4.12.2
tzdata 2024.1
umap-learn 0.5.6
unattended-upgrades 0.1
urllib3 2.0.7
uvicorn 0.30.6
uvloop 0.20.0
vector-quantize-pytorch 1.18.5
verovio 4.3.1
vllm 0.6.0
vllm-flash-attn 2.6.1
vocos 0.1.0
watchfiles 0.24.0
websockets 11.0.3
WeTextProcessing 1.0.3
wget 3.2
wheel 0.34.2
wrapt 1.16.0
xformers 0.0.27.post2
xinference 0.16.1
xoscar 0.3.3
xxhash 3.5.0
yacs 0.1.8
yarl 1.9.9
zipp 3.20.1
zmq 0.0.0
zstandard 0.23.0

The command used to start Xinference

docker run -d -p 9997:9997 -e XINFERENCE_HOME=/data -e XINFERENCE_MODEL_SRC=modelscope -v /opt/dlami/nvme/models:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0

Reproduction

command

xinference launch -en vllm --size-in-billions 72 --model-format awq --gpu-idx 0,1,2,3 --n-gpu 4 --quantization Int4 -n qwen2.5-instruct --max_model_len 4096 --gpu_memory_utilization 0.99 --NCCL_DEBUG=INFO

result

Traceback (most recent call last):
  File "/usr/local/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 845, in model_launch
    kwargs[ctx.args[i][2:]] = handle_click_args_type(ctx.args[i + 1])
IndexError: list index out of range
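
One possible reading of this traceback, not confirmed by the maintainers: the failing line reads a value from `ctx.args[i + 1]`, which suggests the extra options are consumed as alternating `--key value` tokens. If so, `--NCCL_DEBUG=INFO` in the launch command above would arrive as a single token and leave the list with an odd length. The sketch below reproduces that situation in isolation. `parse_extra_pairs` is a hypothetical stand-in written for illustration, and the step-of-two loop is an assumption; only the indexing expression comes from the traceback.

```python
from typing import Dict, List


def parse_extra_pairs(args: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for the pairing step at cmdline.py line 845.

    Assumes extra options arrive as alternating "--key", "value" tokens.
    """
    kwargs: Dict[str, str] = {}
    for i in range(0, len(args), 2):  # assumed loop shape, not taken from the source
        # Same indexing as in the traceback: the value is read from i + 1.
        kwargs[args[i][2:]] = args[i + 1]
    return kwargs


# Extra tokens from the launch command, assuming they reach the parser unsplit.
extra = [
    "--max_model_len", "4096",
    "--gpu_memory_utilization", "0.99",
    "--NCCL_DEBUG=INFO",  # one token: no separate value follows it
]

try:
    parse_extra_pairs(extra)
    outcome = "parsed"
except IndexError as exc:
    outcome = f"IndexError: {exc}"

print(outcome)  # IndexError: list index out of range
```

Under the same assumption, the first four tokens alone parse cleanly into `max_model_len` and `gpu_memory_utilization`, so the unpaired fifth token would be what triggers the error.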

Expected behavior

The qwen2.5-instruct AWQ Int4 model should launch smoothly on my device.

XprobeBot added the gpu label Oct 31, 2024
XprobeBot added this to the v0.16 milestone Oct 31, 2024