
Failed to do quantization #68

Open
PeterYang12 opened this issue Jan 3, 2025 · 2 comments
Comments

@PeterYang12

I followed the README but failed to quantize meta-llama/Llama-3.1-8B-Instruct.

Run command:

./calibrate_model.sh -m meta-llama/Llama-3.1-8B-Instruct -d /workspace/vllm-hpu-extension/calibration/open_orca/open_orca_gpt4_tokenized_llama.calibration_1000.pkl -o /workspace/vllm-hpu-extension/calibration/inc

The error message is below:

[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]: TypeError: PatchedVLLMKVCache.forward_measure() missing 2 required positional arguments: 'block_indices' and 'block_offset'
Step 2/4 done

3/4 Postprocessing scales
[]
[]
finished fix_measurements script
cp: cannot stat 'inc_tmp/llama-3.1-8b-instruct/g2/*': No such file or directory

nirda7 (Contributor) commented Jan 5, 2025

@PeterYang12
It looks like a mismatch between the vllm-fork version and the vllm-hpu-extension version.
Try uninstalling all vllm-related packages (the two above; you might need to do it more than once for each), then run:
pip install -e .
in the vllm-fork base folder (this should also install vllm-hpu-extension automatically), then run the calibration script again.

Another option is to uninstall only vllm-hpu-extension (again, you might need to do it more than once) and then install it again.
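The first option above can be sketched as a shell sequence. This is a sketch under the thread's assumptions: the package names are `vllm` and `vllm-hpu-extension`, and the vllm-fork checkout path (`./vllm-fork` here) is a placeholder you should adjust to your environment:

```shell
# Uninstall may need to run more than once per package if multiple
# copies are installed; loop until pip no longer finds the package.
for pkg in vllm vllm-hpu-extension; do
    while pip show "$pkg" > /dev/null 2>&1; do
        pip uninstall -y "$pkg"
    done
done

# Reinstall in editable mode from the vllm-fork source tree; this
# should also pull in vllm-hpu-extension as a dependency.
cd vllm-fork   # placeholder path to your vllm-fork checkout
pip install -e .

# Then rerun the calibration script as before.
```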

@PeterYang12
Author

Thank you. I am curious why I must provide some data to do the quantization. Is it Gaudi-specific?
