RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte #274

Closed
dvrogozh opened this issue Aug 7, 2024 · 2 comments · Fixed by #286

dvrogozh commented Aug 7, 2024

I am trying the quanto library from the head of the main branch (currently at 601dc19) against a self-built PyTorch main branch (upcoming 2.5; I am at pytorch/pytorch@073cee5).

With the stack described above I observe failures in one of the benchmarks and in a number of tests; see the errors below. I don't have CUDA, so execution falls back to CPU in these examples.

$ python3 bench/torch_kernels/test_weight_int4pack_mm.py --device cpu
Traceback (most recent call last):
  File "/home/dvrogozh/git/huggingface/optimum-quanto/bench/torch_kernels/test_weight_int4pack_mm.py", line 119, in <module>
    main()
  File "/home/dvrogozh/git/huggingface/optimum-quanto/bench/torch_kernels/test_weight_int4pack_mm.py", line 93, in main
    B_packed = torch._convert_weight_to_int4pack(B_int32, innerKTiles=2)
RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte.

$ python3 -m pytest --pspec --capture=no test/
FAILED test/tensor/ops/test_linear_dispatch.py::test_linear_bf16_int4[cpu-bias-256-256-1] - RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte.
FAILED test/tensor/ops/test_linear_dispatch.py::test_linear_bf16_int4[cpu-bias-256-256-10] - RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte.
FAILED test/tensor/ops/test_linear_dispatch.py::test_linear_bf16_int4[cpu-no-bias-256-256-1] - RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte.
FAILED test/tensor/ops/test_linear_dispatch.py::test_linear_bf16_int4[cpu-no-bias-256-256-10] - RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte.
FAILED test/tensor/optimizers/test_hqq_optimizer.py::test_hqq_optimizer[cpu-128-first-axis-qint4-bf16-input_shape0] - RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte.
<...>

In the benchmark, the issue comes from this line:

B_packed = torch._convert_weight_to_int4pack(B_int32, innerKTiles=2)


dacorvo commented Aug 14, 2024

It looks like the base quantized weights must have a different dtype before packing in PyTorch 2.5 (they are int32 in 2.4). This could eventually be addressed with conditional code.
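
For illustration, a minimal sketch of what such conditional code could look like (the helper name, the version check, and the nibble order are assumptions for this sketch, not quanto's actual implementation):

import torch
from packaging import version

def convert_to_int4pack(weight_int32, inner_k_tiles=2):
    # Hypothetical helper: weight_int32 is an [n, k] int32 tensor holding
    # int4 values in the range 0..15.
    if version.parse(torch.__version__).release >= (2, 5):
        # PyTorch >= 2.5 expects [n, k // 2] uint8, two int4 values per byte
        # (assumed layout: even column in the high nibble, odd in the low one).
        weight_uint8 = ((weight_int32[:, ::2] << 4) | weight_int32[:, 1::2]).to(torch.uint8)
        return torch._convert_weight_to_int4pack(weight_uint8, innerKTiles=inner_k_tiles)
    # Earlier PyTorch releases take the full [n, k] int32 tensor directly.
    return torch._convert_weight_to_int4pack(weight_int32, innerKTiles=inner_k_tiles)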

dvrogozh added a commit to dvrogozh/optimum-quanto that referenced this issue Aug 16, 2024
…>=2.5

Fixes: huggingface#274

PyTorch 2.5 adjusted input weights of _convert_weight_to_int4pack_cpu
from [n][k] int32 to [n][k / 2] uint8. Changing quanto code accordingly.

See: pytorch/pytorch#129940
See: pytorch/pytorch@6f662e9
Signed-off-by: Dmitry Rogozhkin <[email protected]>
dvrogozh commented:

@dacorvo: yes, it turns out that PyTorch 2.5 changed the input tensor shape from [n][k] int32 to [n][k / 2] uint8; see the links below. I have tried to fix this for quanto in #286. Please help to review.

See: pytorch/pytorch#129940
See: pytorch/pytorch@6f662e9
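
For reference, this is roughly how the failing benchmark call could be adapted for PyTorch >= 2.5 (a sketch only; the nibble order is an assumption here, and the actual fix is in #286):

# B_int32 is the [n, k] int32 tensor of int4 values from the benchmark.
B_uint8 = ((B_int32[:, ::2] << 4) | B_int32[:, 1::2]).to(torch.uint8)  # [n, k // 2]
B_packed = torch._convert_weight_to_int4pack(B_uint8, innerKTiles=2)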
