RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte #274
It looks like the base quantized weights, before packing, must have a different dtype in PyTorch 2.5 (they are int32 in 2.4). This could eventually be addressed with version-conditional code.
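A hedged sketch of what such conditional code might look like (the helper name, the version check, and the nibble packing order are assumptions, not quanto's actual implementation):

```python
import torch
from packaging import version

# Assumption: PyTorch >= 2.5 wants [n][k/2] uint8 (two int4 values per byte),
# while 2.4 wants the unpacked [n][k] int32 layout (see pytorch/pytorch#129940).
TORCH_GE_2_5 = version.parse(torch.__version__).release >= (2, 5)

def prepare_int4_weight(weight_int32, inner_k_tiles=2):
    # weight_int32: [n][k] int32 tensor whose elements are int4 values (0..15)
    if TORCH_GE_2_5:
        # Pack two int4 values per byte; the nibble order here is an assumption.
        w = (weight_int32[:, ::2] << 4 | (weight_int32[:, 1::2] & 0xF)).to(torch.uint8)
    else:
        w = weight_int32
    return torch._convert_weight_to_int4pack(w, inner_k_tiles)
```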
dvrogozh added a commit to dvrogozh/optimum-quanto that referenced this issue on Aug 16, 2024:

…>=2.5
Fixes: huggingface#274
PyTorch 2.5 adjusted the input weights of _convert_weight_to_int4pack_cpu from [n][k] int32 to [n][k / 2] uint8. Change the quanto code accordingly.
See: pytorch/pytorch#129940
See: pytorch/pytorch@6f662e9
Signed-off-by: Dmitry Rogozhkin <[email protected]>
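Concretely, the layout change the commit message describes can be sketched as follows (shapes and values are illustrative assumptions):

```python
import torch

n, k = 64, 128
w_int4 = torch.randint(0, 16, (n, k), dtype=torch.int32)  # int4 values in int32 containers

# PyTorch 2.4: _convert_weight_to_int4pack_cpu consumed w_int4 as-is ([n][k] int32).
# PyTorch >= 2.5: it expects two int4 values packed per byte ([n][k / 2] uint8).
w_uint8 = (w_int4[:, ::2] << 4 | w_int4[:, 1::2]).to(torch.uint8)
assert w_uint8.shape == (n, k // 2) and w_uint8.dtype == torch.uint8
```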
I am trying the quanto library from the head of the main branch (currently at 601dc19) against a self-built PyTorch main branch (upcoming 2.5; I am at pytorch/pytorch@073cee5).
With this stack I observe failures in one of the benchmarks and in a number of tests; see the errors below. I don't have CUDA, so execution goes to the CPU in these examples.
In the benchmark, the issue comes from here:
optimum-quanto/bench/torch_kernels/test_weight_int4pack_mm.py, line 93 at 601dc19
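For context, a minimal sketch (assumed shapes and tile count) of the kind of call that now fails on CPU with PyTorch >= 2.5 when given the old int32 layout:

```python
import torch

n, k, inner_k_tiles = 128, 256, 2
weight = torch.randint(0, 16, (n, k), dtype=torch.int32)  # pre-2.5 [n][k] int32 layout

# On PyTorch >= 2.5 CPU this raises:
# RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte
packed = torch._convert_weight_to_int4pack(weight, inner_k_tiles)
```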