
fix: adjust _convert_weight_to_int4pack_cpu input weights for pytorch>=2.5 #286

Merged: 1 commit into huggingface:main on Aug 20, 2024

Conversation

@dvrogozh (Contributor) commented on Aug 16, 2024:

Fixes: #274

PyTorch 2.5 changed the input weights of _convert_weight_to_int4pack_cpu from [n][k] int32 to [n][k / 2] uint8. This PR adjusts the quanto code accordingly.

See: pytorch/pytorch#129940
See: pytorch/pytorch@6f662e9

CC: @dacorvo
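
For context, here is a minimal sketch of what such a version-gated packing call can look like. This is not the actual quanto patch: the pack_int4_pairs helper, the assumed nibble order, and the inner_k_tiles default are made up for illustration.

```python
# Minimal sketch of a version-gated call to torch._convert_weight_to_int4pack,
# reflecting the layout change described above. Helper name and nibble order are
# assumptions, not quanto's actual code.
import torch
from packaging import version


def pack_int4_pairs(w_int4: torch.Tensor) -> torch.Tensor:
    """Pack an [n, k] tensor of int4 values (stored as int32) into [n, k // 2] uint8."""
    w = (w_int4 & 0xF).to(torch.uint8)
    # Assumed nibble order: even columns in the low nibble, odd columns in the high nibble.
    return w[:, ::2] | (w[:, 1::2] << 4)


def convert_for_tinygemm(w_int4: torch.Tensor, inner_k_tiles: int = 8) -> torch.Tensor:
    if version.parse(torch.__version__).release >= version.parse("2.5.0").release:
        # PyTorch >= 2.5: _convert_weight_to_int4pack_cpu expects [n, k // 2] uint8.
        return torch._convert_weight_to_int4pack(pack_int4_pairs(w_int4), inner_k_tiles)
    # PyTorch < 2.5: the CPU kernel expected the unpacked [n, k] int32 layout.
    return torch._convert_weight_to_int4pack(w_int4, inner_k_tiles)
```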

Commit: fix: adjust _convert_weight_to_int4pack_cpu input weights for pytorch>=2.5
Signed-off-by: Dmitry Rogozhkin <[email protected]>
@dacorvo (Collaborator) left a comment:


@dvrogozh thank you very much for this pull request. I did not look into the details of the new TinyGemm packing in pytorch, but I think it is now device agnostic. This is interesting because with 2.4 the weights are repacked when moving from cpu to cuda, which is slow (see #270). We could avoid the unpack/pack at line 133 in packed.py with pytorch 2.5 (maybe in an upcoming pull request?).
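
Assuming the packed layout really is device agnostic from 2.5 onward, the device move could reduce to a plain copy. The sketch below uses a hypothetical helper, not quanto's packed.py code, and it omits the pre-2.5 unpack/repack path because that depends on quanto internals.

```python
import torch
from packaging import version


def move_packed_weight(packed: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper illustrating the idea above; not quanto's actual code."""
    if version.parse(torch.__version__).release >= version.parse("2.5.0").release:
        # If the TinyGemm packed layout no longer depends on the device, the packed
        # bytes can simply be copied to the target device.
        return packed.to(device)
    # With 2.4 the cpu and cuda layouts differ, so the weights have to be unpacked on
    # the source device and repacked on the target one (the slow path from #270).
    raise NotImplementedError("pre-2.5: unpack and repack for the target device")
```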

@dacorvo merged commit c02750b into huggingface:main on Aug 20, 2024
13 checks passed
@dvrogozh (Contributor, Author) commented:

> I did not look into the details of the new TinyGemm packing in pytorch, but I think it is now device agnostic.

Yes. This seems to be the intent of the change on the pytorch side, as discussed in the pytorch PR description, pytorch/pytorch#129940.

> We could avoid the unpack/pack at line 133 in packed.py with pytorch 2.5 (maybe in an upcoming pull request?).

It seems so. Worth trying out. I will see whether I can get a cuda build working today to give it a try.

Successfully merging this pull request may close these issues.

RuntimeError: _convert_weight_to_int4pack_cpu : expect weight to be kByte