Update build in Dense and EinsumDense for QuantizedDTypePolicy #19347
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #19347      +/-   ##
==========================================
+ Coverage   75.83%   75.85%   +0.02%
==========================================
  Files         367      367
  Lines       40371    40408      +37
  Branches     7853     7861       +8
==========================================
+ Hits        30614    30652      +38
+ Misses       8065     8061       -4
- Partials     1692     1695      +3

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Thanks for the PR!
            trainable=False,
        )
        kernel_scale_shape = (1, kernel_shape[1])
        self.kernel_scale = self.add_weight(
It should default to 1, not 0, otherwise the layer's output would always be 0.
Remind me, why does it need to be a variable? Could it just be a constant?
And does it need to have shape (1, kernel_shape[1])? Could it be a scalar?
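For illustration only (not the Keras code; the names below are made up), the problem with a zero default can be seen in a minimal numpy sketch where the effective output is the int8 matmul scaled by kernel_scale:

```python
import numpy as np

# Minimal sketch (not the Keras implementation): the effective output of an
# int8-quantized dense layer is roughly matmul(x, int8_kernel) * kernel_scale.
x = np.random.randn(2, 4).astype("float32")
int8_kernel = np.random.randint(-127, 128, size=(4, 3), dtype=np.int8)
kernel_scale = np.ones((3,), dtype="float32")  # one scale per output unit

outputs = np.matmul(x, int8_kernel.astype("float32")) * kernel_scale
print(outputs.shape)  # (2, 3)

# With kernel_scale = np.zeros((3,), dtype="float32"), every entry of
# `outputs` would be 0, which is why "ones" is the safer default than "zeros".
```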
It should default to 1, not 0, otherwise the layer's output would always be 0.

I've changed the defaults from "zeros" to "ones". However, this should be considered a dummy initialization, because the quantized weights must be loaded from a pretrained model.

Why does it need to be a variable? Could it just be a constant?

It would be reasonable to allow kernel_scale to be a variable: we can easily save and load it, and even modify it, with the current APIs.

And does it need to have shape (1, kernel_shape[1])? Could it be a scalar?

I've changed the shape of kernel_scale in Dense to (self.units,), a 1D vector. The current implementation now matches google/gemma_pytorch:
https://github.com/google/gemma_pytorch/blob/main/gemma/model.py#L112-L121
self.weight_scaler = nn.Parameter(torch.Tensor(out_features))
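For concreteness, here is a hedged sketch of that idea as a standalone toy layer (the class name is hypothetical, and this is not the actual Dense source): the int8 kernel and a 1D per-unit kernel_scale are both registered as non-trainable weights in build().

```python
from keras import layers, ops


class Int8DenseSketch(layers.Layer):
    """Toy layer (hypothetical name) mirroring the idea discussed above."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Dummy-initialized int8 kernel; real values come from a checkpoint.
        self._kernel = self.add_weight(
            name="kernel",
            shape=(input_shape[-1], self.units),
            initializer="zeros",
            dtype="int8",
            trainable=False,
        )
        # 1D per-output-unit scale, stored as a variable so it can be saved,
        # loaded, and modified with the existing weight APIs.
        self.kernel_scale = self.add_weight(
            name="kernel_scale",
            shape=(self.units,),
            initializer="ones",
            trainable=False,
        )

    def call(self, inputs):
        x = ops.matmul(inputs, ops.cast(self._kernel, self.compute_dtype))
        return x * self.kernel_scale
```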
LGTM, thank you!
This PR updates the logic in build to directly add int8 weights if dtype_policy is a QuantizedDTypePolicy.

Additionally, I have moved the quantization-related logic to the bottom of the class.
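A rough sketch of the control flow described above (assuming the public keras.dtype_policies.QuantizedDTypePolicy class; build_dense is a hypothetical helper, not the actual Keras source):

```python
from keras import dtype_policies


def build_dense(layer, input_shape):
    """Hypothetical helper sketching the branch added to build()."""
    if isinstance(layer.dtype_policy, dtype_policies.QuantizedDTypePolicy):
        # Quantized path: register int8 weights directly so a quantized
        # checkpoint can be loaded without first building float weights.
        layer._kernel = layer.add_weight(
            name="kernel",
            shape=(input_shape[-1], layer.units),
            initializer="zeros",
            dtype="int8",
            trainable=False,
        )
        layer.kernel_scale = layer.add_weight(
            name="kernel_scale",
            shape=(layer.units,),
            initializer="ones",
            trainable=False,
        )
    else:
        # Regular float path.
        layer._kernel = layer.add_weight(
            name="kernel",
            shape=(input_shape[-1], layer.units),
            initializer="glorot_uniform",
        )
```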