Avoid random weights initialization when quantizing #291

dacorvo · 2024-08-23T14:17:10Z

What does this PR do?

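As raised by @latentCall145, quantizing a module triggers a needless random weight initialization: the weights of the newly created quantized module are initialized randomly, then immediately overwritten.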
The solution suggested in #290 is correct but makes the low-level quantization API depend on accelerate, which is only an optional dependency used by the high-level model API.

This is more or less the same implementation, but it uses the meta device explicitly. Note that the scales must be preserved explicitly because, unlike accelerate, PyTorch does not distinguish between parameters and buffers when skipping initialization.

The device parameter is added to qcreate, and scale buffers are created on the same device as the weights.
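For readers unfamiliar with the pattern, here is a minimal sketch of the approach described above, written against plain PyTorch rather than the actual quanto classes. `ScaledLinear`, `from_linear`, and the `input_scale`/`output_scale` buffer names are hypothetical stand-ins for illustration only. The module is created on the meta device, so no storage is allocated and no random values are generated. It is then materialized with `to_empty`, which leaves both parameters and buffers uninitialized, so the scales are reset by hand before the original weights are attached.

```python
import torch
from torch import nn


class ScaledLinear(nn.Linear):
    """Hypothetical stand-in for a quantized linear layer (not the library's class)."""

    def __init__(self, in_features, out_features, bias=True, device=None):
        super().__init__(in_features, out_features, bias=bias, device=device)
        # Scale buffers are created on the same device as the weights.
        self.register_buffer("input_scale", torch.ones((), device=device))
        self.register_buffer("output_scale", torch.ones((), device=device))


def from_linear(module: nn.Linear) -> ScaledLinear:
    """Build a ScaledLinear from an existing nn.Linear without random init."""
    device = module.weight.device
    # On the meta device, tensors carry only shape and dtype: no memory is
    # allocated and the initialization routines have no data to fill.
    qmodule = ScaledLinear(
        module.in_features,
        module.out_features,
        bias=module.bias is not None,
        device="meta",
    )
    # to_empty() materializes parameters AND buffers as uninitialized memory.
    qmodule.to_empty(device=device)
    # The buffers therefore lost their initial values: restore the scales.
    qmodule.input_scale = torch.ones_like(qmodule.input_scale)
    qmodule.output_scale = torch.ones_like(qmodule.output_scale)
    # Reuse the original parameters instead of copying into fresh tensors.
    qmodule.weight = module.weight
    if module.bias is not None:
        qmodule.bias = module.bias
    return qmodule


if __name__ == "__main__":
    source = nn.Linear(4, 3)
    converted = from_linear(source)
    print(converted.weight.device, converted.input_scale)
```

In this sketch the converted module shares its weight and bias parameters with the source module, which avoids a second allocation. Whether sharing or copying is appropriate depends on what happens to the source module afterwards.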

dacorvo added 2 commits August 23, 2024 13:52

fix(qmodule): consistent device management

007217e

The device parameter is added to qcreate, and scale buffers are created on the same device as the weights.

feat(qmodule): avoid random weights initialization

b0a0d14

dacorvo requested review from sayakpaul and SunMarc August 23, 2024 15:55

dacorvo mentioned this pull request Aug 23, 2024

perf: faster and less memory-intensive model [re]quantization #290

Closed

4 tasks

dacorvo merged commit a1c310b into main Aug 24, 2024
16 checks passed

dacorvo deleted the avoid_random_weights_quantize branch August 24, 2024 12:36

dacorvo mentioned this pull request Sep 2, 2024

is textencoder Quantized Diffusers models can be saved and loaded? #264

Closed
