
Can the text encoder of quantized Diffusers models be saved and loaded? #264

Closed
lonngxiang opened this issue Jul 30, 2024 · 9 comments

@lonngxiang

No description provided.

@sayakpaul
Member

They can be. Follow this: https://github.com/huggingface/optimum-quanto?tab=readme-ov-file#llm-models

For the unimplemented classes, you can just refer to

class QuantizedModelForCausalLM(QuantizedTransformersModel):

And send us a PR.

@lonngxiang
Author


How can the text encoder of the PixArt-alpha/PixArt-Sigma-XL-2-1024-MS model be quantized, then saved and loaded?

@Ednaordinary


This is shown in the guide
https://huggingface.co/blog/quanto-diffusers

@Ednaordinary

Also, @sayakpaul, is there a way to load models without requantizing them entirely? I'm trying to load Flux from a quantized save (I made a wrapper class), but loading takes long enough (due to the requantization, I think) that there's no real advantage over loading normally and quantizing on the fly.

@sayakpaul
Member

Will let @dacorvo comment on that. But he is on vacation so expect delay.


github-actions bot commented Sep 2, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 2, 2024
@dacorvo dacorvo removed the Stale label Sep 2, 2024
@dacorvo
Collaborator

dacorvo commented Sep 2, 2024

@Ednaordinary some recent pull-requests might have fixed the long requantization times (see #290).

@Ednaordinary

@dacorvo Nice! Something along the way must have also changed how the first move between devices works because I'm seeing a lot faster transfer between CPU and GPU on the first move. Thanks for your hard work!

@dacorvo
Collaborator

dacorvo commented Sep 2, 2024

> Something along the way must have also changed how the first move between devices works because I'm seeing a lot faster transfer between CPU and GPU on the first move.

This must be #291.

> Thanks for your hard work!

Kudos to @latentCall145 !
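For anyone wanting to verify the transfer speedup on their own setup, a small illustrative timing helper (plain PyTorch, not optimum-quanto API; `time_first_move` is a hypothetical name):

```python
# Illustrative helper to measure the first host-to-device move of a model.
import time

import torch


def time_first_move(model: torch.nn.Module, device: str) -> float:
    """Return wall-clock seconds spent in the first model.to(device) call."""
    start = time.perf_counter()
    model.to(device)
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # include any pending async copies
    return time.perf_counter() - start
```

Run it once on a freshly loaded (quantized) model before and after upgrading to compare the first CPU-to-GPU transfer.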

@dacorvo dacorvo closed this as completed Oct 1, 2024