
Can the text encoder of quantized Diffusers models be saved and loaded? #264

Closed
lonngxiang opened this issue Jul 30, 2024 · 9 comments

@lonngxiang

No description provided.

@sayakpaul
Member

They can be. Follow this: https://github.com/huggingface/optimum-quanto?tab=readme-ov-file#llm-models

For the unimplemented classes, you can just refer to

class QuantizedModelForCausalLM(QuantizedTransformersModel):

And send us a PR.

@lonngxiang
Author


How can the text encoder of the PixArt-alpha/PixArt-Sigma-XL-2-1024-MS model be quantized, then saved and loaded?

@Ednaordinary


This is shown in the guide
https://huggingface.co/blog/quanto-diffusers

@Ednaordinary

Also, @sayakpaul, is there a way to load models without requantizing them entirely? I'm trying to load Flux from a quantized save (I made a wrapper class), but loading takes long enough (due to the requantization, I think) that there's no real advantage over loading normally and quantizing on the fly.

@sayakpaul
Member

Will let @dacorvo comment on that. But he is on vacation so expect delay.


github-actions bot commented Sep 2, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 2, 2024
@dacorvo dacorvo removed the Stale label Sep 2, 2024
@dacorvo
Collaborator

dacorvo commented Sep 2, 2024

@Ednaordinary some recent pull-requests might have fixed the long requantization times (see #290).

@Ednaordinary

@dacorvo Nice! Something along the way must have also changed how the first move between devices works because I'm seeing a lot faster transfer between CPU and GPU on the first move. Thanks for your hard work!

@dacorvo
Collaborator

dacorvo commented Sep 2, 2024

> Something along the way must have also changed how the first move between devices works because I'm seeing a lot faster transfer between CPU and GPU on the first move.

This must be #291.

> Thanks for your hard work!

Kudos to @latentCall145 !
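For anyone wanting to verify the transfer speedup on their own setup, a small illustrative timing helper (plain PyTorch, not optimum-quanto API; `time_first_move` is a hypothetical name):

```python
# Illustrative helper to measure the first host-to-device move of a model.
import time

import torch


def time_first_move(model: torch.nn.Module, device: str) -> float:
    """Return wall-clock seconds spent in the first model.to(device) call."""
    start = time.perf_counter()
    model.to(device)
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # include any pending async copies
    return time.perf_counter() - start
```

Run it once on a freshly loaded (quantized) model before and after upgrading to compare the first CPU-to-GPU transfer.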

@dacorvo dacorvo closed this as completed Oct 1, 2024