Can inference be done at FP8 for both the 1K and 2K models? #128
Comments
There will be an int4 version of Sana released soon.
Awesome, I am following ❤️
For FP8 you can use the torchao toolbox.
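A minimal sketch of that torchao route, assuming a recent torchao release that exposes `float8_weight_only` and an FP8-capable GPU; the toy module below is only a stand-in for the Sana transformer, not the official pipeline code:

```python
# Sketch: torchao weight-only FP8 quantization applied in place to a module.
# Assumptions: torchao provides float8_weight_only, and the GPU supports FP8.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, float8_weight_only

# Toy stand-in for the Sana transformer; in practice you would pass the
# pipeline's transformer module here instead.
model = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.GELU(),
    nn.Linear(2048, 2048),
).to(device="cuda", dtype=torch.bfloat16)

# Replace Linear weights with FP8 weight-only quantized versions, in place.
quantize_(model, float8_weight_only())

x = torch.randn(1, 2048, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    y = model(x)
print(y.shape, y.dtype)
```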
How can it be used with the current pipeline here?
https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/sana.md#quantization @FurkanGozukara This is what you are looking for.
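A minimal sketch of the diffusers route described in that doc: quantize the Sana transformer at load time, then build the pipeline around it. The checkpoint id and the "int8wo" quant_type string are assumptions, not taken from the thread; check the linked doc for the exact names (and for an FP8 quant_type) supported by your diffusers/torchao versions.

```python
# Sketch: quantized Sana inference via diffusers, following the linked doc.
# Assumptions: the checkpoint id below and the "int8wo" quant_type string.
import torch
from diffusers import SanaPipeline, SanaTransformer2DModel, TorchAoConfig

model_id = "Efficient-Large-Model/Sana_1600M_1024px_diffusers"  # assumed 1K checkpoint id

# Quantize only the transformer; the text encoder and VAE stay in bf16.
transformer = SanaTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)

pipe = SanaPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # optional: lowers peak VRAM at some speed cost

image = pipe(prompt="a tiny astronaut hatching from an egg on the moon").images[0]
image.save("sana_quantized.png")
```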
@lawrence-cj Thanks. Is diffusers ready right now with all the pull requests you made?
What pull requests are we talking about?
@lawrence-cj I see none of these are closed; they are not merged yet, I assume?
They were just opened within the last 24 hours. No hurry.
Waiting for the int4 version, it will be a game changer! Is there any way I can contribute to the quantization process?
People are asking me to further reduce VRAM usage.
Currently the 1K model uses a minimum of 8.7 GB with VAE offloading.
If we could run inference at FP8, that would reduce VRAM usage significantly.
I am using the official SANA pipeline shared here.