Can inference be done at FP8 for both the 1K and 2K models? #128
Comments
There will be an int4 version of Sana released soon.
Awesome, I am following ❤️
For FP8 you can use the torchao toolbox.
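A minimal sketch of that torchao route, assuming a recent torchao release that exposes `float8_weight_only` and an FP8-capable GPU; the toy module below is only a stand-in for the Sana transformer, not the official pipeline code:

```python
# Sketch: torchao weight-only FP8 quantization applied in place to a module.
# Assumptions: torchao provides float8_weight_only, and the GPU supports FP8.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, float8_weight_only

# Toy stand-in for the Sana transformer; in practice you would pass the
# pipeline's transformer module here instead.
model = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.GELU(),
    nn.Linear(2048, 2048),
).to(device="cuda", dtype=torch.bfloat16)

# Replace Linear weights with FP8 weight-only quantized versions, in place.
quantize_(model, float8_weight_only())

x = torch.randn(1, 2048, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    y = model(x)
print(y.shape, y.dtype)
```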
How can it be used with the current pipeline here?
https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/sana.md#quantization @FurkanGozukara This is what you are looking for.
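A minimal sketch of the diffusers route described in that doc: quantize the Sana transformer at load time, then build the pipeline around it. The checkpoint id and the "int8wo" quant_type string are assumptions, not taken from the thread; check the linked doc for the exact names (and for an FP8 quant_type) supported by your diffusers/torchao versions.

```python
# Sketch: quantized Sana inference via diffusers, following the linked doc.
# Assumptions: the checkpoint id below and the "int8wo" quant_type string.
import torch
from diffusers import SanaPipeline, SanaTransformer2DModel, TorchAoConfig

model_id = "Efficient-Large-Model/Sana_1600M_1024px_diffusers"  # assumed 1K checkpoint id

# Quantize only the transformer; the text encoder and VAE stay in bf16.
transformer = SanaTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)

pipe = SanaPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # optional: lowers peak VRAM at some speed cost

image = pipe(prompt="a tiny astronaut hatching from an egg on the moon").images[0]
image.save("sana_quantized.png")
```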
@lawrence-cj Thanks. Is diffusers ready right now with all the pull requests you made?
What pull requests are we talking about?
@lawrence-cj I see none of these are closed; they are not merged yet, I assume?
They were just opened within the last 24 hours. No hurry.
Waiting for the int4 version, it will be a game changer! Is there any way I can contribute to the quantization process?
People are asking me to further reduce VRAM usage.
Currently the 1K model uses a minimum of 8.7 GB with VAE offloading.
If we could run inference at FP8, that would reduce VRAM usage significantly.
I am using the official SANA pipeline shared here.