It is difficult to run LLM inference with f32/f16 weights on a PC, and on edge devices Q4 quantization is practically a requirement. Perhaps Int4 could be supported as a built-in type?
We can't upload int8 or int4 tensors to the GPU yet, but @laggui is working on quantization support in Burn. We will probably create abstractions that make it easier to write quantized kernels.
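To illustrate what Q4 quantization involves, here is a minimal Rust sketch of symmetric 4-bit quantization with two values packed per byte. This is a hypothetical illustration, not Burn's actual implementation or API; the function names and the simple per-tensor scale (real Q4 schemes typically use per-block scales) are assumptions for the example.

```rust
/// Quantize f32 values to signed int4 ([-8, 7]) with a single symmetric
/// scale, packing two 4-bit values per byte (low nibble first).
/// Hypothetical sketch; real Q4 formats use per-block scales and offsets.
fn quantize_q4(values: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = values.iter().fold(0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let q = |v: f32| ((v / scale).round().clamp(-8.0, 7.0) as i8 & 0x0F) as u8;

    let mut packed = Vec::with_capacity((values.len() + 1) / 2);
    for pair in values.chunks(2) {
        let lo = q(pair[0]);
        let hi = if pair.len() > 1 { q(pair[1]) } else { 0 };
        packed.push(lo | (hi << 4));
    }
    (packed, scale)
}

/// Unpack and dequantize back to f32. `len` is the original element count,
/// needed because an odd count leaves a padding nibble in the last byte.
fn dequantize_q4(packed: &[u8], scale: f32, len: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(len);
    for byte in packed {
        for nibble in [byte & 0x0F, byte >> 4] {
            if out.len() < len {
                // Sign-extend the 4-bit value: shift into the high nibble
                // of an i8, then arithmetic-shift back down.
                let q = ((nibble << 4) as i8) >> 4;
                out.push(q as f32 * scale);
            }
        }
    }
    out
}

fn main() {
    let weights = [0.7f32, -0.35, 0.1, -0.05, 0.0];
    let (packed, scale) = quantize_q4(&weights);
    let restored = dequantize_q4(&packed, scale, weights.len());
    // Each value is recovered to within one quantization step.
    for (w, r) in weights.iter().zip(&restored) {
        assert!((w - r).abs() <= scale);
    }
    // 5 values fit in 3 bytes instead of 20 (f32) or 10 (f16).
    println!("packed bytes: {}, scale: {}", packed.len(), scale);
}
```

The 4x size reduction versus f32 is what makes Q4 attractive on the edge; the open question in this issue is whether such nibble-packed storage should be exposed as a first-class tensor type rather than handled inside custom kernels.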