[BesTLA] Support int5&int6 for kernels and models #259
Conversation
LLaMa2-7B
weight_dtype=int5, group_size=128, alg=asym, scale_dtype=bf16, comp_dtype=int8:
(perplexity results table not recoverable)
LLaMa2-7B
weight_dtype=int6, group_size=128, alg=asym, scale_dtype=bf16, comp_dtype=int8:
(perplexity results table not recoverable)
Please also update the supported data types in https://github.com/intel/neural-speed/blob/main/docs/advanced_usage.md
Added.
int4 reference
asym: int5/int4 = 1.23
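The 1.23 figure reads like a weight-storage ratio between int5 and int4 at the same settings. As a rough sanity check only (the thread does not state the formula), the ratio is close to what a back-of-the-envelope calculation gives if one assumes a bf16 scale (16 bits) plus an int8 zero point (8 bits) of asym metadata per group of 128 weights; both metadata sizes are assumptions here, not stated in the PR:

```python
GROUP_SIZE = 128   # group_size used in the reported runs
SCALE_BITS = 16    # bf16 scale per group (assumption)
ZP_BITS = 8        # int8 zero point per group for asym (assumption)

def bits_per_weight(weight_bits, group_size=GROUP_SIZE):
    """Average storage cost per weight: payload bits plus amortized group metadata."""
    return weight_bits + (SCALE_BITS + ZP_BITS) / group_size

ratio = bits_per_weight(5) / bits_per_weight(4)
print(round(ratio, 2))  # roughly 1.24, in the same ballpark as the reported 1.23
```

The small gap to the reported 1.23 could come from per-tensor overheads or a different metadata layout, so treat this purely as an order-of-magnitude check.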
LLaMa2-7B
weight_dtype=int6, group_size=128, alg=sym, scale_dtype=bf16, comp_dtype=int8:
(perplexity results table not recoverable)
Type of Change
- Add new weight_dtype options: int5 and int6
- Support model quantization with int5 and int6
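To make the storage idea concrete, below is a minimal pure-Python sketch of packing 5-bit unsigned values into a byte stream and unpacking them again. This is an illustration of why int5 sits between int4 and int8 in footprint; it is not BesTLA's actual weight layout, which packs and reorders data for SIMD kernels:

```python
def pack_int5(values):
    """Pack 5-bit unsigned integers (0..31) into a dense byte stream."""
    bits, nbits, out = 0, 0, bytearray()
    for v in values:
        assert 0 <= v < 32, "int5 values must fit in 5 bits"
        bits |= v << nbits        # append 5 new bits above the pending ones
        nbits += 5
        while nbits >= 8:         # flush completed bytes
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:                     # flush any trailing partial byte
        out.append(bits & 0xFF)
    return bytes(out)

def unpack_int5(data, count):
    """Recover `count` 5-bit values from a stream produced by pack_int5."""
    bits, nbits, vals = 0, 0, []
    for b in data:
        bits |= b << nbits        # append 8 new bits above the pending ones
        nbits += 8
        while nbits >= 5 and len(vals) < count:
            vals.append(bits & 0x1F)
            bits >>= 5
            nbits -= 5
    return vals

weights = [0, 31, 5, 17, 9, 30, 2, 14]
packed = pack_int5(weights)       # 8 values * 5 bits = 40 bits = 5 bytes
assert unpack_int5(packed, len(weights)) == weights
```

Eight int5 weights occupy 5 bytes here versus 4 bytes for int4 and 8 bytes for int8, which matches the accuracy/size trade-off this PR targets.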