Introduce FP16 quantization #437

Open
LHT129 opened this issue Feb 24, 2025 · 0 comments
Labels: kind/feature (New feature or request), version/0.14

Comments

LHT129 (Collaborator) commented Feb 24, 2025

FP16 is a 16-bit floating-point format, half the size of float32, yet it retains high precision for ANN distance calculations. FP16 data can also be processed with AVX-512.
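
Below is a minimal sketch of what an FP16 quantizer plus an AVX-512 distance kernel could look like. It is not tied to this project's actual interfaces: the names `encode_fp16` and `l2_sqr_fp16_avx512` are hypothetical, and for brevity the dimension is assumed to be a multiple of 16. Vectors are stored as raw FP16 words (encoded with F16C), and distances are computed by widening back to float32 with AVX-512 and accumulating with FMA (compile with e.g. `-mavx512f -mf16c -mfma`).

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize a float32 vector to FP16 (stored as raw uint16_t), halving memory.
// Assumes dim is a multiple of 16 (hypothetical helper, not an existing API).
std::vector<uint16_t> encode_fp16(const float* src, size_t dim) {
    std::vector<uint16_t> out(dim);
    for (size_t i = 0; i < dim; i += 8) {
        __m256 f = _mm256_loadu_ps(src + i);
        // F16C: round-to-nearest conversion of 8 floats to 8 half-precision words.
        __m128i h = _mm256_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data() + i), h);
    }
    return out;
}

// Squared L2 distance between two FP16 vectors: load 16 halves at a time,
// widen to float32 with AVX-512, and accumulate the squared differences.
float l2_sqr_fp16_avx512(const uint16_t* a, const uint16_t* b, size_t dim) {
    __m512 acc = _mm512_setzero_ps();
    for (size_t i = 0; i < dim; i += 16) {
        __m512 va = _mm512_cvtph_ps(
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i)));
        __m512 vb = _mm512_cvtph_ps(
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i)));
        __m512 diff = _mm512_sub_ps(va, vb);
        acc = _mm512_fmadd_ps(diff, diff, acc);
    }
    return _mm512_reduce_add_ps(acc);
}
```

Converting to float32 on the fly keeps the distance math in full precision while the index itself stores only 2 bytes per dimension; an inner-product variant would follow the same pattern with `_mm512_fmadd_ps(va, vb, acc)`.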
