This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

support double quant in BesTLA #60

Merged
merged 4 commits into main from double_quant_main on Jan 18, 2024

Conversation

zhewang1-intc
Contributor

Type of Change

feature or bug fix or documentation or others: feature
API changed or not: Yes
how to enable double-quant:
auto packw = kernel.createStorage(n, k, blocksize, qtype, BTLA_DTYPE::DQ8_BNB);
The double-quant blocksize defaults to the same value as blocksize. To use a different double-quant blocksize, call:
kernel.setDoubleQuantBlkSize(&packw, BTLA_DTYPE::DQ8_BNB, dq_blksize);
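
A minimal usage sketch putting the two calls together (the shape and blocksize values, the qtype placeholder, and the surrounding setup are illustrative assumptions, not part of this PR):

// Sketch only: n/k/blocksize/dq_blksize values and the `kernel` object are illustrative;
// `qtype` stands for the first-level weight quantization dtype.
int n = 4096, k = 4096, blocksize = 128;   // hypothetical GEMM shape and first-level block size
// Create packed-weight storage with scales stored double-quantized (DQ8_BNB).
auto packw = kernel.createStorage(n, k, blocksize, qtype, BTLA_DTYPE::DQ8_BNB);
// The double-quant blocksize defaults to `blocksize`; override it if a different value is wanted.
int dq_blksize = 256;                      // hypothetical second-level block size
kernel.setDoubleQuantBlkSize(&packw, BTLA_DTYPE::DQ8_BNB, dq_blksize);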

Description

JIRA: https://jira.devtools.intel.com/browse/NLPTOOLKIU-1102
Support the double-quant feature in BesTLA (scales are quantized with the dynamic-tree quantization defined in bitsandbytes; for more details please refer to [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561)).
Supported launchers: LauncherBase (reference getweight implementation), LauncherIntKblock (AVX512/AVX2 getscale implementations)
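
For intuition, double quant compresses the per-block fp32 scales themselves: each group of dq_blksize scales is stored as 8-bit codes plus one fp32 "scale of scales". The sketch below uses plain absmax rounding purely as a stand-in for the bitsandbytes dynamic-tree codebook; none of these names are BesTLA APIs:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Simplified illustration of double-quantizing scales (NOT the actual dynamic-tree
// codebook behind DQ8_BNB): each group of dq_blksize fp32 scales is reduced to
// int8 codes plus one fp32 "scale of scales" per group.
void double_quant_scales(const std::vector<float>& scales, size_t dq_blksize,
                         std::vector<int8_t>& q_scales, std::vector<float>& super_scales) {
  for (size_t g = 0; g < scales.size(); g += dq_blksize) {
    size_t end = std::min(scales.size(), g + dq_blksize);
    float absmax = 0.f;
    for (size_t i = g; i < end; ++i) absmax = std::max(absmax, std::fabs(scales[i]));
    float s = absmax > 0.f ? absmax / 127.f : 1.f;  // guard against all-zero groups
    super_scales.push_back(s);
    for (size_t i = g; i < end; ++i)
      q_scales.push_back(static_cast<int8_t>(std::lround(scales[i] / s)));
  }
}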

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

@zhewang1-intc zhewang1-intc changed the title Double quant main support double quant in BesTLA Jan 18, 2024
.gitignore Outdated (resolved)
@VincyZhang VincyZhang merged commit d9bce93 into main Jan 18, 2024
9 checks passed
@zhewang1-intc zhewang1-intc deleted the double_quant_main branch January 18, 2024 04:34