This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[BesTLA] First-token inference optimization (#271)
* add per-channel kblock template
* add gemv support for pckblock. revise all benchmark cases and UT cases.
* fix bandwidth calc of CompFp32 and CompBf16
* use correct core number
* update thread pool
* fix bug
* fix bug
* update kernels with gemm and qkv fusion
* refactor epilogue (removing ISA from the class's template)
* update fnn and ip_add
* fix compile on gcc
* fix gcc template
* fix compile
* update amx template
* fix UT compile
* fix benchmark compile
* revert NTILE of amx_int8
* reduce templates
* fix deprecated UTs. optimize cache block strategy
* Enlarge stack size on windows
* revert NTILE of amx_int8
* update cache config
* add mul support
* add mul implementation
* support tensor mul tensor
* fix compile on gcc
* clang-format
* fix doc
* code scan fix
* fix compile
* fix batch bug
* comment add
* comment mul
* enable mul&add
* clang-format
* fix the code bug of mul and add. use new kernels in custom::epilogue
* clang-format

Co-authored-by: yuchengliu1 <[email protected]>
1 parent af22f2a, commit 3757fda
Showing 26 changed files with 1,515 additions and 2,534 deletions.