integrate blockwise fp8 kernel #3529

yizhang2077 · 2025-02-12T17:47:56Z

Motivation

Integrate #3267 into the Python side to optimize DeepSeek-V3. Merge this after #3267 has been merged and sgl-kernel has been released.

Modifications

Test Method

Following #3486, test on gsm8k and mmlu

benchmark	latency	accuracy
gsm8k (before)	97.148s	0.953
gsm8k (after)	82.708s	0.958
mmlu (before)	250.839s	0.871
mmlu (after)	240.800s	0.871
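
The relative latency reduction implied by the table can be worked out directly. The snippet below is a small sketch that uses only the numbers reported above; the dictionary layout and the helper name `latency_reduction_pct` are illustrative and not part of this PR.

```python
# Latencies in seconds, copied from the table above: (before, after).
latencies = {
    "gsm8k": (97.148, 82.708),
    "mmlu": (250.839, 240.800),
}


def latency_reduction_pct(before: float, after: float) -> float:
    """Relative latency reduction in percent, rounded to one decimal place."""
    return round((before - after) / before * 100.0, 1)


for name, (before, after) in latencies.items():
    print(f"{name}: {latency_reduction_pct(before, after)}% lower latency")
```

By this measure gsm8k latency drops by roughly 15% and mmlu latency by roughly 4%, while accuracy is unchanged or slightly higher.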

TODO

Remove some of the sgl-kernel files once CUTLASS has resolved this issue: "[QST] how to use groupwise scaling along M for FP8 gemm to impelement per-token-per-128-channel and blockwise?" (NVIDIA/cutlass#2087).
Maybe we can replace per_token_group_quant_fp8 with sgl_per_token_group_quant_fp8, but we need column-major scale output.
Use a more robust shape check (not only for 128 x 128 blockwise).
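
As a rough illustration of the last item, a shape check that is not tied to 128 x 128 blocks could look like the sketch below. It is not the code in this PR. It assumes activations of shape (M, K) quantized per token in groups of `block_k` along K, weights of shape (N, K) quantized in `block_n` x `block_k` blocks, and one scale per group or block; the function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def expected_scale_shapes(m, n, k, block_n=128, block_k=128):
    """Return (activation_scale_shape, weight_scale_shape) for the assumed layout."""
    if min(m, n, k, block_n, block_k) <= 0:
        raise ValueError("dimensions and block sizes must be positive")
    k_groups = ceil_div(k, block_k)
    # One scale per token per K-group, one scale per weight block.
    return (m, k_groups), (ceil_div(n, block_n), k_groups)


def check_blockwise_shapes(a_shape, b_shape, a_scale_shape, b_scale_shape,
                           block_n=128, block_k=128):
    """Raise ValueError if the scale shapes do not match the tensor shapes."""
    m, k = a_shape
    n, k_b = b_shape
    if k != k_b:
        raise ValueError(f"K mismatch: activation K={k}, weight K={k_b}")
    exp_a, exp_b = expected_scale_shapes(m, n, k, block_n, block_k)
    if tuple(a_scale_shape) != exp_a:
        raise ValueError(f"activation scale shape {tuple(a_scale_shape)} != {exp_a}")
    if tuple(b_scale_shape) != exp_b:
        raise ValueError(f"weight scale shape {tuple(b_scale_shape)} != {exp_b}")
    return True
```

For example, `check_blockwise_shapes((4, 512), (7168, 512), (4, 4), (56, 4))` passes with the default 128 x 128 blocks, and the same function accepts other block sizes through `block_n` and `block_k`. Whether the scales are stored row-major or column-major is a memory-layout question that this shape-only check does not cover.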

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

integrate blockwise fp8 kernel

efac164

yizhang2077 assigned zhyncs Feb 12, 2025

yizhang2077 requested review from merrymercy, Ying1123, zhyncs and ispobock as code owners February 12, 2025 17:47

zhyncs changed the title ~~integrate blockwise fp8 kernel to optimize DeepSeekV3~~ integrate blockwise fp8 kernel Feb 12, 2025

yizhang2077 and others added 2 commits February 13, 2025 01:54

update sgl kernel to 0.0.3.post5

1ba0d27

Merge branch 'main' into integrate-cutlass-blockwise-fp8

d18b726

zhyncs merged commit 98eecbd into main Feb 12, 2025
21 checks passed

zhyncs deleted the integrate-cutlass-blockwise-fp8 branch February 12, 2025 20:39

chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request Feb 15, 2025

integrate blockwise fp8 kernel (sgl-project#3529)

7ea559a
