
integrate blockwise fp8 kernel #3529

Merged: 3 commits merged into main on Feb 12, 2025


yizhang2077
Collaborator

Motivation

Integrate the kernel from #3267 into the Python side to optimize DeepSeek-V3. Merge this PR only after #3267 has been merged and sgl-kernel has been released.

Modifications

Test Method

Following #3486, tested on gsm8k and mmlu.

| Benchmark | Latency (s) | Accuracy |
|---|---|---|
| gsm8k (before) | 97.148 | 0.953 |
| gsm8k (after) | 82.708 | 0.958 |
| mmlu (before) | 250.839 | 0.871 |
| mmlu (after) | 240.800 | 0.871 |
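
For reviewers who want the relative numbers, the figures above can be summarized with a small script. This is only a sketch: the `results` dictionary and the `latency_reduction` helper are illustrative names, not part of the PR, and the values are copied by hand from the results reported here.

```python
# Latency (seconds) and accuracy copied from the results reported in this PR.
# The dict layout and helper name are illustrative, not part of sglang.
results = {
    "gsm8k": {"before": (97.148, 0.953), "after": (82.708, 0.958)},
    "mmlu": {"before": (250.839, 0.871), "after": (240.800, 0.871)},
}


def latency_reduction(before_s, after_s):
    """Fraction of wall-clock time saved relative to the 'before' run."""
    return (before_s - after_s) / before_s


summary = {}
for name, runs in results.items():
    (lat_before, acc_before), (lat_after, acc_after) = runs["before"], runs["after"]
    summary[name] = {
        "latency_reduction_pct": round(100 * latency_reduction(lat_before, lat_after), 2),
        "accuracy_delta": round(acc_after - acc_before, 3),
    }
    print(name, summary[name])
```

By this calculation the latency drops by roughly 15% on gsm8k and roughly 4% on mmlu. Accuracy moves by 0.005 on gsm8k and is unchanged on mmlu.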

TODO

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Add unit tests as outlined in the Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

zhyncs changed the title from "integrate blockwise fp8 kernel to optimize DeepSeekV3" to "integrate blockwise fp8 kernel" on Feb 12, 2025
zhyncs merged commit 98eecbd into main on Feb 12, 2025
21 checks passed
zhyncs deleted the integrate-cutlass-blockwise-fp8 branch on February 12, 2025 at 20:39
chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request Feb 15, 2025