Integrate KleidiAI for MatMulNBits via MlasQNBitGemm #23627
Signed-off-by: Michael Tyler
@@ -99,6 +100,10 @@ function(setup_mlas_source_for_windows)
${MLAS_SRC_DIR}/halfgemm_kernel_neon_fp16.cpp
)

setup_kleidiai()
This makes KleidiAI a new dependency for all ONNX Runtime build configurations. For a change like this, the ONNX Runtime team needs to hold an internal discussion with the project's leadership.
Cannot move forward until the internal review is complete, since this PR adds a new dependency.
Please fix the iOS build errors.
/azp run Linux CPU CI Pipeline, Windows CPU CI Pipeline, Linux QNN CI Pipeline
Azure Pipelines successfully started running 3 pipeline(s).
Can the workflows be retriggered please?
/azp run Linux CPU CI Pipeline, Windows CPU CI Pipeline, Linux QNN CI Pipeline
Azure Pipelines successfully started running 3 pipeline(s).
Can the workflows be retriggered please?
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Linux CPU CI Pipeline
/azp run Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline
Azure Pipelines successfully started running 6 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
Co-authored-by: Edward Chen
Description
This PR integrates Arm® KleidiAI™ to provide optimized assembly kernels for matrix multiplication with 4-bit quantized weights. These changes target the MlasQNBitGemm functions, and can be utilized via the MatMulNBits operator.
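For readers unfamiliar with the technique, here is a minimal pure-Python sketch of matrix multiplication with 4-bit block-quantized weights. It is illustrative only and is not the MLAS or KleidiAI implementation: the function names, the symmetric scaling scheme, the fixed zero point of 8, and the unpacked one-code-per-element layout are all assumptions made for this example.

```python
def quantize_blockwise_4bit(column, block_size):
    """Quantize one weight column (length K) to 4-bit codes, one scale per block.

    Hypothetical helper: symmetric scaling and a fixed zero point of 8 are
    assumptions for this sketch, not the layout used by the real kernels.
    """
    codes, scales = [], []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the block's range onto the signed 4-bit levels -7..7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Store each value as an unsigned code in [0, 15] around zero point 8.
        codes.extend(min(15, max(0, round(v / scale) + 8)) for v in block)
    return codes, scales


def matmul_4bit(a_rows, quantized_columns, block_size):
    """Compute A @ B where each column of B is given as (codes, scales).

    Weights are dequantized on the fly inside the inner product; optimized
    kernels fuse this step instead of materializing float weights.
    """
    out = []
    for row in a_rows:
        out_row = []
        for codes, scales in quantized_columns:
            acc = 0.0
            for k, code in enumerate(codes):
                acc += row[k] * (code - 8) * scales[k // block_size]
            out_row.append(acc)
        out.append(out_row)
    return out
```

A real kernel would additionally pack two 4-bit codes into each byte and operate on whole blocks with vector instructions; the sketch keeps one code per list element so the arithmetic stays easy to follow.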
Motivation and Context
These optimized assembly kernels lead to significant performance improvements on Arm-based devices.