Feature: Support flashMLA decoding via flashAttn2 #36

zhangjinnan · 2025-02-24T15:57:51Z

Changes: (#29)

Implement flashMLA with matrix absorption algorithm via flashAttn2
Add golden test on MXMACA platform
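
For readers unfamiliar with the term, "matrix absorption" in MLA decoding usually refers to folding the key up-projection into the query, so attention scores can be computed directly against the compressed latent KV cache. The sketch below only checks that identity on toy data in plain Python. The names `W_uk`, `q`, and `latent_cache` and all values are illustrative assumptions; this is not the kernel from this PR.

```python
# Illustrative only: plain-Python check of the matrix absorption identity
#   q . (W_uk @ c) == (W_uk^T @ q) . c
# W_uk, q, and latent_cache are made-up toy values, not taken from the PR.

def matvec(m, v):
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Return the transpose of matrix m as a list of lists."""
    return [list(col) for col in zip(*m)]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Up-projection from latent dim 2 to head dim 3: k = W_uk @ c
W_uk = [[1.0, 2.0],
        [0.0, 1.0],
        [3.0, -1.0]]
q = [1.0, -2.0, 0.5]                                   # one query, head dim 3
latent_cache = [[1.0, 0.0], [2.0, 1.0], [-1.0, 4.0]]   # cached latents c_j

# Naive path: expand every cached latent to a full key, then score.
naive_scores = [dot(q, matvec(W_uk, c)) for c in latent_cache]

# Absorbed path: fold W_uk into the query once, then score in latent space.
q_absorbed = matvec(transpose(W_uk), q)
absorbed_scores = [dot(q_absorbed, c) for c in latent_cache]

print(naive_scores, absorbed_scores)
```

Both paths give the same scores, but the absorbed path never materializes full keys, which is what lets a decode step run against the latent cache with an ordinary attention kernel.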

FlashAttention-2 currently supports:

Data types fp16 and bf16.
Multi-Token Parallelism = 1
Paged kvcache with block size equal to 2^n (n >= 0)
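
One practical consequence of a power-of-two block size is that mapping a token position to a (block, offset) pair can be done with a shift and a mask. The sketch below illustrates that idea only. `locate_token` and the `block_table` layout are hypothetical names chosen for this example, and the PR does not state whether its kernel uses this scheme.

```python
def is_power_of_two(block_size: int) -> bool:
    """True for 1, 2, 4, 8, ... (that is, 2^n with n >= 0)."""
    return block_size > 0 and (block_size & (block_size - 1)) == 0

def locate_token(token_pos: int, block_size: int, block_table: list) -> tuple:
    """Map a logical token position to (physical_block, offset_in_block).

    block_table[i] is assumed to hold the physical block id backing the
    i-th logical block of one sequence (hypothetical layout).
    """
    if not is_power_of_two(block_size):
        raise ValueError("block_size must be 2^n with n >= 0")
    shift = block_size.bit_length() - 1      # log2(block_size)
    logical_block = token_pos >> shift       # same as token_pos // block_size
    offset = token_pos & (block_size - 1)    # same as token_pos % block_size
    return block_table[logical_block], offset

# Token 70 with 64-token blocks lives in logical block 1, offset 6.
print(locate_token(70, 64, [5, 9, 2]))
```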

yiakwy-xpu-ml-framework-team · 2025-02-24T16:28:43Z

Would it be better to move the FlashInfer-related functions from vendors to the flashinfer repo?

zhangjinnan · 2025-02-25T08:00:19Z

https://github.com/MetaX-MACA/FlashMLA

sanggusti · 2025-02-25T08:00:28Z

I think CUTLASS, which is pulled in via submodules, already handles the flashAttn2 functionality you mentioned.

zhangjinnan · 2025-02-25T08:40:27Z

> I think CUTLASS, which is pulled in via submodules, already handles the flashAttn2 functionality you mentioned.

You can follow the updates in the https://github.com/MetaX-MACA/FlashMLA repo.

Feature: Support flashMLA decoding via flashAttn2 (#29)

e0557de

Changes: 1. Implement flashMLA with matrix absorption algorithm via flashAttn2 2. Add golden test on MXMACA platform

zhangjinnan marked this pull request as draft February 24, 2025 16:17

zhangjinnan marked this pull request as ready for review February 24, 2025 16:18

zhangjinnan marked this pull request as draft February 24, 2025 16:18

zhangjinnan marked this pull request as ready for review February 24, 2025 16:23

zhangjinnan closed this Feb 25, 2025

zhangjinnan mentioned this pull request Feb 25, 2025

[Feature]Support flashMLA decoding via flashAttn2 #29

Open
