We are planning to replace the underlying kernel implementation with the newly developed CK tile-programming fmha kernel. Its performance is much better on MI200/MI300, especially in the MI300 cases. Once this is done, the current implementation in the main branch will be deprecated.
fwd integration with hdim=64/128: support for mask and varlen, with different kernels for the padding case (see the sketch after this list)
fwd extension to other hdims
dropout support
bwd integration (to be planned)
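For reference, here is a minimal sketch of what a masked, variable-length forward call looks like from the user side, assuming the public Python API stays compatible with upstream flash-attn's `flash_attn_varlen_func`; the function name, argument names, and shapes below are assumptions for illustration, not something confirmed in this thread:

```python
# Sketch only: assumes the ROCm/CK-backed build keeps the upstream
# flash_attn_varlen_func interface (packed varlen layout + causal mask).
import torch
from flash_attn import flash_attn_varlen_func

nheads, hdim = 16, 128            # hdim=64/128 are the first fwd targets listed above
seqlens = [5, 9, 3]               # three sequences of different lengths (varlen)
total = sum(seqlens)

q = torch.randn(total, nheads, hdim, dtype=torch.float16, device="cuda")
k = torch.randn(total, nheads, hdim, dtype=torch.float16, device="cuda")
v = torch.randn(total, nheads, hdim, dtype=torch.float16, device="cuda")

# Cumulative sequence lengths mark where each packed sequence starts/ends.
cu_seqlens = torch.tensor([0, 5, 14, 17], dtype=torch.int32, device="cuda")

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
    dropout_p=0.0,                # dropout support is a later item on the list
    causal=True,                  # causal mask
)
```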
@carlushuang Our top-priority ask on FA should be enabling E2E training for GPT3/LLAMA2/Qwen/MPT first.
Are you proposing to start the integration based on the current ROCm FA or on the latest upstream FA?