This tutorial shows how to use inline MFMA GCN assembly in a HIP kernel.
MI100 supports the MFMA (Matrix Fused Multiply-Add) instruction set. This example only shows how to call and compile the following in a HIP source kernel:
- mfma fp32
- mfma fp16
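
When checking the output of an MFMA kernel, it can help to have a host-side reference for the arithmetic the instruction performs, D = A * B + C on a small tile. The following is a minimal sketch of such a reference in plain C++. It is not the inline assembly itself, and the function name `mfma_reference`, the row-major layout, and the tile sizes are assumptions made for illustration only, not part of the sample.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference for a matrix fused multiply-add on one tile:
//   D = A * B + C
// A is M x K, B is K x N, C and D are M x N, all stored row-major.
// The name and layout are illustrative assumptions, not the sample's API.
std::vector<float> mfma_reference(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  const std::vector<float>& C,
                                  std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> D(M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Start from the accumulator input C, then add the dot product
            // of row i of A with column j of B.
            float acc = C[i * N + j];
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            D[i * N + j] = acc;
        }
    }
    return D;
}
```

A typical use would be to copy the kernel result back to the host and compare it element by element against `mfma_reference` with a small tolerance, since the order of accumulation and rounding on the GPU may differ from this loop, especially for fp16 inputs.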

For more insight, please read the following blogs by Ben Sander:
- The Art of AMDGCN Assembly: How to Bend the Machine to Your Will
- AMD GCN Assembly: Cross-Lane Operations

For more information:
- AMD GCN3 ISA Architecture Manual
- User Guide for AMDGPU Back-end