Eltwise Multiplication

This design implements a bfloat16-based element-wise multiplication of two vectors, performed in parallel on two cores in a single column. Element-wise multiplication is usually I/O bound because of its low compute intensity: each output element requires only one multiply but two loads and one store. In a practical ML implementation, this type of kernel is likely best fused into a more compute-dense kernel (e.g., a convolution or GEMM).
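To make the "low compute intensity" point concrete, the following back-of-the-envelope sketch counts FLOPs per byte moved. The vector length, the matrix size, and the idealized GEMM model (each matrix moved exactly once) are illustrative assumptions, not parameters taken from this design.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def eltwise_mul_intensity(num_elements: int) -> float:
    """FLOPs per byte moved for c[i] = a[i] * b[i] on bfloat16 data."""
    flops = num_elements  # one multiply per element
    bytes_moved = 3 * num_elements * BYTES_PER_BF16  # read a, read b, write c
    return flops / bytes_moved


def ideal_gemm_intensity(n: int) -> float:
    """FLOPs per byte for an n x n x n GEMM, assuming each matrix moves once."""
    flops = 2 * n**3  # one multiply and one add per inner-product term
    bytes_moved = 3 * n * n * BYTES_PER_BF16  # read A, read B, write C
    return flops / bytes_moved


if __name__ == "__main__":
    # Sizes below are hypothetical, chosen only for illustration.
    print(f"eltwise mul: {eltwise_mul_intensity(65536):.3f} FLOP/byte")
    print(f"ideal GEMM (n=256): {ideal_gemm_intensity(256):.1f} FLOP/byte")
```

The element-wise ratio is independent of vector length, whereas the idealized GEMM ratio grows with the matrix dimension, which is why fusing the multiply into a denser kernel hides its data movement cost.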

Source Files Overview

  1. aie2.py: A Python script that defines the AIE array structural design using MLIR-AIE operations. This generates MLIR that is then compiled using aiecc.py to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).

  2. add.cc: A C++ implementation of a vectorized element-wise multiplication operation for AIE cores. The code uses the AIE API, a C++ header-only library providing types and operations that get translated into efficient low-level intrinsics, and whose documentation can be found here. The source can be found here.

  3. test.cpp: This C++ code is a testbench for the design example. The code is responsible for loading the compiled XCLBIN file, configuring the AIE module, providing input data, and executing the AIE design on the NPU. After execution, the testbench verifies the results and optionally outputs trace data.
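As a rough illustration of what verifying a bfloat16 element-wise multiply involves, here is a minimal host-side reference model in Python. It is a sketch, not the logic in test.cpp: the helper names are hypothetical, round-to-nearest-even is assumed, and only finite inputs are handled. The hardware kernel may round differently, so a real check would likely compare with a tolerance.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a finite float to bfloat16 (nearest-even) and return its 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits: int) -> float:
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def to_bf16(x: float) -> float:
    """Quantize a value to the nearest representable bfloat16."""
    return bf16_bits_to_float(float_to_bf16_bits(x))


def reference_eltwise_mul(a, b):
    """Element-wise product with inputs and outputs quantized to bfloat16."""
    return [to_bf16(to_bf16(x) * to_bf16(y)) for x, y in zip(a, b)]


if __name__ == "__main__":
    # Input values are arbitrary examples, exactly representable in bfloat16.
    print(reference_eltwise_mul([1.5, 2.0, -0.5], [2.0, 3.0, 4.0]))
```

Because two bfloat16 values each carry 8 significant bits, their product fits exactly in a float32, so the model rounds only once when converting the result back to bfloat16.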

Usage

C++ Testbench

To compile the design and C++ testbench:

make

To run the design:

make run

To generate a trace file:

make trace