- Python Bindings
- Python Helper Functions
- Helpful AI Engine Architecture References and Tables
- AI Engine documentation
Function Signature | Definition | Parameters | Return Type | Example |
---|---|---|---|---|
tile(column, row) |
Declare AI Engine tile | column : column index number row : row index number |
<tile> |
ComputeTile = tile(1,3) |
external_func(name, inputs, output) |
Declare external kernel function that will run on AIE Cores | name : external function name input : list of input types output : list of output types |
<external_func> |
scale_scalar = external_func("vector_scalar_mul_aie_scalar", inputs=[tensor_ty, tensor_ty, np.ndarray[(1,), np.dtype[np.int32]]]) |
npu_dma_memcpy_nd(metadata, bd_id, mem, sizes) |
configure n-dimensional DMA accessing external memory | metadata : ObjectFifo python object or string with name of object_fifo bd_id : Identifier numbermem : memory for transfersizes : 4-D transfer size in 4B granularity |
None |
npu_dma_memcpy_nd(metadata="out", bd_id=0, mem=C, sizes=[1, 1, 1, N]) |
dma_wait(object_fifo, ...) |
configure host-ShimDMA synchronization for accessing external memory | metadata : Identifies the ObjectFifo (by Python object or name string) whose half-DMA completion we are waiting on. This is a variable argument function that can accept one or more metadatas at once, to be waited on in order given, |
None |
dma_wait(of_out) |
npu_sync(column, row, direction, channel, column_num=1, row_num=1) |
alternative method to configure host-ShimDMA synchronization for accessing external memory | column and row : Specify the tile location for initiating the synchronization. direction : Indicates the DMA direction (0 for write to host, 1 for read from host). channel : Identifies the DMA channel (0 or 1) for the synchronization token column_num and row_num (optional): Define the range of tiles to wait for synchronization |
None |
npu_sync(column=0, row=0, direction=0, channel=1) |
Object FIFO | ||||
object_fifo(name, producerTile, consumerTiles, depth, datatype) |
Construct Object FIFO | name : Object FIFO name producerTile : producer tile object ConsumerTiles : list of consumer tile objects depth : number of object in Object FIFO datatype : type of the objects in the Object FIFO |
<object_fifo> |
of0 = object_fifo("objfifo0", A, B, 3, np.ndarray[(256,), np.dtype[np.int32]]) |
<object_fifo>.acquire(port, num_elem) |
Acquire from Object FIFO | port : ObjectFifoPort.Produce or ObjectFifoPort.Consume num_elem : number of objects to acquire |
<objects> |
elem0 = of0.acquire(ObjectFifoPort.Produce, 1) |
object_fifo.release(port, num_elem) |
Release from Object FIFO | port : ObjectFifoPort.Produce or ObjectFifoPort.Consume num_elem : |
None |
of0.release(ObjectFifoPort.Consume, 2) |
object_fifo_link(fifoIns, fifoOuts) |
Create a link between Object FIFOs | fifoIns : list of Object FIFOs (variables or names)fifoOuts : list of Object FIFOs (variables or names) |
None |
object_fifo_link(of0, of1) |
Routing Bindings (relevant for trace and low-level design) | ||||
flow(source, source_bundle, source_channel, dest, dest_bundle, dest_channel) |
Create a circuit switched flow between src and dest | source : source tile of the flow source_bundle : type of source WireBundle (see full list in AIEAttrs.td) source_channel : source channel index dest : destination tile of the flow dest_bundle : type of destination WireBundle (see full list in AIEAttrs.td) dest_channel : destination channel index |
None |
flow(ComputeTile, WireBundle.DMA, 0, ShimTile, WireBundle.DMA, 1) |
packetflow(pkt_id, source, source_port, source_channel, dest, dest_port, dest_channel, keep_pkt_header) |
Create a packet switched flow between src and dest | pkt_id : unique packet ID source : source tile of the packet flow source_port : type of source WireBundle (see full list in AIEAttrs.td) source_channel : source channel index dest : destination tile of the packet flow dest_port : type of destination WireBundle (see full list in AIEAttrs.td) dest_channel : destination channel index keep_pkt_header : boolean flag to keep header |
None |
packetflow(1, ComputeTile2, WireBundle.Trace, 0, ShimTile, WireBundle.DMA, 1, keep_pkt_header=True) |
NOTE:
tile
: The actual tile coordinates run on the device may deviate from the ones declared here. In Ryzen AI, for example, these coordinates tend to be relative coordinates as the runtime scheduler may assign it to a different available column.
NOTE:
object_fifo
: TheproducerTile
andconsumerTiles
inputs are AI Engine tiles. TheconsumerTiles
may also be specified as an array of tiles for multiple consumers.
NOTE:
<object_fifo>.{acquire,release}
: The output may be either a single object or an array of objects which can then be indexed in an array-like fashion.
NOTE:
object_fifo_link
The tile that is used as the shared tile in the link must currently be a Mem tile. The inputsfifoIns
andfifoOuts
may be either a single Object FIFO or a list of them. Both can be specified either using their python variables or their names. Currently, if one of the two inputs is a list of ObjectFIFOs then the other can only be a single Object FIFO.
Function Signature | Description |
---|---|
print(ctx.module) |
Converts our ctx wrapped structural code to mlir and prints to stdout |
ctx.module.operation.verify() |
Runs additional structural verficiation on the python binded source code and return result to stdout |
Function Signature | Definition | Parameters | Return Type | Example |
---|---|---|---|---|
aie::vector<T, vec_factor> my_vector |
Declare vector type | T : data type vec_factor : vector width |
n/a | aie::vector<int16_t, 32> my_vector; |
aie::load_v<vec_factor>(pA1); |
Vector load | vec_factor : vector width |
aie::vector |
aie::vector<int16_t, 32> my_vector; |
-
AIE2 - Table of supported data types and vector sizes (AIE API)
-
Some useful Tile core Trace Events
Some common events event ID dec value True 0x01 1 Stream stalls 0x18 24 Core Instruction - Event 0 0x21 33 Core Instruction - Event 1 0x22 34 Vector Instructions (e.g. VMAC, VADD, VCMP) 0x25 37 Lock acquire requests 0x2C 44 Lock release requests 0x2D 45 Lock stall 0x1A 26 Core Port Running 1 0x4F 79 Core Port Running 0 0x4B 75 - A more exhaustive list of events for core tile, core memory, memtile and shim tile can be found in this header file