
Composable Kernel Overview

Chao Liu edited this page Oct 25, 2021 · 1 revision

Composable Kernel Features

Hardware-Agnostic “Tensor Coordinate Transformation” Primitives

  • Enable composition of complex operators from basic ones without the overhead of tensor re-formatting
    • GEMM --> Implicit GEMM, hybrid direct/implicit GEMM
    • Reduction --> Pooling, Batch-norm, etc.
    • Data transfer --> Im2Col, Depth2Space, etc.
  • Automatically generate optimized address-calculation logic for coordinate transformations, without developer intervention
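As an illustration of composing coordinate transformations, the sketch below maps a coordinate of the implicit-GEMM "B" matrix back to an offset in an NCHW input tensor, so a convolution can run as a GEMM without materializing an Im2Col buffer. It is a minimal, serial sketch under assumptions: the names `ConvShape`, `unmerge`, `embed`, `implicit_b_offset`, and `implicit_gemm_conv` are hypothetical and are not Composable Kernel's API; the NCHW/KCYX layouts and a single stride/padding value for both spatial dimensions are simplifications.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, NOT the Composable Kernel API: compose simple coordinate
// transforms so that convolution runs as an implicit GEMM reading the input directly.
struct ConvShape {
    int N, C, H, W;   // input tensor lengths, NCHW layout (assumed)
    int Y, X;         // filter height and width
    int stride, pad;  // one stride/pad for both spatial dims (simplifying assumption)
    int Ho() const { return (H + 2 * pad - Y) / stride + 1; }
    int Wo() const { return (W + 2 * pad - X) / stride + 1; }
};

// "Unmerge" transform: split a flat index into a row-major multi-index.
template <std::size_t D>
std::array<int, D> unmerge(int idx, const std::array<int, D>& lengths) {
    std::array<int, D> out{};
    for (std::size_t i = D; i-- > 0;) {
        out[i] = idx % lengths[i];
        idx /= lengths[i];
    }
    return out;
}

// "Embed" transform: output position + filter tap -> input position (may fall in padding).
inline int embed(int out_pos, int tap, int stride, int pad) {
    return out_pos * stride + tap - pad;
}

// B[gemm_k][gemm_n] of the implicit GEMM, with gemm_k = (c, y, x) and gemm_n = (n, ho, wo).
// Returns a linear offset into the NCHW input, or -1 for a zero-padding element.
inline std::int64_t implicit_b_offset(const ConvShape& s, int gemm_k, int gemm_n) {
    const auto [c, y, x] = unmerge<3>(gemm_k, {s.C, s.Y, s.X});
    const auto [n, ho, wo] = unmerge<3>(gemm_n, {s.N, s.Ho(), s.Wo()});
    const int hi = embed(ho, y, s.stride, s.pad);
    const int wi = embed(wo, x, s.stride, s.pad);
    if (hi < 0 || hi >= s.H || wi < 0 || wi >= s.W) return -1;
    return ((static_cast<std::int64_t>(n) * s.C + c) * s.H + hi) * s.W + wi;
}

// Naive implicit GEMM: out[k][gemm_n] = sum over gemm_k of wei[k][gemm_k] * B[gemm_k][gemm_n].
// Weights are KCYX, i.e. a K x (C*Y*X) matrix; out is K x (N*Ho*Wo). No Im2Col copy is made.
inline std::vector<float> implicit_gemm_conv(const ConvShape& s, int K,
                                             const std::vector<float>& in,
                                             const std::vector<float>& wei) {
    const int gemm_k_len = s.C * s.Y * s.X;
    const int gemm_n_len = s.N * s.Ho() * s.Wo();
    assert(in.size() == static_cast<std::size_t>(s.N) * s.C * s.H * s.W);
    assert(wei.size() == static_cast<std::size_t>(K) * gemm_k_len);
    std::vector<float> out(static_cast<std::size_t>(K) * gemm_n_len, 0.0f);
    for (int k = 0; k < K; ++k)
        for (int gn = 0; gn < gemm_n_len; ++gn) {
            float acc = 0.0f;
            for (int gk = 0; gk < gemm_k_len; ++gk) {
                const std::int64_t off = implicit_b_offset(s, gk, gn);
                if (off >= 0)  // padding elements contribute zero
                    acc += wei[static_cast<std::size_t>(k) * gemm_k_len + gk] *
                           in[static_cast<std::size_t>(off)];
            }
            out[static_cast<std::size_t>(k) * gemm_n_len + gn] = acc;
        }
    return out;
}
```

Here the index arithmetic is recomputed for every element purely for clarity; the point of the primitives described above is that such logic is generated and optimized automatically rather than hand-written.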

Reusable Tensor Operators for AMD GPUs

  • Grid-, block-, wave-, and thread-level tensor operators implemented as templated C++ device functions and classes
    • GEMM-like operators
    • Reduction-like operators
    • Data-transfer-like operators
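To illustrate how operators at different levels of the hierarchy can be layered as C++ templates, here is a hypothetical sketch in which a block-level GEMM partitions its output tile among "threads", each of which runs a thread-level GEMM on register-sized tiles. `ThreadwiseGemm` and `BlockwiseGemm` are illustrative names, not Composable Kernel's actual classes, and the threads are simulated with serial loops so the example runs on a CPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical thread-level operator: C[M][N] += A[M][K] * B[K][N] on small,
// thread-private tiles. Tile sizes are template parameters (illustrative design).
template <typename T, std::size_t M, std::size_t N, std::size_t K>
struct ThreadwiseGemm {
    static void run(const std::array<T, M * K>& a, const std::array<T, K * N>& b,
                    std::array<T, M * N>& c) {
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t m = 0; m < M; ++m)
                for (std::size_t n = 0; n < N; ++n)
                    c[m * N + n] += a[m * K + k] * b[k * N + n];
    }
};

// Hypothetical block-level operator built from the thread-level one: the M x N output
// tile is split into MThreads x NThreads sub-tiles, one per (simulated) thread.
template <typename T, std::size_t MPerThread, std::size_t NPerThread, std::size_t K,
          std::size_t MThreads, std::size_t NThreads>
struct BlockwiseGemm {
    static constexpr std::size_t M = MPerThread * MThreads;
    static constexpr std::size_t N = NPerThread * NThreads;
    using Thread = ThreadwiseGemm<T, MPerThread, NPerThread, K>;

    // On a GPU each (tm, tn) pair would be a separate thread; here we loop serially.
    static void run(const std::array<T, M * K>& a, const std::array<T, K * N>& b,
                    std::array<T, M * N>& c) {
        for (std::size_t tm = 0; tm < MThreads; ++tm)
            for (std::size_t tn = 0; tn < NThreads; ++tn) {
                std::array<T, MPerThread * K> a_reg{};           // this thread's rows of A
                std::array<T, K * NPerThread> b_reg{};           // this thread's columns of B
                std::array<T, MPerThread * NPerThread> c_reg{};  // zero-initialized accumulators
                for (std::size_t m = 0; m < MPerThread; ++m)
                    for (std::size_t k = 0; k < K; ++k)
                        a_reg[m * K + k] = a[(tm * MPerThread + m) * K + k];
                for (std::size_t k = 0; k < K; ++k)
                    for (std::size_t n = 0; n < NPerThread; ++n)
                        b_reg[k * NPerThread + n] = b[k * N + tn * NPerThread + n];
                Thread::run(a_reg, b_reg, c_reg);
                for (std::size_t m = 0; m < MPerThread; ++m)
                    for (std::size_t n = 0; n < NPerThread; ++n)
                        c[(tm * MPerThread + m) * N + tn * NPerThread + n] +=
                            c_reg[m * NPerThread + n];
            }
    }
};
```

Because the tile sizes are template parameters, every loop bound is known at compile time and the compiler is free to unroll the loops, which is one reason a templated design suits GPU kernels.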

Prebuilt and Customized Operator Fusion

  • Prebuilt fused operators (work in progress) include:
    • GEMM/Conv + pointwise Op
    • Conv + Pooling
    • GEMM/Conv + reduction-like operator
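The idea behind a fused "GEMM + pointwise Op" kernel can be sketched as a GEMM whose epilogue applies an elementwise functor to each accumulator before the single store to memory, instead of writing the GEMM result out and reading it back in a second pass. This is a minimal CPU sketch under assumptions: `gemm_fused`, `Relu`, and `AddBiasRelu` are hypothetical names, and a scalar bias stands in for the per-channel bias a real convolution epilogue would typically use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pointwise ops used as fused epilogues (illustrative, not a library API).
struct Relu {
    float operator()(float x) const { return x > 0.0f ? x : 0.0f; }
};

struct AddBiasRelu {
    float bias;  // scalar bias: a simplifying assumption
    float operator()(float x) const { return Relu{}(x + bias); }
};

// Naive GEMM, C = op(A * B), with the pointwise op applied to the accumulator
// before the store, so the un-activated GEMM result is never written to memory.
template <typename Epilogue>
std::vector<float> gemm_fused(const std::vector<float>& a,  // M x K, row-major
                              const std::vector<float>& b,  // K x N, row-major
                              std::size_t M, std::size_t N, std::size_t K,
                              Epilogue op) {
    assert(a.size() == M * K && b.size() == K * N);
    std::vector<float> c(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) acc += a[m * K + k] * b[k * N + n];
            c[m * N + n] = op(acc);  // fused epilogue: one store per output element
        }
    return c;
}
```

Passing the epilogue as a template parameter lets the compiler inline it into the GEMM loop, so swapping `Relu` for another pointwise op changes the fused kernel without touching the GEMM code.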

Unified Implementation of Tensor Operators
