Composable Kernel Overview
Chao Liu edited this page Oct 25, 2021 · 1 revision
- Enable composition of complex operators from basic ones, without the overhead of tensor re-formatting
  - GEMM --> Implicit GEMM, Hybrid direct/implicit GEMM
  - Reduction --> Pooling, Batch-norm, etc.
  - Data-Transfer --> Im2Col, Depth2Space, etc.
- Automatically generate optimized address-calculation logic for coordinate transformations, without developer intervention
- Grid-, block-, wave-, and thread-level tensor operators implemented as C++ templated device functions/classes
  - GEMM-like operators
  - Reduction-like operators
  - Data-transfer-like operators
- Prebuilt fused operators (work in progress)
  - GEMM/Conv + pointwise Op
  - Conv + Pooling
  - GEMM/Conv + reduction-like operator
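
To make the "GEMM --> Implicit GEMM" and "GEMM/Conv + pointwise Op" ideas above concrete, here is a minimal host-side sketch in plain C++. It is not Composable Kernel code and does not use its API: `ConvDesc`, `input_offset`, `implicit_gemm_conv`, `Relu`, and `PassThrough` are hypothetical names introduced only for illustration. It assumes a single image in CHW layout, stride 1, and no padding. The point is that a coordinate transformation maps each GEMM index straight to an offset in the original input tensor, so no Im2Col buffer is materialized, and the pointwise op is applied to the accumulator before it is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical problem descriptor (not a Composable Kernel type).
// Assumes one image, CHW layout, stride 1, no padding.
struct ConvDesc {
    int C, Hi, Wi;  // input channels, input height, input width
    int K, Y, X;    // output channels, filter height, filter width
    int Ho() const { return Hi - Y + 1; }
    int Wo() const { return Wi - X + 1; }
};

// Coordinate transformation: GEMM index (gk, gn) -> offset into the CHW input.
// gk merges (c, y, x) and gn merges (ho, wo), so the "B matrix" of the GEMM
// is only a view of the input tensor and is never written to memory.
inline std::size_t input_offset(const ConvDesc& d, int gk, int gn) {
    const int c  = gk / (d.Y * d.X);
    const int y  = (gk / d.X) % d.Y;
    const int x  = gk % d.X;
    const int ho = gn / d.Wo();
    const int wo = gn % d.Wo();
    const int hi = ho + y;  // stride 1, no padding
    const int wi = wo + x;
    return (static_cast<std::size_t>(c) * d.Hi + hi) * d.Wi + wi;
}

// Example pointwise ops that can be fused into the GEMM epilogue.
struct PassThrough { float operator()(float v) const { return v; } };
struct Relu        { float operator()(float v) const { return v > 0.0f ? v : 0.0f; } };

// Convolution expressed as a GEMM of shape M x N x Kg, where
//   M = K, N = Ho * Wo, Kg = C * Y * X.
// wei is KCYX (read as an M x Kg row-major matrix); the result is K x Ho x Wo.
template <typename PointwiseOp>
std::vector<float> implicit_gemm_conv(const ConvDesc& d,
                                      const std::vector<float>& wei,
                                      const std::vector<float>& in,
                                      PointwiseOp op) {
    const int M  = d.K;
    const int N  = d.Ho() * d.Wo();
    const int Kg = d.C * d.Y * d.X;
    assert(wei.size() == static_cast<std::size_t>(M) * Kg);
    assert(in.size() == static_cast<std::size_t>(d.C) * d.Hi * d.Wi);

    std::vector<float> out(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < Kg; ++k) {
                acc += wei[static_cast<std::size_t>(m) * Kg + k] *
                       in[input_offset(d, k, n)];
            }
            // Fused pointwise op: applied before the single store.
            out[static_cast<std::size_t>(m) * N + n] = op(acc);
        }
    }
    return out;
}
```

Calling `implicit_gemm_conv(desc, weights, input, Relu{})` gives a convolution followed by ReLU in one pass, and swapping in `PassThrough{}` gives the plain convolution. A real device implementation would tile these loops across the grid, block, wave, and thread levels; this sketch only shows the index mapping and the fusion point.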