A minimal graph compiler with a torch_mlir frontend that compiles AI workloads to NVIDIA GPUs. The sole purpose of the project is to understand graph compilers and the MLIR framework.
More optimizations will be added in the future; collaborators are welcome to join the project and the learning process.
The TinyFusion dialect is the part of TinyCompiler that supports operator fusion for ops from TOSA (Tensor Operator Set Architecture). The fusion dialect is based on the approach discussed in the TVM white paper. TinyFusion reduces the memory footprint of the overall computation by avoiding intermediate memory allocations.
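As a minimal sketch of the idea (the op names and tensor shapes below are illustrative, not the dialect's actual API), fusing a conv2d with the relu that follows it removes the intermediate tensor between them:

```mlir
// Before fusion (hypothetical op names, MLIR generic form): the conv2d
// result is materialized as an intermediate tensor that relu reads back.
func.func @forward(%x: tensor<1x8x32x32xf32>, %w: tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32> {
  %0 = "tinyfusion.conv2d"(%x, %w) : (tensor<1x8x32x32xf32>, tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32>
  %1 = "tinyfusion.relu"(%0) : (tensor<1x16x30x30xf32>) -> tensor<1x16x30x30xf32>
  return %1 : tensor<1x16x30x30xf32>
}

// After fusion: a single op applies relu as the conv2d output is produced,
// so no intermediate tensor is allocated.
func.func @forward_fused(%x: tensor<1x8x32x32xf32>, %w: tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32> {
  %0 = "tinyfusion.conv2d_relu"(%x, %w) : (tensor<1x8x32x32xf32>, tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32>
  return %0 : tensor<1x16x30x30xf32>
}
```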
Refer to the MLIR Getting Started guide (https://mlir.llvm.org/getting_started/) to build and install MLIR & LLVM on your machine.
TinyCompiler CMake Instructions:
$ export MLIR_DIR=~/llvm-project/build/lib/cmake/mlir
$ mkdir build && cd build
$ cmake ..
$ make -j32
The compiler can be tested as follows:
$ ./tools/TinyCompiler-Opt --cpu-compile ../../Test/Conv2dRelu.mlir
The pass pipeline supports affine transformations for a limited set of TinyFusion operators and lowers most operators to the tinyfusion, arith, and tensor dialects.
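As a rough illustration (assumed shapes; the pipeline's actual output may differ, and `arith.maximumf` is the spelling in recent MLIR releases), affine lowering of an elementwise relu yields a plain loop nest over the buffer:

```mlir
// A sketch of relu after affine lowering: the tensor-level op becomes an
// affine loop nest that loads, clamps against zero, and stores.
func.func @relu(%in: memref<64x64xf32>, %out: memref<64x64xf32>) {
  %zero = arith.constant 0.0 : f32
  affine.for %i = 0 to 64 {
    affine.for %j = 0 to 64 {
      %v = affine.load %in[%i, %j] : memref<64x64xf32>
      %r = arith.maximumf %v, %zero : f32   // relu(x) = max(x, 0)
      affine.store %r, %out[%i, %j] : memref<64x64xf32>
    }
  }
  return
}
```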
Planned work:
- Affine lowering for every TinyFusion operator
- `TinyFlow.dispatch()` to support parallelism and scalability (see the sketch below)
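Purely as a hypothetical sketch of that direction (the tinyflow op names below are illustrative, not an existing API), a dispatch op could wrap an independent subgraph in a region that a runtime schedules concurrently with other dispatches:

```mlir
// Hypothetical: each dispatch region encloses an independent subgraph, so a
// runtime is free to execute separate dispatches in parallel.
func.func @model(%x: tensor<1x8x32x32xf32>, %w: tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32> {
  %0 = "tinyflow.dispatch"(%x, %w) ({
  ^bb0(%ax: tensor<1x8x32x32xf32>, %aw: tensor<16x8x3x3xf32>):
    %c = "tinyfusion.conv2d_relu"(%ax, %aw) : (tensor<1x8x32x32xf32>, tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32>
    "tinyflow.return"(%c) : (tensor<1x16x30x30xf32>) -> ()
  }) : (tensor<1x8x32x32xf32>, tensor<16x8x3x3xf32>) -> tensor<1x16x30x30xf32>
  return %0 : tensor<1x16x30x30xf32>
}
```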