
Eliminate temporary buffer in binary op conversion #130

Merged
merged 3 commits on Jun 28, 2024

Conversation

adam-smnk
Collaborator

Improves the MLIR pipeline to avoid temporary buffer allocation and a copy in the conversion of linalg named binary operations.

Changes:

  • use tensor.empty in outs
  • add bufferization pre- and post-processing
  • mark outputs as 'restrict'

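For illustration, the tensor-level IR this change aims for might look roughly like the sketch below. The function name and shapes are illustrative, not taken from the PR; the key point is that the named op's destination is a fresh `tensor.empty`, so one-shot bufferization can write results in place instead of allocating a temporary buffer and copying.

```mlir
// Sketch: destination-passing style with an empty destination tensor.
func.func @binary_add(%lhs: tensor<8x16xf32>, %rhs: tensor<8x16xf32>) -> tensor<8x16xf32> {
  // No meaningful initial data is needed; the op overwrites the whole output.
  %empty = tensor.empty() : tensor<8x16xf32>
  %res = linalg.add ins(%lhs, %rhs : tensor<8x16xf32>, tensor<8x16xf32>)
                    outs(%empty : tensor<8x16xf32>) -> tensor<8x16xf32>
  return %res : tensor<8x16xf32>
}
```

The 'restrict' marking mentioned in the change list additionally tells bufferization that the marked buffer does not alias others, which helps make the in-place write legal.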
@adam-smnk
Collaborator Author

Initial tweaks as I'm still figuring out the whole integration.

Next, we can enable conversion of more ops to work on larger MLIR sources and then further iterate on the pipeline.

adam-smnk merged commit 71d28d7 into slyalin:mlir on Jun 28, 2024
14 of 29 checks passed
Comment on lines +568 to +578
// Named binary ops directly overwrite data in the `outs` buffer, so there is no need to provide
// a non-empty destination at the tensor level.
// Use `tensor.empty` to avoid temporary buffer allocation and a memcpy after bufferization.
llvm::SmallVector<Value> dynamicSizes;
for (auto [idx, dim] : llvm::enumerate(outType.getShape())) {
if (!mlir::ShapedType::isDynamic(dim))
continue;
auto dimSize = builder.create<tensor::DimOp>(loc, inputs[0], idx);
dynamicSizes.push_back(dimSize);
}
auto empty = builder.create<tensor::EmptyOp>(loc, outType, dynamicSizes);
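For a dynamically shaped output, the loop above emits one `tensor.dim` per dynamic dimension and feeds the results to `tensor.empty`. The generated IR might look like this illustrative sketch (value names are hypothetical):

```mlir
// Hypothetical result for an output type tensor<?x16xf32>:
// the dynamic extent is read from the first input and passed to tensor.empty.
%c0 = arith.constant 0 : index
%d0 = tensor.dim %input0, %c0 : tensor<?x16xf32>
%empty = tensor.empty(%d0) : tensor<?x16xf32>
```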
Owner


But we still need to ensure that the output tensor is properly sized, even though the op itself implies strict shape alignment between inputs and outputs. As we see here, this involves extra handling of dynamic dimensions, yet semantically the iteration space is already completely defined by the shape of the input and shouldn't require special decoding of dynamic dimensions (that would normally be part of lowering tensors with dynamic dimensions). I would like to see more compactness from the named ops, at least enough to avoid writing this dynamic-dimensions code block.

Collaborator Author


You're right that eltwise named ops' semantics are sufficient to automatically derive the sizes.
However, hiding these details involves tradeoffs in terms of implicit vs. explicit information. Overall, MLIR design leans toward the explicit side, as it allows for simpler IR verification, analysis, and pattern matching; the same motivation applies to having named ops instead of generic ones, or explicit broadcast semantics through linalg.broadcast.

In this case, you can view tensor.empty as a temporary buffer, so we need to provide all the information required for a potential allocation. Having to infer that dynamically would imply stronger coupling between ops/dialects (tensor.empty would have to understand all potential consumers) and would bring more infrastructure boilerplate. Also, an implicit output is not possible, as it goes against linalg's DPS (destination-passing style) interface.

Overall, we can't (and probably shouldn't) remove this extra dynamic logic from IR.
However, we could provide simpler named ops builders that only take inputs and create correct tensor.empty automatically.

Owner


Overall, MLIR design leans more toward the explicit side as it allows for simpler IR verification, ...

That is well understood for a single operation, but it raises questions when a sub-graph of multiple ops repeats the same thing again and again throughout the program, because we already have tensors that carry dynamic dimensions.

However, we could provide simpler named ops builders that only take inputs and create correct tensor.empty automatically.

That would be a good addition that the current approach is missing, at least for tensors (not memrefs). Otherwise it just becomes boilerplate code in the lowering part, instead of being written once as part of the dialect/infrastructure, and similar boilerplate will reappear in every project that uses linalg-on-tensors as the next dialect in its lowering path. If each OV operation is lowered individually, it will generate many such sub-expressions that dynamically query various dimensions just to tailor a shape and pass it to the next op. You will then need to optimize away these additionally generated expressions, which appear even when they deal with the same dynamic dimensions throughout the whole program.
