Eliminate temporary buffer in binary op conversion #130
Conversation
Improves the MLIR pipeline to avoid allocating a temporary buffer and copying for simple linalg binary named operations.

Changes:
- use `tensor.empty` in `outs`
- add bufferization pre- and post-processing
- mark outputs as 'restrict'
These are initial tweaks as I'm still figuring out the whole integration. Next, we can enable conversion of more ops to exercise larger MLIR sources, and then iterate further on the pipeline.
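For context, the "bufferization pre- and post-processing" mentioned above would typically be wired up along these lines. This is a hedged sketch using upstream MLIR passes (`createEmptyTensorEliminationPass`, `createOneShotBufferizePass`); the exact pass set and options used in this PR may differ:

```cpp
#include "mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h"
#include "mlir/Dialect/Bufferization/Transforms/Passes.h"
#include "mlir/Pass/PassManager.h"

// Sketch of pre-/post-processing around one-shot bufferization.
void buildBufferizationPipeline(mlir::PassManager &pm) {
  // Pre-processing: rewrite tensor.empty so it can bufferize in place
  // instead of materializing a fresh allocation.
  pm.addPass(mlir::bufferization::createEmptyTensorEliminationPass());

  mlir::bufferization::OneShotBufferizationOptions options;
  // Bufferize across function boundaries so results can be written directly
  // into caller-provided output buffers rather than into temporaries.
  options.bufferizeFunctionBoundaries = true;
  pm.addPass(mlir::bufferization::createOneShotBufferizePass(options));
}
```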
src/common/transformations/src/transformations/mlir/convert.cpp
// Named binary ops directly overwrite data in the `outs` buffer, so there is
// no need to provide a non-empty destination at the tensor level.
// Use `tensor.empty` to avoid a temporary buffer allocation and memcpy after
// bufferization.
llvm::SmallVector<Value> dynamicSizes;
for (auto [idx, dim] : llvm::enumerate(outType.getShape())) {
    if (!mlir::ShapedType::isDynamic(dim))
        continue;
    auto dimSize = builder.create<tensor::DimOp>(loc, inputs[0], idx);
    dynamicSizes.push_back(dimSize);
}
auto empty = builder.create<tensor::EmptyOp>(loc, outType, dynamicSizes);
But we still need to ensure that the output tensor is properly sized, even though the op itself implies strict shape alignment between inputs and outputs. As we see here, this involves some additional handling of dynamic dimensions, yet semantically the iteration space is already completely defined by the shape of the input, which shouldn't require special decoding of dynamic dimensions (that would be part of normal tensor lowering with dynamic dimensions). I would like to see more compactness from the named ops, at least to avoid writing this dynamic-dimensions code block.
You're right that eltwise named ops' semantics are sufficient to automatically derive the sizes.

However, hiding these details involves tradeoffs in terms of implicit vs explicit information. Overall, MLIR design leans more toward the explicit side, as it allows for simpler IR verification, analysis, and pattern matching - the same motivation as for having named ops vs generic, or explicit broadcast semantics through `linalg.broadcast`.

In this case, you can view `tensor.empty` as a temporary buffer, so we need to provide all the information required for a potential allocation. Having to infer that dynamically would imply stronger coupling between ops/dialects (`tensor.empty` would have to understand all potential consumers) and would bring more infrastructure boilerplate. Also, having an implicit output is not possible, as it goes against linalg's DPS (destination-passing style) interface.

Overall, we can't (and probably shouldn't) remove this extra dynamic logic from the IR. However, we could provide simpler named op builders that only take inputs and create the correct `tensor.empty` automatically.
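Something like the following minimal sketch, where the helper name and signature are hypothetical (not an existing MLIR or OpenVINO API); it derives the `tensor.empty` sizes from the LHS operand, relying on the strict shape alignment of eltwise named ops:

```cpp
// Hypothetical convenience builder for eltwise named binary ops.
template <typename OpTy>
mlir::Value createBinaryOp(mlir::OpBuilder &builder, mlir::Location loc,
                           mlir::Value lhs, mlir::Value rhs) {
  auto type = mlir::cast<mlir::RankedTensorType>(lhs.getType());

  // Collect sizes of dynamic dimensions from the LHS operand.
  llvm::SmallVector<mlir::Value> dynamicSizes;
  for (auto [idx, dim] : llvm::enumerate(type.getShape())) {
    if (mlir::ShapedType::isDynamic(dim))
      dynamicSizes.push_back(
          builder.create<mlir::tensor::DimOp>(loc, lhs, idx));
  }

  // The init tensor stays explicit in the IR; only its construction is hidden.
  mlir::Value init =
      builder.create<mlir::tensor::EmptyOp>(loc, type, dynamicSizes);
  return builder
      .create<OpTy>(loc, mlir::ValueRange{lhs, rhs}, mlir::ValueRange{init})
      ->getResult(0);
}
```

A call site would then shrink to `createBinaryOp<linalg::AddOp>(builder, loc, lhs, rhs)` while the produced IR keeps the explicit `tensor.empty`.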
> Overall, MLIR design leans more toward the explicit side, as it allows for simpler IR verification, ...
That is well understood for a single operation, but it raises questions when a sub-graph of multiple ops repeats the same thing again and again throughout the program, especially since we already have tensors that carry dynamic dimensions.
> However, we could provide simpler named op builders that only take inputs and create the correct `tensor.empty` automatically.
That would be a good thing that we are missing in the current approach, at least for tensors (not memrefs). Otherwise it just becomes boilerplate code in the lowering part instead of being written once as part of the dialect/infrastructure, and similar boilerplate will reappear in every project that uses linalg-on-tensors as the next dialect in its lowering path. If each OV operation is lowered individually, it will generate many such sub-expressions that take various dimensions dynamically just to tailor a shape and pass it to the next op. You will then need to optimize these additionally generated expressions, even though they deal with the same dynamic dimensions throughout the whole program.
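For what it's worth, these repeated dimension sub-expressions are the kind of IR the standard cleanup passes are meant to fold away, so a lowering typically schedules them after conversion. A minimal sketch with upstream MLIR passes (not specific to this PR):

```cpp
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"

// Deduplicate identical tensor.dim chains and fold dims of static shapes.
void addShapeCleanup(mlir::PassManager &pm) {
  pm.addPass(mlir::createCSEPass());           // merge repeated tensor.dim ops
  pm.addPass(mlir::createCanonicalizerPass()); // fold statically-known dims
}
```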