
Eliminate temporary buffer in binary op conversion #130

Merged
merged 3 commits on Jun 28, 2024

Conversation

adam-smnk
Collaborator

Improves the MLIR pipeline to avoid temporary buffer allocation and a copy in the conversion of linalg named binary operations.

Changes:

  • use tensor.empty in outs
  • add bufferization pre- and post-processing
  • mark outputs as 'restrict'

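For illustration, the tensor-level IR this change aims for might look roughly like the sketch below. The function name and shapes are illustrative, not taken from the PR; the key point is that the named op's destination is a fresh `tensor.empty`, so one-shot bufferization can write results in place instead of allocating a temporary buffer and copying.

```mlir
// Sketch: destination-passing style with an empty destination tensor.
func.func @binary_add(%lhs: tensor<8x16xf32>, %rhs: tensor<8x16xf32>) -> tensor<8x16xf32> {
  // No meaningful initial data is needed; the op overwrites the whole output.
  %empty = tensor.empty() : tensor<8x16xf32>
  %res = linalg.add ins(%lhs, %rhs : tensor<8x16xf32>, tensor<8x16xf32>)
                    outs(%empty : tensor<8x16xf32>) -> tensor<8x16xf32>
  return %res : tensor<8x16xf32>
}
```

The 'restrict' marking mentioned in the change list additionally tells bufferization that the marked buffer does not alias others, which helps make the in-place write legal.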
@adam-smnk
Collaborator Author

Initial tweaks as I'm still figuring out the whole integration.

Next, we can enable conversion of more ops to work on larger MLIR sources and then further iterate on the pipeline.

adam-smnk merged commit 71d28d7 into slyalin:mlir on Jun 28, 2024
14 of 29 checks passed
Comment on lines +568 to +578
// Named binary ops directly overwrite data in the `outs` buffer, so there is no need to provide
// a non-empty destination at the tensor level.
// Use `tensor.empty` to avoid temporary buffer allocation and a memcpy after bufferization.
llvm::SmallVector<Value> dynamicSizes;
for (auto [idx, dim] : llvm::enumerate(outType.getShape())) {
if (!mlir::ShapedType::isDynamic(dim))
continue;
auto dimSize = builder.create<tensor::DimOp>(loc, inputs[0], idx);
dynamicSizes.push_back(dimSize);
}
auto empty = builder.create<tensor::EmptyOp>(loc, outType, dynamicSizes);
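For a dynamically shaped output, the loop above emits one `tensor.dim` per dynamic dimension and feeds the results to `tensor.empty`. The generated IR might look like this illustrative sketch (value names are hypothetical):

```mlir
// Hypothetical result for an output type tensor<?x16xf32>:
// the dynamic extent is read from the first input and passed to tensor.empty.
%c0 = arith.constant 0 : index
%d0 = tensor.dim %input0, %c0 : tensor<?x16xf32>
%empty = tensor.empty(%d0) : tensor<?x16xf32>
```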
Owner


But we still need to ensure that the output tensor is properly sized, even though the op itself implies strict shape alignment between inputs and outputs. As we see here, this involves extra handling of dynamic dimensions, yet semantically the iteration space is already completely defined by the shape of the input and shouldn't require special decoding of dynamic dimensions (that would normally be part of lowering tensors with dynamic dimensions). I would like to see more compactness from the named ops, at least enough to avoid writing this dynamic-dimensions code block.

Collaborator Author


You're right that eltwise named ops' semantics are sufficient to automatically derive the sizes.
However, hiding these details involves tradeoffs in terms of implicit vs. explicit information. Overall, MLIR design leans toward the explicit side, as it allows for simpler IR verification, analysis, and pattern matching; the same motivation applies to having named ops instead of generic ones, or explicit broadcast semantics through linalg.broadcast.

In this case, you can view tensor.empty as a temporary buffer, so we need to provide all the information required for a potential allocation. Having to infer that dynamically would imply stronger coupling between ops/dialects (tensor.empty would have to understand all potential consumers) and would bring more infrastructure boilerplate. Also, an implicit output is not possible, as it goes against linalg's DPS (destination-passing style) interface.

Overall, we can't (and probably shouldn't) remove this extra dynamic logic from IR.
However, we could provide simpler named ops builders that only take inputs and create correct tensor.empty automatically.

Owner


Overall, MLIR design leans more toward the explicit side as it allows for simpler IR verification, ...

That is well understood for a single operation, but it raises questions when a sub-graph of multiple ops repeats the same thing again and again throughout the program, because we already have tensors that carry dynamic dimensions.

However, we could provide simpler named ops builders that only take inputs and create correct tensor.empty automatically.

That would be a good addition that the current approach is missing, at least for tensors (not memrefs). Otherwise it just becomes boilerplate code in the lowering part, instead of being written once as part of the dialect/infrastructure, and similar boilerplate will reappear in every project that uses linalg-on-tensors as the next dialect in its lowering path. If each OV operation is lowered individually, it will generate many such sub-expressions that dynamically query various dimensions just to tailor a shape and pass it to the next op. You will then need to optimize away these additionally generated expressions, which appear even when they deal with the same dynamic dimensions throughout the whole program.
