[RFC] Proper lowering of constant alloca operations #1060
Comments
Very nice write-up, thanks for considering different options.
Neat!
Option (3) looks more appealing to me since (1) has the weird
This is a good use of a CIR abstraction that allows us to give extra information to LLVM, cool to see work in this direction! On a tangential subject, should this be done in -O1? Or do we want to emit this even in -O0?
This is a good point. Actually the metadata and intrinsics involved in approach 3 are still experimental, while the
Since this is technically an optimization, I think we should emit this stuff under
Great, happy to see PRs in either direction. Should we add a
One more idea: ask on LLVM discourse if folks know why this happens?
Sounds good to me!
Great idea. Opened a thread here: https://discourse.llvm.org/t/why-llvm-invariant-end-intrinsic-would-prevent-optimizations/82984
This is related to #866.

Since #892, ClangIR lowers constant local variables in C/C++ to `cir.alloca` operations with a `const` flag. The presence of the `const` flag implies:

- the variable is initialized by a `cir.store` operation, and
- any `cir.load` operation that loads the `cir.alloca` result must produce the value stored by the `cir.store` operation.

An obvious optimization here is to eliminate all the loads and replace the loaded values with the stored initial value. LLVM already implements similar optimizations, but we need to tweak the generated LLVM IR to teach LLVM to apply them. I'm proposing several approaches here that could lead to such optimizations in LLVM, and I hope we can choose the one that best fits our needs.
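To make the targeted pattern concrete, here is a minimal C++ sketch. It is not taken from the proposal; `observe` and `read_after_escape` are hypothetical names chosen for illustration. The address of the constant local escapes to a callee, so the later reads are real loads from memory. Because modifying a `const` object is undefined behavior, each of those loads may be assumed to yield the initial value.

```cpp
#include <cassert>

// Hypothetical opaque consumer. Passing the address forces `x` to live in
// memory, so reads after the call are loads rather than folded constants.
void observe(const int *p) { (void)p; }

int read_after_escape(int init) {
  const int x = init;  // the single initializing store
  observe(&x);         // address escapes; the callee may not legally modify x
  int a = x;           // load 1: must yield init
  observe(&x);
  int b = x;           // load 2: must yield init as well
  assert(a == init && b == init);
  return a + b;        // equivalent to init + init
}
```

If the optimizer knows that `x` is invariant after initialization, it can in principle replace both loads with `init` even though it cannot see what `observe` does in a separate translation unit.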
Approach 1: use the `llvm.invariant.start` intrinsic

The first approach uses the `llvm.invariant.start` and `llvm.invariant.end` intrinsics. This pair of intrinsics tells the optimizer that a specified memory location will not change within the region bounded by the intrinsics. With this approach, the following CIR:

would generate the following LLVM IR:
Theoretically, the optimizer should be able to at least fold `%3` into `%2`. Ironically, the optimizer seems to refuse to optimize when the `llvm.invariant.end` intrinsic call is present; see https://godbolt.org/z/5dMv7T77e. To bypass this limitation, simply remove the call to the `llvm.invariant.end` intrinsic, and the optimizer works as expected.

Approach 2: use the `!invariant.load` metadata

A load instruction can have `!invariant.load` metadata attached. The LLVM language reference says:

With this approach, the CIR snippet listed earlier would emit the following LLVM IR:
The optimizer could then fold both load instructions to just `%init`; see https://godbolt.org/z/Exnh85zhx.

It's worth mentioning here that the `!invariant.load` metadata is already supported by the MLIR LLVMIR dialect.

Approach 3: use the `!invariant.group` metadata

A load instruction or a store instruction can have `!invariant.group` metadata attached. Unlike `!invariant.load`, `!invariant.group` only requires that all such instructions that access the same pointer load or store the same value. With this approach, the CIR snippet listed earlier would emit the following LLVM IR:

The optimizer could then fold both load instructions to just `%init`; see https://godbolt.org/z/8MsxcoqTY.

Constant local variables in inner scopes
Let's consider a slightly more complex example:
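A minimal sketch of such a case might look like the following. The names `produce`, `consume`, and `sum_items` are hypothetical and exist only for illustration. The point is that `item` is a constant local declared inside a loop body: it is initialized anew on every iteration, yet it must not change within an iteration.

```cpp
#include <cassert>

// Hypothetical helpers, assumed only for this sketch.
int produce(int i) { return i * i; }
void consume(const int *p, int &sum) { sum += *p; }

int sum_items(int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    // A fresh constant each iteration; compilers typically reuse one
    // stack slot for it, so the slot's contents differ across iterations.
    const int item = produce(i);
    consume(&item, sum);         // address escapes to the callee
    assert(item == produce(i));  // constant within this iteration only
  }
  return sum;
}
```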
Upon each iteration, the local variable `item` would reuse the same memory location. Ideally, though, we would still like to teach LLVM that `item` is constant during a single iteration. The second approach is infeasible here since the value in the memory location changes between iterations. Thus only the first and third approaches are suitable for such a case.

The first approach would emit code like this:
The third approach would emit code like this:
The call to the `llvm.launder.invariant.group` intrinsic makes sure that each iteration creates a "distinct invariant group". Without this intrinsic call, the optimizer could assume that the load and store instructions load and store the same value across all iterations of the loop.

So what do you think about these 3 lowering approaches? Or do you know of any other approaches that this proposal does not mention?