Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lock allocator for bf16 element wise fusion #573

Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 49 additions & 4 deletions mlir/lib/Conversion/AIRToAIEPass.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2575,6 +2575,24 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
isAIE2 ? AIE::LockAction::AcquireGreaterEqual
: AIE::LockAction::Acquire,
lockAqValue);

// check if a buffer is both read from and written to by dma bds. This
// pattern indicates RAW dependency that the locks need to respect.
auto getReadInRAW = [](air::MemcpyInterface memcpyOpIf, Value alloc) {
air::ChannelPutOp readerInRAW = nullptr;
if (!isa<AIE::BufferOp>(alloc.getDefiningOp()))
return readerInRAW;
if (!isa<air::ChannelGetOp>(memcpyOpIf))
return readerInRAW;
for (auto user : alloc.getUsers()) {
if (user->getParentRegion() != memcpyOpIf->getParentRegion())
continue;
if (auto put = dyn_cast<air::ChannelPutOp>(user))
readerInRAW = put;
}
return readerInRAW;
};

// try to find a place to put the unlock. If there are deallocs,
// replace them with unlock. Otherwise, put them at the end.
bool need_unlock = true;
Expand All @@ -2588,11 +2606,38 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
need_unlock = false;
}
}
if (need_unlock) {
auto t = memcpyOpIf->getBlock()->getTerminator();
// merge adjacent lock uses, if they have the same lock and action.
auto mergePrevLockUse = [&](AIE::UseLockOp prevLockUse,
int64_t &lockRelValue) {
if (!prevLockUse)
return;
if (prevLockUse.getLockOp() != relLockOp)
return;
if (prevLockUse.getAction() != AIE::LockAction::Release)
return;
lockRelValue++;
prevLockUse->erase();
return;
};

auto readInRAW = getReadInRAW(memcpyOpIf, alloc);
auto t = memcpyOpIf->getBlock()->getTerminator();
if (need_unlock && !readInRAW) {
builder.setInsertionPoint(t);
builder.create<AIE::UseLockOp>(t->getLoc(), relLockOp,
AIE::LockAction::Release, lockRelValue);
} else if (need_unlock && readInRAW) {
if (t->getPrevNode() && isa<AIE::UseLockOp>(t->getPrevNode())) {
auto prevLockUse = dyn_cast<AIE::UseLockOp>(t->getPrevNode());
mergePrevLockUse(prevLockUse, lockRelValue);
}
auto br = dyn_cast<cf::BranchOp>(readInRAW->getBlock()->getTerminator());
if (br)
builder.setInsertionPointToStart(br.getDest());
else
builder.setInsertionPointToStart(readInRAW->getBlock());
builder.create<AIE::UseLockOp>(builder.getUnknownLoc(), relLockOp,
AIE::LockAction::Release, lockRelValue);
Comment on lines +2629 to +2640
Copy link
Collaborator

@fifield fifield May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit convoluted with all the conditionals and the calls to getReadInRAW and mergePrevLockUse. Do you have an example of the IR you're trying to match and replace here? Does the merge work for AIE1 locks? Or if the get and put are in the same block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a prototype of a new lock allocator. The goal here is to have the elementwise fusion example (the new board test 12) work on npu, where the challenge is that the same accumulator buffer has (1) dada copied in,(2) compute on and (3) data copied out.

Although the new board test seems to pass with the pr, it seems to be too complicated and not compatible with a number of existing tests, or AIE1. So I think I'll close the PR and find another solution.

}
allocs_to_remap.insert(alloc.getDefiningOp());
}
Expand Down Expand Up @@ -2782,7 +2827,7 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
// lock_allocation_list lock_allocs;
std::unordered_set<Operation *> allocs_to_remap;

for (auto &alloc : tileDmaAlloc.mm2s_allocs) {
for (auto &alloc : tileDmaAlloc.s2mm_allocs) {
if (alloc.foundAlloc(x, y)) {
for (auto o : alloc.memcpyOps) {
assert(o);
Expand All @@ -2795,7 +2840,7 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
}
}
}
for (auto &alloc : tileDmaAlloc.s2mm_allocs) {
for (auto &alloc : tileDmaAlloc.mm2s_allocs) {
if (alloc.foundAlloc(x, y)) {
for (auto o : alloc.memcpyOps) {
assert(o);
Expand Down