Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocr/aie: AIE Queue Processing #251

Draft
wants to merge 6 commits into
base: amd-staging
Choose a base branch
from

Conversation

eddierichter-amd
Copy link

This PR add's functionality to submit commands to the NPU xdna driver via a soft queue.

eddierichter-amd and others added 6 commits October 22, 2024 17:36
#19)

* Using BOs of type BO_SHMEM instead of BO_CMD for kernarg memory region

* Changing test to use kernarg memory region for operand allocation
…er of packets the queue supports. (#23)

* Fixed workarounds in the AIE soft queue regarding the size of the command chain and the individual commands.

* Added the functionality to aie_hsa_dispatch_test.cc to query the size of the AIE queue and issue the maximum number of packets it supports.

* Some additional small fixes on aie_hsa_dispatch_test.cc test.

* Adding links to driver source
* Freeing the commands and the command chain created during dispatch

* Unmaping and closing cmd BOs as well as freeing the queue ring buffer
@@ -316,6 +317,16 @@ hsa_status_t XdnaDriver::InitDeviceHeap() {
return HSA_STATUS_SUCCESS;
}

hsa_status_t XdnaDriver::GetHandleMappings(std::unordered_map<uint32_t, void*> &vmem_handle_mappings) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could return a const std::unordered_map<uint32_t, void*>& and avoid the copy here. Do we ever expect the function to fail?

@@ -117,7 +117,8 @@ hsa_status_t XdnaDriver::GetAgentProperties(core::Agent &agent) const {
return HSA_STATUS_ERROR;
}

aie_agent.SetNumCols(aie_metadata.cols);
// Right now can only target N-1 columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need some additional context here on why we can only target N-1 columns (e.g., current hardware limitations, something else).

return HSA_STATUS_SUCCESS;
}

hsa_status_t XdnaDriver::GetFd(int &fd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we can just return the fd.

@@ -100,7 +128,7 @@ class AieAqlQueue : public core::Queue,
void *value) override;

// AIE-specific API
AieAgent &GetAgent() { return agent_; }
AieAgent &GetAgent() const { return agent_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're giving non-const access to agent_ then this function should be non-const.

constexpr int NON_OPERAND_COUNT = 6;

// Used to transform an address into a device address
constexpr int DEV_ADDR_BASE = 0x04000000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a fixed-width integer types for these?

cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]);
// clang-format off
cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter + 1] =
(uint64_t)vmem_handle_mappings[cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]] >> 32 & 0xFFFFFFFF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected that if an entry does not exist in vmem_handle_mappings, it should be replaced with a nullptr? unordered_map::operator[] will default construct a value if the key-value does not exist.

If not, then vmem_handle_mappings should be taked by const& and use find() instead of []

cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]);
// clang-format off
cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter + 1] =
(uint64_t)vmem_handle_mappings[cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]] >> 32 & 0xFFFFFFFF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(uint64_t)vmem_handle_mappings[cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]] >> 32 & 0xFFFFFFFF;
static_cast<uint64_t>(vmem_handle_mappings[cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]]) >> 32 & 0xFFFFFFFF;

// Transform the instruction sequence address into device address
cmd_pkt_payload->data[CMD_PKT_PAYLOAD_INSTRUCTION_SEQUENCE_IDX] =
DEV_ADDR_BASE |
(reinterpret_cast<uint64_t>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(reinterpret_cast<uint64_t>(
(static_cast<uint64_t>(

if (ioctl(fd, DRM_IOCTL_AMDXDNA_GET_BO_INFO, &cmd_bo_get_bo_info))
return HSA_STATUS_ERROR;

*cmd = static_cast<amdxdna_cmd *>(mmap(nullptr, create_cmd_bo.size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should check for MAP_FAILED here.

switch (pkt->opcode) {
case HSA_AMD_AIE_ERT_START_CU: {
std::vector<uint32_t> bo_args;
std::vector<uint32_t> cmd_handles;
Copy link
Contributor

@ypapadop-amd ypapadop-amd Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we know the sizes for those, you can call reserve on each to avoid resizes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants