-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Update the response queue in the server to reuse response slots #7879
base: main
Are you sure you want to change the base?
Conversation
src/grpc/infer_handler.h
Outdated
// Gets the response at the specified index | ||
ResponseType* GetResponseAt(const uint32_t index) | ||
{ | ||
std::lock_guard<std::mutex> lock(mtx_); | ||
|
||
// Check if the index is valid for allocated responses | ||
if (index >= alloc_count_) { | ||
LOG_ERROR << "[INTERNAL] Attempting to access response which is not yet " | ||
"allocated"; | ||
return nullptr; | ||
} | ||
return responses_[index]; | ||
if (index < pop_count_) { | ||
LOG_ERROR << "[INTERNAL] Attempting to access a response that has " | ||
"already been removed from the queue."; | ||
return nullptr; | ||
} | ||
|
||
// Adjust index based on number of popped responses to get actual index in | ||
// 'responses_' | ||
return responses_[index - pop_count_]; | ||
} | ||
|
||
// Pops the response from the tail of the queue | ||
// Removes the current response from the front of the queue | ||
void PopResponse() | ||
{ | ||
std::lock_guard<std::mutex> lock(mtx_); | ||
current_index_++; | ||
|
||
// Ensure there are responses in the queue to pop | ||
if (responses_.empty()) { | ||
LOG_ERROR << "[INTERNAL] No responses in the queue to pop."; | ||
return; | ||
} | ||
|
||
// Clear and move the current response to the reusable pool | ||
auto response = responses_.front(); | ||
response->Clear(); | ||
reusable_pool_.push_back(response); | ||
responses_.pop_front(); | ||
pop_count_++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking my understanding:
Previously, we just increment the current_index_
/pop_count_
when a response is popped from the responses_
queue, while the index
always refer to the actual location on the responses_
queue. Now, a popped response is removed from the responses_
queue, changing the index of the remaining elements, so an "offset"/pop_count_
is necessary to correctly access the elements, given the callers are unaware of the change when providing the index
.
I assume the response is fully written to gRPC and there will be no more access to the response object by the time PopResponse()
is called? In other words, the response object can be safely reused/modified after it is popped? Given the previous implementation keeps the popped response objects intact, until the ~ResponseQueue()
is destructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @kthui. Once we call PopResponse(), the response object will no longer be needed.
cc: @tanmayv25
a9ab1e8
to
32490e1
Compare
1948b34
to
596925a
Compare
The PR was automatically closed on forced rebase with the main, reopened with a new commit. |
What does the PR do?
The current response queue allocates memory for each response.
This PR aims to enhance the response queue by reusing response slots across multiple responses within the same request once they have been written (completed) to the network. This may help reduce active memory utilization.
PopResponse()
function, we clear the response content and return it to the reusable pool.AllocateResponse()
function, we check for any available responses in the reusable pool; if present, we use it, otherwise, we allocate a new response.--grpc-max-response-pool-size
option) to limit the number of active response protobuf allocations in the gRPC response queue.Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type
box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)