Fix server stuck issue when test timeout exception #61
Why are we removing the calls to delete these keys? Won't that result in a memory leak?
They will be deleted by `CompleteActivityTask`. Additionally, if the `GetWorkItems` stream is closed while the operation is in progress, the cleanup logic added to `GetWorkItems` will delete any "leftover" item in this map.
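For readers following along, here's a minimal sketch of the completion path being described, assuming a `sync.Map` keyed by activity ID with a completion channel per entry. The names `pendingActivities` and `completeActivityTask` are reconstructions from this thread, not the actual durabletask-go code:

```go
package main

import (
	"fmt"
	"sync"
)

// pendingActivities maps an activity key to a channel that is closed when
// the activity completes. Shapes and names here are guesses based on the
// discussion, not the real durabletask-go types.
var pendingActivities sync.Map

// completeActivityTask sketches the idea from this thread: the completion
// path is what removes the key, so no separate delete call is needed at
// the site that created the entry.
func completeActivityTask(key string) {
	// LoadAndDelete atomically removes the entry, so only the caller that
	// wins the race ever sees (and closes) the channel.
	if ch, ok := pendingActivities.LoadAndDelete(key); ok {
		close(ch.(chan struct{}))
	}
}

func main() {
	done := make(chan struct{})
	pendingActivities.Store("activity-1", done)

	go completeActivityTask("activity-1")

	<-done // unblocks once the completion path closes the channel
	fmt.Println("activity-1 completed and removed from the map")
}
```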
Is there any reason for moving the delete operation from here to other places? If I understand correctly, the delete then needs to be added in `CompleteActivityTask`, in the defer block in `GetWorkItems`, and potentially in `Shutdown` as well. Instead, if we just keep it here, it would handle all scenarios. It also looks more intuitive to me. (We would still need to close the `complete` channel in `GetWorkItems` and `Shutdown`.)
Good question. I wrote this a week ago and now I cannot remember why, even after reading the code :( It may work.
Then this may help avoid closing an already-closed channel, since we delete the entry in time.
It's currently impossible for `close()` to be called on the same channel twice.
Yes, I mean the current implementation helps avoid closing a closed channel, but with `defer executor.pendingActivities.Delete(key)` we could potentially close an already-closed channel if the stream is closed and then `Shutdown` is called.
Are these changes to the `Shutdown` routine related to the original issue, or is this unrelated?
This was my first attempt at fixing the issue. It turned out it didn't fix it, but it seemed like something that would be helpful to have nevertheless, to ensure we don't leave goroutines waiting.
Shouldn't we also delete the items from `pendingActivities`/`pendingOrchestrators` in case of shutdown?
The idea is that shutdown is called before the object is abandoned. The memory is released by the GC regardless. The important thing is to close the channels in case there are goroutines waiting on them.
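A minimal sketch of that idea, assuming a `sync.Map` of pending completion channels; field and method names are illustrative, not the real durabletask-go types. `Shutdown` doesn't bother deleting map entries (the GC reclaims them once the object is abandoned) but does close every pending channel so blocked goroutines wake up:

```go
package main

import (
	"fmt"
	"sync"
)

type executor struct {
	pendingActivities sync.Map // key -> chan struct{}
}

func (e *executor) Shutdown() {
	e.pendingActivities.Range(func(key, _ any) bool {
		// LoadAndDelete guards against closing a channel twice if another
		// path (e.g. task completion) races with Shutdown.
		if ch, ok := e.pendingActivities.LoadAndDelete(key); ok {
			close(ch.(chan struct{}))
		}
		return true
	})
}

func main() {
	e := &executor{}
	ch := make(chan struct{})
	e.pendingActivities.Store("activity-1", ch)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ch // would block forever if Shutdown never closed the channel
		fmt.Println("waiter released by Shutdown")
	}()

	e.Shutdown()
	wg.Wait()
}
```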
Shouldn't we check if the `complete` channel is already closed, since it's closed in multiple places?
If the channel is in the map, it hasn't been closed. Before closing the channel, the callers use `LoadAndDelete`, which means that the channel is removed from the map before `close` is called on it.
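A small self-contained demo of why that invariant holds: if every closer goes through `LoadAndDelete`, only one of them ever observes the channel, so `close` cannot run twice even when several paths race:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map
	m.Store("key", make(chan struct{}))

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Only one goroutine gets ok == true; the others see that the
			// entry is already gone and do nothing.
			if ch, ok := m.LoadAndDelete("key"); ok {
				close(ch.(chan struct{}))
				fmt.Printf("goroutine %d closed the channel\n", id)
			}
		}(i)
	}
	wg.Wait()
}
```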
What is the use of `pendingActivities` or `pendingOrchestrators` here? Since we just need to delete all pending keys that are in `g.pendingActivities` or `g.pendingOrchestrators`, can we not simply iterate over the map and delete? This would also avoid sending the items to `pendingChannel` for every item being processed.
No, we do not delete all keys in g.pendingActivities.
There could be multiple connections to GetWorkItems, so each connection needs its own map to track what pending items are related to that connection.
Using a local variable, rather than depending on the global g.pendingActivities/g.pendingOrchestrators, means we can avoid using locks (because of how gRPC works, there's no issue with concurrent access there) and simplify the cleanup process.
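A sketch of that per-connection bookkeeping, assuming a handler that reads from a shared `workItemQueue`; all type and field names here are guesses based on this discussion, not the actual API. The handler goroutine is the only writer to the local map, so it needs no mutex, unlike the shared `sync.Map`:

```go
package sketch

import "sync"

// Illustrative stand-ins for the real durabletask-go types.
type workItem struct{ key string }

type workItemStream interface {
	Send(*workItem) error
}

type grpcExecutor struct {
	workItemQueue     chan *workItem
	pendingActivities sync.Map // key -> chan struct{}
}

func (g *grpcExecutor) getWorkItems(stream workItemStream) error {
	// Stream-local map: only this goroutine touches it, so no locks.
	pending := map[string]chan struct{}{}

	// When the stream ends, delete any leftover entries this connection
	// created; entries owned by other concurrent streams are untouched.
	defer func() {
		for key, ch := range pending {
			if _, ok := g.pendingActivities.LoadAndDelete(key); ok {
				close(ch)
			}
		}
	}()

	for wi := range g.workItemQueue {
		ch := make(chan struct{})
		pending[wi.key] = ch
		g.pendingActivities.Store(wi.key, ch)
		if err := stream.Send(wi); err != nil {
			return err
		}
	}
	return nil
}
```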
Oh interesting. We support multiple connections to GetWorkItems? I didn't know and was actually going to open an issue for it!
A question though: in case we have, say, 2 connections to the server, when the work item gets added to `g.workItemQueue`, how does it properly get allocated to the correct stream? It seems like two concurrent streams trying to read a common channel; won't that cause any issues?
Oh, I had assumed it was supported to have multiple GetWorkItems streams. But this patch doesn't prevent it.
With multiple GetWorkItems streams, there are multiple goroutines blocked here:
https://github.com/microsoft/durabletask-go/pull/61/files#diff-fe6ca2dcacabca215bef6921c53b3c197d9c9ea4d1febcf15392a1e441dddc07R166
In this case, one of those listening will, at "random", get the message.
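A runnable illustration of that behavior: with several goroutines receiving from the same channel, the Go runtime delivers each value to exactly one of them, chosen effectively at random:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	queue := make(chan int)
	var wg sync.WaitGroup

	// Two "streams" competing to receive from one shared queue.
	for stream := 1; stream <= 2; stream++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for item := range queue {
				fmt.Printf("stream %d got work item %d\n", id, item)
			}
		}(stream)
	}

	for i := 0; i < 5; i++ {
		queue <- i // exactly one receiver gets each item
	}
	close(queue)
	wg.Wait()
}
```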
Got it. In that case, it's required to maintain the stream-local map. Thanks for explaining.
Can we add a debug log here as well, e.g. `work item stream closed`?
I added `g.logger.Errorf("encountered an error while sending work item: %v", err)`. I think an error here does not always mean the stream was closed, right?
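A sketch of the send path with that log line, using placeholder types for the stream and logger; only the error-handling pattern and log message are taken from this thread:

```go
package sketch

// Placeholders for the real durabletask-go types; the point here is the
// error handling, not the exact API.
type workItem struct{}

type stream interface{ Send(*workItem) error }

type logger interface{ Errorf(format string, args ...any) }

// sendWorkItems shows why the log message stays generic: a Send error can
// mean the client went away, but it can also be a transport or marshaling
// failure, so it does not always imply the work item stream was closed.
func sendWorkItems(queue <-chan *workItem, s stream, log logger) error {
	for wi := range queue {
		if err := s.Send(wi); err != nil {
			log.Errorf("encountered an error while sending work item: %v", err)
			return err
		}
	}
	return nil
}
```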