-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvs: dangling fence requests can eat up memory in kvs forever #6112
Comments
hmmm, was doing some brainstorming / review on this and it's trickier than originally thought
it certainly makes me think / believe we should have some timeout mechanism on fence requests. perhaps similar and related to #6124 if we go down that route |
A very "simple" solution to this would have been to support a However, that would require us to iterate through all kvsroots. Per a prior issue #3633, we really would like to avoid iterating through all the kvs roots regularly. Granted doing this on heartbeats vs prep/check is not as bad, but it's still costly. One thought is to have a "fence timeout checker" that wakes up once in awhile that isn't tied to heartbeats, lets say every 5 or 10 minutes. It'll iterate through all kvs roots and do this check then. So overall it's not too costly, with the tradeoff that a |
At this point maybe we should explore removing the fence code if the only user is the alternate pmi kvs implementation that nobody uses. Same with the |
I did ponder this after working on #6581. My initial inclination was A) there is code that uses it B) it is minimally "neat" and potentially useful down the road and C) it's possible someone is using it??? (super low odds of course) But perhaps the odds of its use is low enough, it would simplify a non-trivial amount of code in the KVS and solve some problems. I guess we could remove it and see what the fallout is??? |
Thinking about this a bit more, removing the fence could would clean things up a lot. Minimally, the whole "transaction" internal API exists only b/c of commits vs fences (i.e. a commit is a special fence of just 1 request). A ridiculous amount of code could probably be collapsed. (leading to issue #6570) |
closing, with #6587 removing fence support, this issue is no longer relevant |
while working on #6049 it occurred to me that "dangling" fence requests eat up memory forever. e.g. if only 3 out of 4 fence requests are ever sent, then that transaction and the messages sit in the kvs forever waiting for the 4th fence request to arrive.
in the disconnect callback, we could minimally cleanup the message response so that doesn't eat up memory and we don't send out a response to someone that is not waiting for a response
there is no mechanism to 'timeout' a fence transaction or cancel it. so it sits there forever
The text was updated successfully, but these errors were encountered: