-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust CopyAggregation aggregators to have the option of not freeing buffers on flush #26681
Conversation
--- Signed-off-by: Michael Ferguson <[email protected]>
--- Signed-off-by: Michael Ferguson <[email protected]>
--- Signed-off-by: Michael Ferguson <[email protected]>
--- Signed-off-by: Michael Ferguson <[email protected]>
I think the original use case when the initial API was done was an SPMD region doing something like: // Collect remote data
var srcAgg = new SrcAgg();
for i in 1..n do srcAgg.copy(...);
srcAgg.flush();
// permute the local data in some way
// Put local data back to remote (maybe it was in transposed order)
var dstAgg = new DstAgg();
for i in 1..n do dstAgg.copy(...);
dstAgg.flush(); So we needed to flush the aggregator before it would naturally fall out of scope, but never used it again. I didn't want to add an artificial scope just to deinit the aggregator and the combined flush+free-remote-buffers eliminates any comm when it finally does go out of scope. I agree there are use-cases where you want it either way so adding an option makes sense to me. Most of the time aggregators get used as a task-private variable in a forall and without an explicit flush, so I don't think it's a very common idiom to manually flush anyways. Given that I don't think you need to worry too much about which default is better and I would also just default to the current behavior (but if designing from scratch would probably default to false or have a different operation that does the flush+free) |
Perhaps "no default" would. be better, given that? |
--- Signed-off-by: Michael Ferguson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't your test case pass when --freeBuffers=false
and fail when --freeBuffers=true
?
Or was it the case that this behavior always worked (a new buffer would get automatically reallocated when necessary) but maybe wasn't as performant as possible. And now your PR improves performance in the case where the user asserts that reallocation won't be necessary.
Note that, unless I'm mistaken, Arkouda still has its own copy of the aggregators, so there's no need to worry about it when changing the code here. If I were choosing a default, I think I'd also have it default to not freeing the buffers on flush (because nothing about flush suggests to me that the buffers should be freed; and if they are freed and I didn't want them to be, what happens? Does my program stop working until I add the "don't free" request, or does it just allocate new buffers again when needed? |
Right, the buffers are automatically reallocated as necessary, but it wasn't as performant. |
@bradcray @stonea -- what do you think I should do with regard to the default? Options are
I had been doing (1) so as not to disrupt Arkouda but Brad pointed out (and I checked) that Arkouda has its own copy of the aggregator code so that's not an issue. Given that, I'm inclined to do (2) for now (in case there is anyone migrating code) and consider relaxing it to (3) at some point in the future. Does that sound OK or do you have some other request? |
I'd probably go straight to (3) unless @ronawho strenuously objects. I think the fact that the module is unstable permits us to change that default without providing a gentle path since it's not a breaking change. |
Hmm not sure. I'll lay out my argument against each choice to try and figure out what the "lesser evil" is.
Seems like you've been burned by (or at least encountered) the first scenario so there's some data against doing that. I'm sympathetic to the "don't bug me with unnecessary decisions" argument, so I guess go with the third approach until or unless we have some reason to suspect that to be a concern in the "real world". |
I'm not sure I buy this argument since the aggregator itself is still in scope. That is, when I create an aggregator, I anticipate that what I'm doing is allocating buffers to aggregate my communications and imagine those buffers to generally be 1:1 with the type itself. Since the type doesn't generally support an "allocate buffers now" / "deallocate buffers now" interface (I think?), it seems reasonable to me to assume that, generally, when the type is in scope buffers are allocated. Unless I'm mistaken and we don't allocate buffers when the type is initially declared, but only lazily as needed? |
I'll adjust the PR to default to not freeing the buffers on |
AFAIK it does both:
|
--- Signed-off-by: Michael Ferguson <[email protected]>
A follow-up question here is whether we should / would benefit from updating the Arkouda snapshot of the copy aggregators. I think with a quick grep this morning I saw explicit calls to flush, but am not sure whether they would benefit from this, or are the kind Elliot mentioned above that would not want it. Of course, ultimately, we'd like to avoid the code clone here, in which case we'll need to update to match and work with this version anyway. Tagging @e-kayrakli in case he's not watching this topic. |
Before this PR, CopyAggregation aggregators always free the buffers on a
flush
, because that function was written primarily for the case of freeing the aggregator, in which case freeing the buffers immediately avoids the need to do it later.However, in some application use cases, it's useful to keep aggregators around for longer to avoid overheads of creating and destroying aggregators. It might be necessary to call 'flush' on SrcAggregator so that the appropriate data is loaded, but that aggregator might be used again, so it's not helpful to free the buffers.
This PR adjusts 'flush' to accept an argument to indicate if the buffers should be freed or not. Additionally, it changes the default to not free the buffer, which is a difference from previous behavior that should only impact performance.
This change was suggested by @ronawho.
Reviewed by @stonea - thanks!