Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust CopyAggregation aggregators to have the option of not freeing buffers on flush #26681

Merged
merged 6 commits into from
Feb 11, 2025

Conversation

mppf
Copy link
Member

@mppf mppf commented Feb 7, 2025

Before this PR, CopyAggregation aggregators always free the buffers on a flush, because that function was written primarily for the case of freeing the aggregator, in which case freeing the buffers immediately avoids the need to do it later.

However, in some application use cases, it's useful to keep aggregators around for longer to avoid overheads of creating and destroying aggregators. It might be necessary to call 'flush' on SrcAggregator so that the appropriate data is loaded, but that aggregator might be used again, so it's not helpful to free the buffers.

This PR adjusts 'flush' to accept an argument to indicate if the buffers should be freed or not. Additionally, it changes the default to not free the buffer, which is a difference from previous behavior that should only impact performance.

This change was suggested by @ronawho.

Reviewed by @stonea - thanks!

  • full comm=none testing
  • full comm=gasnet oversubscribed testing

mppf added 4 commits February 7, 2025 17:05
---
Signed-off-by: Michael Ferguson <[email protected]>
---
Signed-off-by: Michael Ferguson <[email protected]>
@ronawho
Copy link
Contributor

ronawho commented Feb 8, 2025

I think the original use case when the initial API was done was an SPMD region doing something like:

// Collect remote data
var srcAgg = new SrcAgg();
for i in 1..n do srcAgg.copy(...);
srcAgg.flush();

// permute the local data in some way

// Put local data back to remote (maybe it was in transposed order)
var dstAgg = new DstAgg();
for i in 1..n do dstAgg.copy(...);
dstAgg.flush();

So we needed to flush the aggregator before it would naturally fall out of scope, but never used it again. I didn't want to add an artificial scope just to deinit the aggregator and the combined flush+free-remote-buffers eliminates any comm when it finally does go out of scope.

I agree there are use-cases where you want it either way so adding an option makes sense to me. Most of the time aggregators get used as a task-private variable in a forall and without an explicit flush, so I don't think it's a very common idiom to manually flush anyways. Given that I don't think you need to worry too much about which default is better and I would also just default to the current behavior (but if designing from scratch would probably default to false or have a different operation that does the flush+free)

@mppf
Copy link
Member Author

mppf commented Feb 10, 2025

Perhaps "no default" would. be better, given that?

---
Signed-off-by: Michael Ferguson <[email protected]>
Copy link
Contributor

@stonea stonea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't your test case pass when --freeBuffers=false and fail when --freeBuffers=true?

Or was it the case that this behavior always worked (a new buffer would get automatically reallocated when necessary) but maybe wasn't as performant as possible. And now your PR improves performance in the case where the user asserts that reallocation won't be necessary.

@bradcray
Copy link
Member

Note that, unless I'm mistaken, Arkouda still has its own copy of the aggregators, so there's no need to worry about it when changing the code here.

If I were choosing a default, I think I'd also have it default to not freeing the buffers on flush (because nothing about flush suggests to me that the buffers should be freed; and if they are freed and I didn't want them to be, what happens? Does my program stop working until I add the "don't free" request, or does it just allocate new buffers again when needed?

@mppf
Copy link
Member Author

mppf commented Feb 11, 2025

Shouldn't your test case pass when --freeBuffers=false and fail when --freeBuffers=true?

Or was it the case that this behavior always worked (a new buffer would get automatically reallocated when necessary) but maybe wasn't as performant as possible. And now your PR improves performance in the case where the user asserts that reallocation won't be necessary.

Right, the buffers are automatically reallocated as necessary, but it wasn't as performant.

@mppf
Copy link
Member Author

mppf commented Feb 11, 2025

@bradcray @stonea -- what do you think I should do with regard to the default? Options are

  1. Leave the default as "free the buffers" to match historical behavior
  2. Remove the default, so user has to provide it when calling flush
  3. Change the default to not free the buffers.

I had been doing (1) so as not to disrupt Arkouda but Brad pointed out (and I checked) that Arkouda has its own copy of the aggregator code so that's not an issue. Given that, I'm inclined to do (2) for now (in case there is anyone migrating code) and consider relaxing it to (3) at some point in the future. Does that sound OK or do you have some other request?

@bradcray
Copy link
Member

I'd probably go straight to (3) unless @ronawho strenuously objects. I think the fact that the module is unstable permits us to change that default without providing a gentle path since it's not a breaking change.

@stonea
Copy link
Contributor

stonea commented Feb 11, 2025

what do you think I should do with regard to the default? Options are

Hmm not sure. I'll lay out my argument against each choice to try and figure out what the "lesser evil" is.

  • Some folks might dislike the first approach (defaulting to free) because your performance might stink due to unnecessarily reallocating buffers. Whether or not to reallocate the buffer seems like an easy detail for users to overlook.
  • Other folks might dislike the second approach (forcing the user to choose) because its annoying to have your work interrupted to have to make a decision, especially in cases where the decision wouldn't have a significant impact on performance (which I'm guessing is most of the time?)
  • And still others might dislike the third approach (defaulting to not free) because you could potentially hold onto a block of unnecessary memory.

Seems like you've been burned by (or at least encountered) the first scenario so there's some data against doing that. I'm sympathetic to the "don't bug me with unnecessary decisions" argument, so I guess go with the third approach until or unless we have some reason to suspect that to be a concern in the "real world".

@bradcray
Copy link
Member

And still others might dislike the third approach (defaulting to true) because you could potentially hold onto a block of unnecessary memory.

I'm not sure I buy this argument since the aggregator itself is still in scope. That is, when I create an aggregator, I anticipate that what I'm doing is allocating buffers to aggregate my communications and imagine those buffers to generally be 1:1 with the type itself. Since the type doesn't generally support an "allocate buffers now" / "deallocate buffers now" interface (I think?), it seems reasonable to me to assume that, generally, when the type is in scope buffers are allocated.

Unless I'm mistaken and we don't allocate buffers when the type is initially declared, but only lazily as needed?

@mppf
Copy link
Member Author

mppf commented Feb 11, 2025

I'd probably go straight to (3)

so I guess go with the third approach until or unless we have some reason to suspect that to be a concern in the "real world".

I'll adjust the PR to default to not freeing the buffers on flush.

@mppf
Copy link
Member Author

mppf commented Feb 11, 2025

Unless I'm mistaken and we don't allocate buffers when the type is initially declared, but only lazily as needed?

AFAIK it does both:

  • allocate local buffers when declared
  • allocate remote buffers lazily

@mppf mppf merged commit d615b0e into chapel-lang:main Feb 11, 2025
9 checks passed
@mppf mppf deleted the aggregators-flush-nofree branch February 11, 2025 21:15
@bradcray
Copy link
Member

A follow-up question here is whether we should / would benefit from updating the Arkouda snapshot of the copy aggregators. I think with a quick grep this morning I saw explicit calls to flush, but am not sure whether they would benefit from this, or are the kind Elliot mentioned above that would not want it. Of course, ultimately, we'd like to avoid the code clone here, in which case we'll need to update to match and work with this version anyway. Tagging @e-kayrakli in case he's not watching this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants