Calling GA_Init_fence() and later ending the fence with GA_Fence() is not thread-safe by design, because state is stored globally between the two calls.
Should we consider an API change in which the init call returns a handle?
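To make the handle idea concrete, here is a minimal sketch of what such an API could look like. All names (`hf_handle_t`, `hf_begin`, `hf_record`, `hf_end`) are hypothetical and not part of GA, and `hf_stub_fence` is only a stand-in for a per-target fence such as `ARMCI_Fence(proc)` so that the sketch compiles without ARMCI. The point is only that the state lives in the handle instead of in a global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-target fence such as ARMCI_Fence(proc).
 * It only counts calls so the sketch runs without ARMCI. */
static int hf_stub_calls = 0;
static void hf_stub_fence(int proc) { (void)proc; hf_stub_calls++; }

/* Hypothetical handle: all fence state lives here, not in a global. */
typedef struct {
    int   nproc;
    bool *touched;   /* touched[p] is true if an op targeted process p */
} hf_handle_t;

/* Would replace the init call: returns a handle owned by the caller. */
static hf_handle_t *hf_begin(int nproc) {
    assert(nproc > 0);
    hf_handle_t *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->touched = calloc((size_t)nproc, sizeof *h->touched);
    if (h->touched == NULL) { free(h); return NULL; }
    h->nproc = nproc;
    return h;
}

/* Called by put/get/acc with the handle the caller passed in. */
static void hf_record(hf_handle_t *h, int target) {
    assert(h != NULL && target >= 0 && target < h->nproc);
    h->touched[target] = true;
}

/* Would replace the fence call: fences the recorded targets, frees the
 * handle, and returns the number of targets fenced. */
static int hf_end(hf_handle_t *h) {
    assert(h != NULL);
    int fenced = 0;
    for (int p = 0; p < h->nproc; p++) {
        if (h->touched[p]) { hf_stub_fence(p); fenced++; }
    }
    free(h->touched);
    free(h);
    return fenced;
}
```

With this shape, two threads that each hold their own handle never share fence state. A handle shared between threads would still need synchronization in `hf_record`.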
Another possibility is to require that the main thread calls GA_Init_fence and GA_Fence, which allocate and free the global state, respectively. You can then make the intervening put/get/acc (PGA) calls thread-safe by having them update the global state atomically. If you go with O(nproc) state, the PGA calls just increment a counter associated with each target. You can also use O(ntarget) state with a linked list or similar, at the cost of slightly more expensive atomic operations to update the list. GA_Fence then just walks the list and calls ARMCI_Fence for each target.
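A rough sketch of the O(nproc) counter variant, using C11 atomics. The names (`ctr_init_fence`, `ctr_note_op`, `ctr_fence`) are hypothetical, and `ctr_stub_fence` is a stand-in for `ARMCI_Fence(proc)` so the sketch builds without ARMCI. It assumes the init and fence calls come from the main thread only, and that worker threads have finished their PGA calls before the fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARMCI_Fence(proc); only counts calls. */
static int ctr_stub_calls = 0;
static void ctr_stub_fence(int proc) { (void)proc; ctr_stub_calls++; }

/* Global fence state: one atomic counter per target process. */
static atomic_int *ctr_pending = NULL;
static int ctr_nproc = 0;

/* Main thread only: allocate and zero the O(nproc) state. */
static void ctr_init_fence(int nproc) {
    assert(nproc > 0 && ctr_pending == NULL);
    ctr_pending = malloc((size_t)nproc * sizeof *ctr_pending);
    assert(ctr_pending != NULL);
    for (int p = 0; p < nproc; p++) atomic_init(&ctr_pending[p], 0);
    ctr_nproc = nproc;
}

/* Any thread: called from put/get/acc to record an op on a target. */
static void ctr_note_op(int target) {
    assert(target >= 0 && target < ctr_nproc);
    atomic_fetch_add(&ctr_pending[target], 1);
}

/* Main thread only: fence every target with a nonzero counter, free the
 * state, and return the number of targets fenced. */
static int ctr_fence(void) {
    int fenced = 0;
    for (int p = 0; p < ctr_nproc; p++) {
        if (atomic_load(&ctr_pending[p]) > 0) {
            ctr_stub_fence(p);
            fenced++;
        }
    }
    free(ctr_pending);
    ctr_pending = NULL;
    ctr_nproc = 0;
    return fenced;
}
```

The fence still scans all nproc counters, which is the cost the O(ntarget) list variant would avoid.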
The other option, which is what I did in my thread-safe branch, is to make GA_Init_fence a no-op and call ARMCI_AllFence in GA_Fence, which is thread-safe as long as ARMCI is. This is more expensive than necessary if ARMCI_AllFence is O(nproc), but in cases where ARMCI already tracks the active target list, it is basically equivalent. If ARMCI_AllFence is essentially O(1) because all outstanding remote ops are tracked via a single counter, the lazy approach is in fact optimal. We assume networks will implement the latter optimization, and that it can be exploited in e.g. MPI_Win_flush_all. I think dmapp_gsync is an example of this.
In any case, I don't care that much what happens here, because I don't believe NWChem uses GA_Fence.
@jeffhammond Good point. This requires a bit more discussion, so for now it is safest to assume that this is not thread-safe until we understand the implications.
It might be prudent to start by adding a "sparse" fence to ARMCI, meaning an ARMCI fence routine that takes a list or array of targets to fence. Once you know what the most efficient implementation is inside of ARMCI, just map GA to that.
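For illustration, the interface of such a sparse fence might look like the sketch below. ARMCI has no such routine today as far as this thread establishes, so `sp_fence_targets` is a hypothetical name, and `sp_stub_fence` is a stand-in for `ARMCI_Fence(proc)`. The body is only the naive fallback (one per-target fence for each distinct target); a real implementation inside ARMCI would presumably do something smarter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARMCI_Fence(proc); only counts calls. */
static int sp_stub_calls = 0;
static void sp_stub_fence(int proc) { (void)proc; sp_stub_calls++; }

/* Hypothetical sparse fence: fence only the listed targets.
 * Duplicates in the list are fenced once. Returns the number of distinct
 * targets fenced, or -1 if the scratch allocation fails. */
static int sp_fence_targets(const int *targets, int ntargets, int nproc) {
    assert(targets != NULL && ntargets >= 0 && nproc > 0);
    bool *seen = calloc((size_t)nproc, sizeof *seen);
    if (seen == NULL) return -1;
    int fenced = 0;
    for (int i = 0; i < ntargets; i++) {
        int p = targets[i];
        assert(p >= 0 && p < nproc);
        if (!seen[p]) {           /* skip targets already fenced */
            seen[p] = true;
            sp_stub_fence(p);
            fenced++;
        }
    }
    free(seen);
    return fenced;
}
```

GA could then build the target list however it likes (per handle, per thread, or globally) and pass it down in one call.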