review GA_Fence() et al for thread safety #27

jeffdaily · 2017-04-07T22:22:49Z

Calling GA_Init_fence() and later closing the fence with GA_Fence() is not thread-safe by design, because state is stored globally between the two calls.

Should we consider an API change that returns a handle?
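To make the handle idea concrete, here is a minimal sketch, not a proposal for the actual signatures. Every name in it (`hfence_t`, `hfence_begin`, `hfence_note`, `hfence_end`) is hypothetical, and `mock_armci_fence` is a stand-in for ARMCI_Fence so the sketch compiles without ARMCI. The point is only that the state lives in the handle rather than in a global, so each thread can own its own fence.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARMCI_Fence(int proc); it only counts calls. */
static int mock_fence_calls = 0;
static void mock_armci_fence(int proc) { (void)proc; mock_fence_calls++; }

/* Hypothetical handle: all fence state lives here, nothing is global. */
typedef struct {
    int nproc;
    unsigned char *touched; /* touched[p] != 0 if target p saw an operation */
} hfence_t;

/* Start a fence region; returns NULL on allocation failure. */
hfence_t *hfence_begin(int nproc)
{
    assert(nproc > 0);
    hfence_t *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->touched = calloc((size_t)nproc, 1);
    if (h->touched == NULL) { free(h); return NULL; }
    h->nproc = nproc;
    return h;
}

/* Record that a put/get/acc was issued to `target` under this handle. */
void hfence_note(hfence_t *h, int target)
{
    assert(h != NULL && target >= 0 && target < h->nproc);
    h->touched[target] = 1;
}

/* Fence every touched target, release the handle, return the count. */
int hfence_end(hfence_t *h)
{
    assert(h != NULL);
    int fenced = 0;
    for (int p = 0; p < h->nproc; p++) {
        if (h->touched[p]) { mock_armci_fence(p); fenced++; }
    }
    free(h->touched);
    free(h);
    return fenced;
}
```

As long as a handle is used by one thread only, no atomics are needed here. Whether the put/get/acc calls would take the handle explicitly is a separate design question this sketch does not answer.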

jeffhammond · 2017-04-07T22:44:35Z

Another possibility is to assume/require that the main thread calls GA_Init_fence and GA_Fence, which allocate and free the global state, respectively. You can make the intervening put/get/acc (PGA) calls thread-safe by having them atomically update the global state. If you go with O(nproc) state, the PGA calls just increment a counter associated with each target. You can also do O(ntarget) state with a linked list or similar and use slightly more expensive atomic operations to update the list. GA_Fence then just walks the list and calls ARMCI_Fence for each target.
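A rough sketch of the O(nproc) variant, using C11 atomics. The names `sketch_init_fence`, `sketch_note_op`, and `sketch_fence` are made up for illustration, and `mock_armci_fence` stands in for ARMCI_Fence so this builds without ARMCI. It assumes the worker threads are synchronized with the main thread (for example joined) before the fence is closed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARMCI_Fence(int proc); it only counts calls. */
static int mock_fence_calls = 0;
static void mock_armci_fence(int proc) { (void)proc; mock_fence_calls++; }

/* Global fence state: one atomic counter per target process, O(nproc). */
static atomic_int *fence_ops = NULL;
static int fence_nproc = 0;

/* Main thread only: allocate and zero the per-target counters. */
int sketch_init_fence(int nproc)
{
    assert(nproc > 0 && fence_ops == NULL);
    fence_ops = malloc((size_t)nproc * sizeof *fence_ops);
    if (fence_ops == NULL) return -1;
    for (int p = 0; p < nproc; p++) atomic_init(&fence_ops[p], 0);
    fence_nproc = nproc;
    return 0;
}

/* Any thread: record that a put/get/acc was issued to `target`.
 * Relaxed ordering is enough for the count itself, provided the caller
 * synchronizes the worker threads with the main thread before the fence. */
void sketch_note_op(int target)
{
    assert(fence_ops != NULL && target >= 0 && target < fence_nproc);
    atomic_fetch_add_explicit(&fence_ops[target], 1, memory_order_relaxed);
}

/* Main thread only: fence each touched target, free the state, and
 * return how many targets were fenced. */
int sketch_fence(void)
{
    assert(fence_ops != NULL);
    int fenced = 0;
    for (int p = 0; p < fence_nproc; p++) {
        if (atomic_load(&fence_ops[p]) > 0) { mock_armci_fence(p); fenced++; }
    }
    free(fence_ops);
    fence_ops = NULL;
    fence_nproc = 0;
    return fenced;
}
```

The O(ntarget) linked-list variant would replace the counter array with a lock-free or locked list of targets, trading memory for a more expensive update in each PGA call.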

The other option, which is what I did in my thread-safe branch, is to make GA_Init_fence a no-op and call ARMCI_AllFence in GA_Fence, which is thread-safe as long as ARMCI is. This is more expensive than necessary if ARMCI_AllFence is O(nproc), but in the cases where ARMCI already tracks the active target list, then it is basically equivalent. In the case where ARMCI_AllFence is essentially O(1) because all outstanding remote ops are tracked via a single counter, then the lazy approach is in fact optimal. We assume networks will implement the latter optimization, and that it can be exploited in e.g. MPI_Win_flush_all. I think dmapp_gsync is an example of this.
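For comparison, the lazy approach is roughly this small. This is a sketch, not the code from the thread-safe branch: `lazy_init_fence` and `lazy_fence` are illustrative names, and `mock_armci_allfence` is a stand-in for ARMCI_AllFence that only counts calls.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ARMCI_AllFence(); it only counts calls.
 * The counter is atomic so the mock itself is safe to call concurrently. */
static atomic_int mock_allfence_calls;
static void mock_armci_allfence(void)
{
    atomic_fetch_add(&mock_allfence_calls, 1);
}

/* No state is recorded when the fence region opens, so there is nothing
 * for concurrent callers to race on. */
void lazy_init_fence(void)
{
}

/* Closing the fence completes all outstanding operations to all targets.
 * Thread safety is delegated entirely to the underlying all-fence call. */
void lazy_fence(void)
{
    mock_armci_allfence();
}
```

The cost of this version is whatever the all-fence costs, which is the O(nproc) versus O(1) trade-off described above.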

In any case, I don't care that much what happens here, because I don't believe NWChem uses GA_Fence.

abhinavvishnu · 2017-04-08T00:12:34Z

@jeffhammond Good point. This requires a bit more discussion, so it would be safest to assume that this is not thread-safe until we understand the implications.

jeffhammond · 2017-04-08T16:31:49Z

It might be prudent to start by adding a "sparse" fence to ARMCI, meaning an ARMCI fence routine that takes a list or array of targets to fence. Once you know what the most efficient implementation is inside of ARMCI, just map GA to that.
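One possible shape for such a routine, as a sketch only. `sketch_fence_targets` is a hypothetical name and signature, not an existing ARMCI call, and `mock_armci_fence` stands in for ARMCI_Fence. This naive version fences each distinct target once. A real implementation inside ARMCI could presumably do better by using its own tracking of outstanding operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARMCI_Fence(int proc); it only counts calls. */
static int mock_fence_calls = 0;
static void mock_armci_fence(int proc) { (void)proc; mock_fence_calls++; }

/* Hypothetical sparse fence: fence each distinct process listed in
 * targets[0..ntargets). nproc is the total process count and is used for
 * bounds checks and duplicate detection. Returns the number of distinct
 * targets fenced, or -1 if the scratch allocation fails. */
int sketch_fence_targets(const int *targets, int ntargets, int nproc)
{
    assert(nproc > 0 && ntargets >= 0);
    bool *seen = calloc((size_t)nproc, sizeof *seen);
    if (seen == NULL) return -1;

    int fenced = 0;
    for (int i = 0; i < ntargets; i++) {
        int p = targets[i];
        assert(p >= 0 && p < nproc);
        if (!seen[p]) {          /* skip targets already fenced */
            seen[p] = true;
            mock_armci_fence(p);
            fenced++;
        }
    }
    free(seen);
    return fenced;
}
```

GA_Fence could then pass whatever target list it has accumulated and leave the choice of implementation to ARMCI.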

jeffdaily added the thread-safety label Apr 7, 2017

jeffdaily assigned jeffdaily, abhinavvishnu and bjpalmer Apr 7, 2017
