Atomic store operations are not detected on cached private pages #52

lundgren87 opened this issue Feb 25, 2021 · 0 comments
Issue

When an ArgoDSM page is cached by a remote node in a private state (either the caching node is the single writer, or there is no writer and the caching node is a sharer), subsequent atomic writes to the page by other nodes are not observed by regular reads on the caching node, even after self-invalidation.

The question is whether this is a bug (I believe so), or whether the semantics of the ArgoDSM atomic functions simply do not define such behavior in the first place. This is related to #20, but is further exposed by Ioannis' work on memory allocation policies, which causes some of the atomic tests to fail when the allocated data is owned by a node other than 0.

Reproduction

The following test exposes the bug on 2 or more nodes with the naive allocation policy. To display the second case (no writer, caching node is sharer), simply substitute *counter = 0; with volatile int temp = *counter;. The important point is that counter is allocated on node 0, and that exactly one other node performs a read or write to it.

#include "argo/argo.hpp"
#include <cstdio>

int main(){
	argo::init(1*1024*1024);
	argo::data_distribution::global_ptr<int> counter(argo::conew_<int>());
	
	if(argo::node_id()==1) {
		*counter = 0; // Node 0 owns the data, node 1 becomes single writer
	}
	argo::barrier(); // Barrier makes every node aware of the initialization
	
	// Atomically increment counter on each node
	for(int i=0; i<10; i++){
		argo::backend::atomic::fetch_add(counter, 1);
	}

	argo::barrier(); // Make sure every node has completed execution
	if(*counter == argo::number_of_nodes()*10) {
		printf("Node %d successful (counter: %d).\n", argo::node_id(), *counter);
	}else{
		printf("Node %d failed (counter: %d).\n", argo::node_id(), *counter);
	}
	
	argo::finalize();
}

Detail

This issue is courtesy of the following optimization:

if(
	// node is single writer
	(globalSharers[classidx+1] == id)
	||
	// no writer, and the node is a sharer
	((globalSharers[classidx+1] == 0) && ((globalSharers[classidx] & id) == id))
){
	MPI_Win_unlock(workrank, sharerWindow);
	touchedcache[i] = 1;
	/* nothing - we keep the pages, SD is done in flushWB */
}

The reason is that ArgoDSM atomics do not alter the Pyxis directory (globalSharers) state. Remote pages cached in single-writer state, or in no-writer shared state, are therefore not invalidated upon self-invalidation, causing the node to miss updates until the state of the page changes.
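To make the failure mode concrete, here is a minimal, self-contained model of the fast-path check quoted above. The struct and function names are hypothetical; the directory is simplified to two per-page bitmasks (sharers and writers) instead of the real globalSharers layout, but the predicate is the same: a node keeps its cached copy on self-invalidation when it is the single writer, or when there is no writer and it is a sharer. Since atomics do not update these bitmasks, the predicate keeps returning true and the node never refetches the page.

```cpp
#include <cstdint>

// Simplified, hypothetical model of the Pyxis directory state for one page:
// one bitmask of sharer nodes and one bitmask of writer nodes.
struct PageState {
	std::uint64_t sharers; // bit i set => node i has read the page
	std::uint64_t writers; // bit i set => node i has written the page
};

// Mirrors the optimization quoted above: on self-invalidation, a node keeps
// its cached copy when it is the single writer, or when there is no writer
// and the node is a sharer.
bool keeps_page_on_si(const PageState& s, int node) {
	const std::uint64_t id = std::uint64_t{1} << node;
	return (s.writers == id) ||
	       (s.writers == 0 && (s.sharers & id) == id);
}
```

In this model, a page in either private state survives self-invalidation unchanged, which is exactly why a remote atomic write that leaves the directory untouched is never picked up.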

Solution?

The fact that cached private pages are not downgraded to shared on atomic writes means that it is never completely safe to mix atomic writes and regular reads/writes. I believe that the correct solution would be to write atomic changes to the cache and to update local (and remote when needed as a result) Pyxis directories to the correct state.
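A sketch of the proposed direction, under the same simplified two-bitmask model as above (struct and function names are hypothetical, not the actual ArgoDSM implementation): when a node performs an atomic write, record it as a writer and sharer in the directory entry, so that other nodes holding the page in a private state fail the keep-the-page fast path and refetch on the next self-invalidation.

```cpp
#include <cstdint>

// Hypothetical, simplified directory entry for one page.
struct DirEntry {
	std::uint64_t sharers; // bitmask of sharer nodes
	std::uint64_t writers; // bitmask of writer nodes
};

// Proposed behavior: an atomic write updates the directory like a regular
// write would, recording the writer node.
void record_atomic_write(DirEntry& e, int writer_node) {
	const std::uint64_t id = std::uint64_t{1} << writer_node;
	e.sharers |= id; // the writer has touched the page
	e.writers |= id; // the page now has an additional writer
}

// The same fast-path predicate as in the quoted optimization: with the
// atomic writer recorded, a caching node no longer keeps the page.
bool keeps_page_on_si(const DirEntry& e, int node) {
	const std::uint64_t id = std::uint64_t{1} << node;
	return (e.writers == id) ||
	       (e.writers == 0 && (e.sharers & id) == id);
}
```

After record_atomic_write runs for a second node, the former single writer no longer satisfies either branch of the predicate and is downgraded to shared, so its next self-invalidation picks up the atomic update.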

ioanev pushed a commit to ioanev/argodsm that referenced this issue Mar 1, 2021
As per issue etascale#52, changes were made to mempool and the collective
allocators, in order for only the home node of the data structure
allocated to apply changes on it (initialization/modification).
As the above didn't solve all issues, further modifications were
applied on the failing tests. More specifically:

- PerfectForwardingCollectiveNew under allocators.cpp
In this test, a global pointer object is attached to the collectively
allocated tuple, in order to identify the home node of this
page and allow only that process to proceed on the modifications.

- selectiveArray, selectiveUnaligned, writeBufferLoad under backend.cpp
In these tests, only node_0 was set to handle the initialization
of the global data structures worked on, thus leading to failure
when utilizing the first-touch memory policy, as the backing store
of node_0 is not sufficient to host all pages. To spread the pages
across all nodes and avoid this issue, the initialization is done
collectively. The ArgoDSM global memory size is also increased
from 16 to 32MB, as writeBufferLoad was still failing even with
the team of processes initializing the data structures.

- ReadUninitializedSinglenode under uninitialized.cpp
This test was also modified to collectively initialize the data
structure worked on due to the aforementioned limitation.

The case where one node initializing all data under first-touch
leads to failure is left to be handled in future work, by moving
the allocations to the next node with sufficient backing store.