open_mpi_spec.txt

0. ================================================ Things to know before you start
- this spec is not final, it's just a guide
- anything that you think is wrong should get corrected
- all communication between nodes is asynchronous
- meaning of constants:
  - N - number of nodes
  - K - length of side of world
  - G - number of generations

1. ================================================ Data format
cell_t {
  int x,y;			//position of cell in the world
  cell_habitant_t type;		//what is living inside
  int starvation, breeding;	//properties of living creature
}

2. ================================================ Master/Servant
Nodes will be split in 2 groups:
- master (1 node) - will be handling communication, synchronisation, propagation and evenly divide the work between servants
- servant (N-1 nodes) - will be doing all computation and logic
Implementation: http://dvbmonkey.wordpress.com/2009/03/02/an-open-mpi-master-servant-example/

3. ================================================ Messages types
- NEW_BOARD(side:int)
- UPDATE_CELL(c : cell_t)
- FINISHED()
- START_NEXT_GENERATION(RED or BLACK)

4. ================================================ Master
Data:
- array of all cells
- size of board
Algorithm:
- loads world from file
- splits the world in N-1 parts
- sends NEW_BOARD(K) to every node
- sends correct all the board to every node + 1 cell border around it by sending UPDATE_CELL messages (now the nodes have data to work on)
- sends FINISHED to confirm sending all cells
- in 2*G step loop:
  - sends START_NEXT_GENERATION(color) to every node
  - starts listening on all channels for incoming updates, saves them
  - counts FINISHED messages - if count == N-1 then sub-gen finished
  - sends stored updates to all servants
  - sends FINISHED to all servants
- sends FINISHED to all servants to notify that all generations are finished
- starts to listen for UPDATE_CELL from all servants, save incoming cell into a world array
- counts incoming FINISHED messages - if N-1 the comutation is finished
- prints the output to stdout
- exits

5. ================================================ Servant
Data:
- array of ~(1/N) cells
- startX, endX, startY, endY - position of board piece
Algorithm:

/---- All of these parts can be skipped by giving the full board to each servant (less efficient but more easy, and time is missing...)
- starts listening for NEW_BOARD message -> allocates memory for its board part
- listens for UPDATE_CELL messages, saves messages to board
- listens for FINISHED meaning all cells are in place
/----
- in loop:
  - listens for message
    - if it is FINISHED - all generations are finished - break the loop
    - if START_NEXT_GENERATION(genInfo) - start next generation with (startX, startY, endX, endY, color)
  - do the computation of its part of the board
  - send cells there were changed on a border to master with message UPDATE_CELL
  - when done sends FINISHED to master
  - listens for UPDATE_CELL messages from master
  - if cells are in its part of the board updates them (potentially resolves conflicts)
  - listens for FINISHED message from master
- sends all cells from board to master with UPDATE_CELL
- exits

6. =============================================== Additional notes
Since the initialization of the program is done by command line, all copies of the program receive
the initialization values passed by the command line call. There is no need to pass this values by
an MPI message, since these values are readily available to all copies of the program.


7. =============================================== Matrix Decomposition

Three options for decomposition
1. Rowwise block striped decomposition
– Divide matrix elements into group of rows (same as Floyd’s algorithm)
– Each process responsible for a contiguous group of either bm=pc or dm=pe rows
2. Columnwise block striping
– Divide matrix elements into group of columns
– Each process responsible for a contiguous group of either bn=pc or dn=pe columns
3. Checkerboard block decomposition
– Form a virtual grid
– Matrix is divided into 2D blocks aligning with the grid
– Let the grid have r rows and c columns
– Each process responsible for a block of matrix containing at most dm=re rows and dn=ce columns