Dynamic Scaling Concept
This page collects all ideas for a dynamic scaling concept in one place.
The overall goal is to get libcircle working with minimal dependencies and configuration. Unfortunately, this means moving away from the complexity of MPI. The current idea is to replace the MPI code with ZeroMQ for messaging and protobuf for serialization. Consensus on the node pool, as well as dynamic scaling, could rely on a Kademlia DHT. With this core design, nodes can join and leave the compute "swarm" while a job is running.
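Kademlia organizes peers by the XOR distance between their node IDs. The C sketch below shows how a node could compute that distance, pick the known peer closest to a key, and decide which k-bucket a peer belongs in. It assumes 160-bit IDs, as in the original Kademlia paper. The names `node_id`, `xor_distance`, `closest_peer`, and `bucket_index` are hypothetical and not part of libcircle.

```c
#include <stdint.h>
#include <string.h>

#define ID_BYTES 20 /* 160-bit node IDs, as in the original Kademlia paper */

/* Hypothetical node ID type, stored big-endian. */
typedef struct { uint8_t b[ID_BYTES]; } node_id;

/* XOR distance between two IDs, stored big-endian like the IDs themselves. */
static node_id xor_distance(const node_id *a, const node_id *b)
{
    node_id d;
    for (int i = 0; i < ID_BYTES; i++)
        d.b[i] = a->b[i] ^ b->b[i];
    return d;
}

/* Index of the peer whose ID is XOR-closest to target, or -1 if n == 0. */
static int closest_peer(const node_id *target, const node_id *peers, int n)
{
    int best = -1;
    node_id best_d = {{0}};
    for (int i = 0; i < n; i++) {
        node_id d = xor_distance(target, &peers[i]);
        /* memcmp orders big-endian byte strings numerically */
        if (best < 0 || memcmp(d.b, best_d.b, ID_BYTES) < 0) {
            best = i;
            best_d = d;
        }
    }
    return best;
}

/* k-bucket index: position of the highest bit in which the IDs differ
 * (0 = least significant bit), or -1 if the IDs are equal. */
static int bucket_index(const node_id *self, const node_id *other)
{
    for (int i = 0; i < ID_BYTES; i++) {
        uint8_t x = self->b[i] ^ other->b[i];
        if (x != 0) {
            int bit = 7;
            while (!(x & (1u << bit)))
                bit--;
            return (ID_BYTES - 1 - i) * 8 + bit;
        }
    }
    return -1;
}
```

Storing IDs big-endian lets `memcmp` compare distances directly. Byte-wise order then matches numeric order, so no bignum arithmetic is needed.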
For MPI execution, the API changes are minimal: we simply pass a new flag to CIRCLE_init specifying execution under MPI. If the libcircle build in use was compiled without MPI, CIRCLE_init returns -1 and sets errno. Dynamic scaling is not supported under MPI execution.
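The transport check that CIRCLE_init would perform can be sketched as a small helper. The sketch makes several assumptions, none of which come from the concept. The flag names `CIRCLE_TRANSPORT_MPI` and `CIRCLE_TRANSPORT_ZMQ` and their bit positions are hypothetical. So are the build macros `HAVE_MPI` and `HAVE_ZMQ` and the helper names. The concept does not say which errno value to use, so `ENOTSUP` (and `EINVAL` for a bad flag combination) is also an assumption.

```c
#include <errno.h>

/* Hypothetical transport flags; names and bit positions are assumptions. */
#define CIRCLE_TRANSPORT_MPI (1 << 8)
#define CIRCLE_TRANSPORT_ZMQ (1 << 9)

/* Hypothetical build-time switches a configure step might define;
 * both default to "not compiled in". */
#ifndef HAVE_MPI
#define HAVE_MPI 0
#endif
#ifndef HAVE_ZMQ
#define HAVE_ZMQ 0
#endif

/* Validate the requested transport. Returns 0 on success, or -1 with errno:
 * EINVAL if neither or both transports are requested,
 * ENOTSUP if the requested transport was not compiled in. */
static int circle_select_transport(int options)
{
    int want_mpi = (options & CIRCLE_TRANSPORT_MPI) != 0;
    int want_zmq = (options & CIRCLE_TRANSPORT_ZMQ) != 0;

    if (want_mpi == want_zmq) { /* neither or both requested */
        errno = EINVAL;
        return -1;
    }
    if ((want_mpi && !HAVE_MPI) || (want_zmq && !HAVE_ZMQ)) {
        errno = ENOTSUP;
        return -1;
    }
    return 0;
}

/* Per the concept, dynamic scaling is not available under MPI. */
static int circle_dynamic_scaling_allowed(int options)
{
    return (options & CIRCLE_TRANSPORT_MPI) == 0;
}
```

Building with `-DHAVE_MPI=1` (or `-DHAVE_ZMQ=1`) would let the corresponding transport pass the check.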
For ZeroMQ execution, we pass a new ZeroMQ flag to CIRCLE_init. As with MPI, if the libcircle build in use was compiled without ZeroMQ, CIRCLE_init returns -1 and sets errno. After calling CIRCLE_init and before calling CIRCLE_begin, we must call a new function, CIRCLE_bootstrap, with an array of nodes to use for bootstrapping into the DHT. Once CIRCLE_bootstrap has been called, we can call CIRCLE_begin and continue execution as usual.
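The required call order for a ZeroMQ run can be modelled as a small state machine. In the C sketch below, the hypothetical functions `sketch_init`, `sketch_bootstrap`, and `sketch_begin` stand in for CIRCLE_init, CIRCLE_bootstrap, and CIRCLE_begin. Their signatures, the `circle_ctx` struct, and the `host:port` seed format are assumptions; the concept only says "an array of nodes". The context struct exists only to keep the example self-contained and testable.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lifecycle states. */
typedef enum { ST_NEW, ST_INIT, ST_BOOTSTRAPPED, ST_RUNNING } circle_state;

typedef struct {
    circle_state state;
    int use_zmq;    /* set from the (hypothetical) ZeroMQ flag */
    size_t n_seeds; /* number of bootstrap nodes accepted */
} circle_ctx;

/* Stand-in for CIRCLE_init: may only be called once. */
static int sketch_init(circle_ctx *c, int use_zmq)
{
    if (c->state != ST_NEW) { errno = EINVAL; return -1; }
    c->use_zmq = use_zmq;
    c->n_seeds = 0;
    c->state = ST_INIT;
    return 0;
}

/* Stand-in for CIRCLE_bootstrap: only valid for ZeroMQ runs,
 * after init and before begin, with a non-empty seed list. */
static int sketch_bootstrap(circle_ctx *c, const char *const seeds[], size_t n)
{
    if (c->state != ST_INIT || !c->use_zmq || seeds == NULL || n == 0) {
        errno = EINVAL;
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (seeds[i] == NULL || seeds[i][0] == '\0') { errno = EINVAL; return -1; }
        /* A real implementation would contact seeds[i] and join the DHT here. */
    }
    c->n_seeds = n;
    c->state = ST_BOOTSTRAPPED;
    return 0;
}

/* Stand-in for CIRCLE_begin: ZeroMQ runs must be bootstrapped first. */
static int sketch_begin(circle_ctx *c)
{
    circle_state ready = c->use_zmq ? ST_BOOTSTRAPPED : ST_INIT;
    if (c->state != ready) { errno = EINVAL; return -1; }
    c->state = ST_RUNNING; /* the work loop would run from here */
    return 0;
}
```

Rejecting out-of-order calls with an error, rather than silently bootstrapping inside begin, keeps the init, bootstrap, begin contract explicit for callers.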
There should also be a function such as CIRCLE_drain that a process callback can call to drain a node before it leaves the swarm (for example, when the node receives a shutdown signal).
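The C sketch below shows one way CIRCLE_drain could fit into a process callback. It assumes a SIGTERM handler that only sets a flag, plus a hypothetical `sketch_drain` that hands the node's remaining work items to a peer. Here the "peer" is simply a second in-memory queue standing in for redistribution over ZeroMQ. `work_queue`, `sketch_drain`, and `process_cb` are illustrative names, not libcircle API. The callback signature is also simplified from libcircle's `CIRCLE_handle`-based one.

```c
#include <signal.h>
#include <stddef.h>
#include <string.h>

#define QCAP 64

/* Hypothetical fixed-size queue of work items (e.g. file paths). */
typedef struct {
    char items[QCAP][32];
    size_t len;
} work_queue;

static volatile sig_atomic_t drain_requested = 0;

/* Signal handler: only records the request (async-signal-safe). */
static void on_sigterm(int sig)
{
    (void)sig;
    drain_requested = 1;
}

/* Hypothetical CIRCLE_drain analogue: move queued items to the peer
 * queue until the local queue is empty or the peer is full.
 * Returns the number of items moved. */
static size_t sketch_drain(work_queue *local, work_queue *peer)
{
    size_t moved = 0;
    while (local->len > 0 && peer->len < QCAP) {
        local->len--;
        memcpy(peer->items[peer->len], local->items[local->len], sizeof local->items[0]);
        peer->len++;
        moved++;
    }
    return moved;
}

/* Simplified process callback: normally consumes one item; once a
 * shutdown was requested it drains instead. Returns 1 when the node
 * has handed off all its work and may leave the swarm. */
static int process_cb(work_queue *local, work_queue *peer)
{
    if (drain_requested) {
        sketch_drain(local, peer);
        return local->len == 0;
    }
    if (local->len > 0)
        local->len--; /* stand-in for processing one work item */
    return 0;
}
```

Keeping the handler to a single `sig_atomic_t` store and doing the hand-off inside the callback avoids running non-async-signal-safe code, such as network I/O, from the signal handler.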