Sporadic Applications
BOINC was originally designed as a batch processing system: you submit jobs, they run (independently of one another) and eventually they finish. Some potential uses of volunteer computing don't fit this model. They may require that their apps run simultaneously, and perhaps that they communicate directly with each other. Examples include MPI-type parallel apps and distributed machine learning. BOINC's 'sporadic application' mechanism is designed to support these types of systems, and to allow them to coexist with batch processing.
The jobs of a sporadic app run (i.e. are present in memory) all the time (like non-CPU-intensive jobs) but compute only some of the time.
Like regular apps, a sporadic app can have multiple app versions. Each of these app versions has a plan class, which determines the processor usage (CPUs and GPUs) of its jobs. A project's BOINC scheduler can send multiple jobs for a given sporadic app, using the same or different app versions.
Like regular jobs, a sporadic job can compute only when
- computing (and GPU computing if relevant) is enabled by user preferences
- there are sufficient processing resources and RAM
In addition, a sporadic job computes only when it wants to. A sporadic app is typically part of another distributed system - a 'guest system' - that exists outside of BOINC. The guest system typically has its own server that handles requests and dispatches them to 'worker nodes' (running BOINC). Its worker nodes may communicate directly with each other - peer-to-peer - as well as with the server.
A sporadic job engages in conversations with both the BOINC client and the guest system server; it computes only when the server asks it to and the client says it's OK to.
The client/app protocol uses the following messages:
Client to app:
- DONT_COMPUTE: you can't compute now (e.g. because resources are not available)
- COULD_COMPUTE: you could compute if you want
- COMPUTING: you're computing as far as I'm concerned
App to client:
- DONT_WANT_COMPUTE: I don't want to compute now
- WANT_COMPUTE: I want to compute
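The client's side of this exchange can be pictured as a small decision function. The following is a minimal sketch, not the actual BOINC client logic: the function `client_reply`, its two boolean inputs, and the enum types `ClientMsg` and `AppMsg` are hypothetical, and only the message names come from the list above.

```cpp
// Message names from the protocol; the enum types are local to this sketch.
enum class ClientMsg { DONT_COMPUTE, COULD_COMPUTE, COMPUTING };
enum class AppMsg { DONT_WANT_COMPUTE, WANT_COMPUTE };

// Hypothetical helper: choose the client's message from the current
// conditions and the app's most recent message.
constexpr ClientMsg client_reply(bool computing_enabled,
                                 bool resources_available,
                                 AppMsg last_app_msg) {
    // Suspended by preferences, or not enough CPU/GPU/RAM: the job may not compute.
    if (!computing_enabled || !resources_available) {
        return ClientMsg::DONT_COMPUTE;
    }
    // The app asked to compute and nothing prevents it: the resources are
    // treated as reserved, so the client confirms with COMPUTING.
    if (last_app_msg == AppMsg::WANT_COMPUTE) {
        return ClientMsg::COMPUTING;
    }
    // Computing is possible, but the app has not asked for it.
    return ClientMsg::COULD_COMPUTE;
}

// Example: computing enabled, resources free, app wants to compute.
static_assert(client_reply(true, true, AppMsg::WANT_COMPUTE) == ClientMsg::COMPUTING,
              "client confirms with COMPUTING");
```

Note that in this sketch the client never sends COMPUTING unprompted; it is only ever a reply to WANT_COMPUTE.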
The protocol between the app and the guest server isn't specified. It could be based on polling from the app, or bidirectional requests.
A typical scenario is as follows:
sequenceDiagram
participant C as BOINC client
participant A as sporadic job
participant S as guest system server
A->>C: DONT_WANT_COMPUTE
A->>S: I cannot compute
C->>A: DONT_COMPUTE
C->>A: COULD_COMPUTE
A->>S: I can compute
S->>A: here is a request
A->>C: WANT_COMPUTE
C->>A: COMPUTING
A->>S: request confirmed, computing
A->>S: I am done computing
A->>C: DONT_WANT_COMPUTE
C->>A: COULD_COMPUTE
The steps are:
- Initially the client tells the app that it can't compute, perhaps because the user has suspended computation.
- The app relays this to the server; this tells the server not to send any requests.
- Eventually the user enables computing; the client relays this as a COULD_COMPUTE message to the app, and the app relays it to the server, indicating that it can now accept requests.
- The server sends a request to the app, asking it to do some computing (and possibly some network communication with other workers).
- The app sends WANT_COMPUTE to the client.
- The client reserves the needed computing resources and sends COMPUTING to the app.
- The app computes. When it's done, it sends DONT_WANT_COMPUTE to the client.
- The client (assuming computing is not suspended) sends COULD_COMPUTE to the app.
It's also possible that the app must stop computing before the request is finished - for example, because the user suspends computing. In this case:
- The client sends DONT_COMPUTE to the app
- The app notifies the server that it can't finish the request (or it might wait before doing this, in case computing is re-enabled quickly).
Thus, the app must continuously check for messages from the client, even while it's computing.
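One way to meet this requirement is to split the work into short chunks and check the client's state between them. The sketch below assumes that structure; `run_request`, `RequestResult`, and the two callables are hypothetical names, and the client state is passed in as a callable so that the example does not depend on the BOINC library. The enum mirrors the client-to-app states of the API.

```cpp
#include <cassert>
#include <functional>

// Mirrors the client-to-app states of the BOINC API.
enum SPORADIC_CA_STATE {
    CA_NONE = 0,
    CA_DONT_COMPUTE = 1,
    CA_COULD_COMPUTE = 2,
    CA_COMPUTING = 3
};

// Hypothetical result type: how far the request got.
struct RequestResult {
    int chunks_done;
    bool finished;
};

// Hypothetical helper: run `total_chunks` units of work, polling the client
// state before each one. In a real app, `get_ca_state` would wrap
// boinc_sporadic_get_ca_state().
RequestResult run_request(int total_chunks,
                          const std::function<SPORADIC_CA_STATE()>& get_ca_state,
                          const std::function<void(int)>& do_chunk) {
    assert(total_chunks >= 0);
    int done = 0;
    while (done < total_chunks) {
        // The client has withdrawn permission: stop before the next chunk.
        if (get_ca_state() == CA_DONT_COMPUTE) {
            return {done, false};
        }
        do_chunk(done);  // one short unit of work
        ++done;
    }
    return {done, true};
}
```

When `finished` is false, the caller would notify the guest server that the request was not completed, or wait briefly in case computing is re-enabled. Shorter chunks mean a faster reaction to DONT_COMPUTE at the cost of more frequent polling.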
The API for communicating with the client is:
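// Client-to-app state
enum SPORADIC_CA_STATE {
    CA_NONE = 0,
    CA_DONT_COMPUTE = 1,    // you can't compute now
    CA_COULD_COMPUTE = 2,   // you could compute if you want
    CA_COMPUTING = 3        // you're computing as far as the client is concerned
};
// App-to-client state
enum SPORADIC_AC_STATE {
    AC_NONE = 0,
    AC_DONT_WANT_COMPUTE = 1,   // the app doesn't want to compute now
    AC_WANT_COMPUTE = 2         // the app wants to compute
};
// Called by the app to tell the client whether it wants to compute
extern void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE);
// Called by the app to read the client's current message
extern SPORADIC_CA_STATE boinc_sporadic_get_ca_state();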
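A minimal sketch of how an app might drive these two calls once per polling interval. To keep it compilable without the BOINC library, it supplies stand-in copies of the enums and stand-in definitions of both functions; in a real app these would come from the BOINC API instead. The `poll_once` helper, the `Action` type, and the `request_pending` flag are hypothetical, as is the choice to treat `CA_NONE` like `CA_DONT_COMPUTE`.

```cpp
#include <cassert>

// Stand-in copies of the API enums, so the sketch compiles without BOINC.
enum SPORADIC_CA_STATE { CA_NONE = 0, CA_DONT_COMPUTE = 1, CA_COULD_COMPUTE = 2, CA_COMPUTING = 3 };
enum SPORADIC_AC_STATE { AC_NONE = 0, AC_DONT_WANT_COMPUTE = 1, AC_WANT_COMPUTE = 2 };

// Stand-in implementations (NOT the real BOINC functions): a fake client
// state that a test can set, and a record of the app's last report.
static SPORADIC_CA_STATE fake_ca_state = CA_DONT_COMPUTE;
static SPORADIC_AC_STATE last_ac_state = AC_NONE;

void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE s) {
    assert(s != AC_NONE);  // this sketch only ever reports the two real states
    last_ac_state = s;
}
SPORADIC_CA_STATE boinc_sporadic_get_ca_state() { return fake_ca_state; }

// Hypothetical: what the app should do next with respect to the guest server.
enum class Action {
    TELL_SERVER_UNAVAILABLE,  // client forbids computing: accept no requests
    TELL_SERVER_AVAILABLE,    // idle and ready to accept a request
    WAIT_FOR_CLIENT,          // asked to compute, waiting for COMPUTING
    COMPUTE                   // client confirmed: work on the request
};

// Hypothetical helper, called once per polling interval.
// `request_pending` says whether the guest server has handed us work.
Action poll_once(bool request_pending) {
    const SPORADIC_CA_STATE ca = boinc_sporadic_get_ca_state();
    // Assumption of this sketch: CA_NONE is handled like CA_DONT_COMPUTE,
    // and any pending request is simply given up (a simplification).
    if (ca == CA_NONE || ca == CA_DONT_COMPUTE) {
        boinc_sporadic_set_ac_state(AC_DONT_WANT_COMPUTE);
        return Action::TELL_SERVER_UNAVAILABLE;
    }
    if (!request_pending) {
        boinc_sporadic_set_ac_state(AC_DONT_WANT_COMPUTE);
        return Action::TELL_SERVER_AVAILABLE;
    }
    // A request is waiting: ask the client, and compute only once it confirms.
    boinc_sporadic_set_ac_state(AC_WANT_COMPUTE);
    return ca == CA_COMPUTING ? Action::COMPUTE : Action::WAIT_FOR_CLIENT;
}
```

The app would call `poll_once` in its main loop, sleeping briefly between calls, and translate each returned `Action` into whatever messages its guest server protocol uses.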