Sporadic Applications
BOINC was originally designed as a batch processing system: you submit jobs, they run (independently of each other) and eventually finish. But some potential uses of volunteer computing don't fit this model. They may require that apps run simultaneously on different computers, and perhaps that they communicate directly with each other. Examples include MPI-type parallel computing and distributed machine learning.
BOINC's 'sporadic application' mechanism is designed to support these types of systems, and to allow them to coexist with batch processing. The jobs of a sporadic app run (i.e. are present in memory) all the time, like non-CPU-intensive jobs, but compute only some of the time.
Like regular apps, a sporadic app can have multiple app versions. Each of these has a plan class, which determines the processor usage (CPUs and GPUs) of its jobs. A project's BOINC scheduler can send multiple jobs for a given sporadic app, using the same or different app versions.
A BOINC project can provide any combination of regular, sporadic, and non-CPU-intensive apps. A client can be connected to multiple projects with sporadic apps.
Like regular jobs, a sporadic job can compute only when
- computing (and GPU computing if relevant) is enabled by user preferences
- there are sufficient processing resources and RAM
In addition, a sporadic job computes only when it wants to. A sporadic app is typically part of another distributed system - a 'guest system' - that exists outside of BOINC. The guest system typically has its own server that handles requests and dispatches them to 'worker nodes' (running BOINC). Its worker nodes may communicate directly with each other - peer-to-peer or via a relay - as well as with the server.
A sporadic job engages in conversations with both the BOINC client and the guest system server; it computes only when the server asks it to, and when the client says it's OK to.
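These conditions combine as a simple conjunction. The client's side (preferences and resources) and the guest server's side (a pending request) must both allow computing. Here is a minimal C++ sketch. The struct and function names are hypothetical and are not part of the BOINC API.

```cpp
#include <cassert>

// Hypothetical summary of the conditions above; not part of the BOINC API.
struct ComputeConditions {
    bool computing_enabled;       // user preferences allow (GPU) computing
    bool resources_available;     // enough CPUs, GPUs and RAM are free
    bool server_request_pending;  // the guest system server has asked for work
};

// The client decides the first two conditions and the guest server the third;
// the job computes only when all of them hold.
bool sporadic_job_computes(const ComputeConditions& c) {
    const bool client_says_ok = c.computing_enabled && c.resources_available;
    return client_says_ok && c.server_request_pending;
}
```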
The client/app protocol uses the following messages:
Client to app:
- `DONT_COMPUTE`: the app can't compute now (e.g. because resources are not available)
- `COULD_COMPUTE`: the app could potentially compute
- `COMPUTING`: the app is computing as far as the client is concerned

App to client:
- `DONT_WANT_COMPUTE`: the app doesn't want to compute now
- `WANT_COMPUTE`: the app wants to compute
The protocol between the app and the guest system server isn't specified. It could be based on polling by the app, or on bidirectional requests.
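The sketch below illustrates the polling option. It models a guest server that hands requests only to a worker that has reported itself available. Everything in it is hypothetical, including the `FakeGuestServer` class, its methods, and the string requests. A real guest system would use its own transport and would track many workers, not just one.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical in-memory stand-in for a guest system server polled by a single
// worker. BOINC does not specify this protocol.
class FakeGuestServer {
public:
    // Queue a request to be handed to the worker once it is available.
    void enqueue(const std::string& request) { pending_.push_back(request); }

    // Called when the app relays the client's state ("I can compute" /
    // "I cannot compute"), so the server knows whether to dispatch work.
    void set_available(bool available) { available_ = available; }

    // Polling endpoint: returns true and fills `out` only if the worker is
    // available and a request is waiting.
    bool poll(std::string& out) {
        if (!available_ || pending_.empty()) return false;
        out = pending_.front();
        pending_.pop_front();
        return true;
    }

private:
    std::deque<std::string> pending_;
    bool available_ = false;  // the worker starts out unavailable
};
```

Because the server withholds requests until `set_available(true)`, it never dispatches work to a node whose client has not allowed computing.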
A typical scenario is as follows:
```mermaid
sequenceDiagram
    participant C as BOINC client
    participant A as sporadic job
    participant S as guest system server
    A->>C: DONT_WANT_COMPUTE
    A->>S: I cannot compute
    C->>A: DONT_COMPUTE
    C->>A: COULD_COMPUTE
    A->>S: I can compute
    S->>A: here is a request
    A->>C: WANT_COMPUTE
    C->>A: COMPUTING
    A->>S: request confirmed, computing
    A->>S: I am done computing
    A->>C: DONT_WANT_COMPUTE
    C->>A: COULD_COMPUTE
```
The steps are:
- Initially the client tells the app that it can't compute, perhaps because the user has suspended computation.
- The app relays this to the server; this tells the server not to send any requests. The server can keep track of which worker nodes are available for computing at a given point.
- Eventually the user enables computing; the client relays this as a `COULD_COMPUTE` message to the app, and the app relays it to the server, indicating that it can now accept requests.
- The server sends a request to the app, asking it to do some computing (and possibly some network communication with other workers).
- The app sends `WANT_COMPUTE` to the client.
- The client reserves the needed computing resources and sends `COMPUTING` to the app.
- The app computes. When it's done, it sends `DONT_WANT_COMPUTE` to the client.
- The client (assuming computing is not suspended) sends `COULD_COMPUTE` to the app.
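These steps can be expressed as a small app-side state machine. In this sketch the messages are C++ enums, and outgoing messages are recorded in a vector rather than sent. The server-side strings are copied from the diagram. The class and method names are hypothetical, and abandoning a request partway through is not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Client-to-app and app-to-client messages from the protocol above.
enum class CaMsg { DONT_COMPUTE, COULD_COMPUTE, COMPUTING };
enum class AcMsg { DONT_WANT_COMPUTE, WANT_COMPUTE };

// Hypothetical app-side logic for the typical scenario. Outgoing messages are
// appended to `sent` instead of being transmitted.
class SporadicJob {
public:
    std::vector<std::string> sent;

    // On startup the app tells the client it doesn't want to compute yet.
    void start() { to_client(AcMsg::DONT_WANT_COMPUTE); }

    void on_client(CaMsg m) {
        switch (m) {
        case CaMsg::DONT_COMPUTE:
            computing_ = false;
            report_availability(false);
            break;
        case CaMsg::COULD_COMPUTE:
            report_availability(true);
            break;
        case CaMsg::COMPUTING:
            if (has_request_ && !computing_) {
                computing_ = true;
                to_server("request confirmed, computing");
            }
            break;
        }
    }

    // The guest server dispatched a request: ask the client for resources.
    void on_server_request() {
        has_request_ = true;
        to_client(AcMsg::WANT_COMPUTE);
    }

    // The request's computation finished.
    void on_work_done() {
        has_request_ = false;
        computing_ = false;
        to_server("I am done computing");
        to_client(AcMsg::DONT_WANT_COMPUTE);
    }

private:
    bool has_request_ = false;
    bool computing_ = false;
    bool reported_ = false;   // has availability been reported at least once?
    bool available_ = false;  // last availability reported to the server

    // Tell the server only when availability actually changes.
    void report_availability(bool available) {
        if (reported_ && available == available_) return;
        reported_ = true;
        available_ = available;
        to_server(available ? "I can compute" : "I cannot compute");
    }

    void to_server(const std::string& text) { sent.push_back("to server: " + text); }
    void to_client(AcMsg m) {
        sent.push_back(m == AcMsg::WANT_COMPUTE ? "to client: WANT_COMPUTE"
                                                : "to client: DONT_WANT_COMPUTE");
    }
};
```

Drive it through start, `DONT_COMPUTE`, `COULD_COMPUTE`, a server request, `COMPUTING`, work done, and `COULD_COMPUTE`. It then emits the same seven outgoing messages, in the same order, as the app does in the diagram. The one difference is that this sketch reports "I cannot compute" in response to `DONT_COMPUTE` rather than before it.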
It's also possible that the app must stop computing before the request is finished - for example, because the user suspends computing. In this case:
- The client sends `DONT_COMPUTE` to the app.
- The app notifies the server that it can't finish the request (or it might wait before doing this, in case computing is re-enabled quickly).
Thus, the app must keep checking for messages from the client, even while it's computing.
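The "wait before giving up" option in the interruption case can be sketched as a small policy object. The app's message-checking loop would consult it. The grace period, the class name, and its methods are assumptions for illustration, since BOINC does not define such a timeout. Time is passed in explicitly so the logic can be tested without sleeping.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical policy for the interruption case. After DONT_COMPUTE arrives in
// the middle of a request, the app waits up to a grace period (an assumed,
// app-chosen value, not a BOINC setting) for computing to be re-enabled before
// telling the guest server it can't finish.
class InterruptPolicy {
public:
    explicit InterruptPolicy(Clock::duration grace) : grace_(grace) {}

    // Client sent DONT_COMPUTE; remember when the interruption started.
    void on_dont_compute(Clock::time_point now) {
        if (!suspended_) {
            suspended_ = true;
            since_ = now;
        }
    }

    // Client sent COMPUTING again before the grace period ran out.
    void on_computing() { suspended_ = false; }

    // True once the app should notify the server that it can't finish the request.
    bool should_give_up(Clock::time_point now) const {
        return suspended_ && now - since_ >= grace_;
    }

private:
    Clock::duration grace_;
    Clock::time_point since_{};
    bool suspended_ = false;
};
```

A short grace period frees the guest server to reassign work quickly. A longer one avoids discarding progress when the user only briefly suspends computing.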
The API for communicating with the client is:
```cpp
enum SPORADIC_CA_STATE {
    CA_NONE = 0,
    CA_DONT_COMPUTE = 1,
    CA_COULD_COMPUTE = 2,
    CA_COMPUTING = 3
};

enum SPORADIC_AC_STATE {
    AC_NONE = 0,
    AC_DONT_WANT_COMPUTE = 1,
    AC_WANT_COMPUTE = 2
};

extern void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE);
extern SPORADIC_CA_STATE boinc_sporadic_get_ca_state();
```
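Here is a minimal sketch of a request handler built on these two calls. It assumes the app does its work in short units and checks the client state before each one. To keep the example self-contained, the two functions get stub bodies that replay a scripted sequence of client states. These stubs exist only for illustration and are not how the BOINC runtime implements them. The handler name `run_request` and the chunked-work structure are also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Enums as declared in the API above.
enum SPORADIC_CA_STATE { CA_NONE = 0, CA_DONT_COMPUTE = 1, CA_COULD_COMPUTE = 2, CA_COMPUTING = 3 };
enum SPORADIC_AC_STATE { AC_NONE = 0, AC_DONT_WANT_COMPUTE = 1, AC_WANT_COMPUTE = 2 };

// ---- Illustration-only stubs, NOT the BOINC runtime. ----
// A real app links against the BOINC API library instead. Here the "client"
// replays g_ca_script, repeating the last entry once the script is used up.
static std::vector<SPORADIC_CA_STATE> g_ca_script;
static std::size_t g_ca_pos = 0;
static SPORADIC_AC_STATE g_ac_state = AC_NONE;

void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE s) { g_ac_state = s; }

SPORADIC_CA_STATE boinc_sporadic_get_ca_state() {
    if (g_ca_script.empty()) return CA_NONE;
    SPORADIC_CA_STATE s = g_ca_script[g_ca_pos];
    if (g_ca_pos + 1 < g_ca_script.size()) ++g_ca_pos;
    return s;
}
// ---- end of stubs ----

// Hypothetical handler for one guest-server request split into `units` units
// of work. Returns the number of units completed; fewer than `units` means the
// client revoked permission. If the client never grants resources, it waits
// indefinitely.
int run_request(int units) {
    boinc_sporadic_set_ac_state(AC_WANT_COMPUTE);  // ask the client for resources
    int done = 0;
    while (done < units) {
        SPORADIC_CA_STATE ca = boinc_sporadic_get_ca_state();  // check before every unit
        if (ca == CA_COMPUTING) {
            ++done;  // one short, bounded unit of real work would go here
        } else if (ca == CA_DONT_COMPUTE) {
            break;   // permission revoked: stop and let the caller inform the server
        }
        // Otherwise (CA_COULD_COMPUTE or CA_NONE) resources aren't granted yet;
        // a real app would sleep briefly here rather than spin.
    }
    boinc_sporadic_set_ac_state(AC_DONT_WANT_COMPUTE);  // this sketch always withdraws
    return done;
}
```

The app checks the state between bounded units of work, which lets it react promptly to `CA_DONT_COMPUTE` while it is computing. How long a unit may run before the next check is a choice left to the app.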
In the initial implementation of sporadic apps (client 7.26), sporadic apps have strict priority over regular apps. Thus, if a sporadic app does lots of computing, it can starve regular apps. If multiple sporadic apps compete for a resource (say, a GPU), the prioritization is fixed, so one of them can starve the others.
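A toy model shows why fixed prioritization can starve an app. It is not BOINC's actual scheduling code. In every time slot, a single GPU goes to the first app, in a fixed priority order, that wants it. The function name and the slot-based model are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of strict, fixed prioritization (not BOINC's scheduler code).
// wants[i] says whether app i wants the single GPU in every time slot; apps
// earlier in the vector have higher priority. Returns slots granted per app.
std::vector<int> fixed_priority_slots(const std::vector<bool>& wants, int slots) {
    std::vector<int> granted(wants.size(), 0);
    for (int t = 0; t < slots; ++t) {
        for (std::size_t i = 0; i < wants.size(); ++i) {
            if (wants[i]) {
                ++granted[i];  // highest-priority app that wants the GPU gets it
                break;         // so lower-priority apps are never reached
            }
        }
    }
    return granted;
}
```

With two apps that both always want the GPU, `fixed_priority_slots({true, true}, 10)` gives the first app all 10 slots and the second none. The second app gets slots only when the first stops asking.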
In a later version, sporadic apps will be scheduled using the same scheme that is used for regular apps, in which resource share determines prioritization and starvation is eliminated.