broker: provision dead brokers for flub replacement
Problem: there is no way to replace a node that goes down
in a Flux instance.

Call overlay_flub_provision () when a rank goes offline
so that the flub allocator can assign its rank to a replacement.
Unprovision ranks when they come back online.
garlick committed Mar 1, 2024
1 parent 99a107b commit aff2d5f
Showing 1 changed file with 19 additions and 0 deletions.
src/broker/state_machine.c
@@ -836,6 +836,25 @@ static void broker_online_cb (flux_future_t *f, void *arg)
         return;
     }
 
+    /* A broker that drops out of s->quorum.online is provisioned
+     * for replacement via flub, and unprovisioned if returns.
+     */
+    if (s->quorum.online) {
+        unsigned int id;
+        id = idset_first (s->quorum.online);
+        while (id != IDSET_INVALID_ID) { // online -> offline
+            if (!idset_test (ids, id))
+                (void)overlay_flub_provision (s->ctx->overlay, id, id, true);
+            id = idset_next (s->quorum.online, id);
+        }
+        id = idset_first (ids);
+        while (id != IDSET_INVALID_ID) { // offline -> online
+            if (!idset_test (s->quorum.online, id))
+                (void)overlay_flub_provision (s->ctx->overlay, id, id, false);
+            id = idset_next (ids, id);
+        }
+    }
+
     idset_destroy (s->quorum.online);
     s->quorum.online = ids;
     if (idset_count (s->quorum.online) >= s->quorum.size)
