
server: improve behavior of starting VMs that are waiting for Crucible activation #873

Open · wants to merge 9 commits into gjcolombo/refactor-server-request-queue from gjcolombo/vcr-replace-during-activate

Conversation

@gjcolombo (Contributor) commented Feb 26, 2025:

(N.B. Currently stacked on #880.)

Modify the state driver's VM startup procedure to allow the driver to process Crucible volume configuration changes while block backends are being activated. This fixes a livelock that occurs when starting a VM with a Crucible VCR that points to an unavailable downstairs: the unavailable downstairs prevents Crucible activation from proceeding; Nexus sends a corrected VCR that, if applied, would allow the upstairs to activate; but the state driver never applies the new VCR because it's blocked trying to activate using the broken VCR.

Since the state driver can now dequeue VM state change requests during startup, also teach it to abort startup if a stop request is received while block backends are starting. This is very slightly spicy, because it can cancel a block backend startup future in a way that was not possible before, but (a) the only affected backend is the Crucible backend, and (b) there should be no requests in flight anyway because, if this case is reached, the VM's vCPUs have not started.
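For illustration, here is a minimal sketch of the kind of select-based startup loop this describes, assuming a Tokio runtime; the names (`ExternalRequest`, `apply_vcr_replacement`, the channel-based queue) are hypothetical stand-ins, not Propolis's actual identifiers:

```rust
use std::future::Future;
use std::pin::pin;

use tokio::sync::mpsc;

// Illustrative request type; the real queue has more variants.
enum ExternalRequest {
    ReconfigureCrucibleVolume(String), // stand-in for a real VCR payload
    Stop,
}

// Await block backend activation and the external request queue
// concurrently, so a corrected VCR can be applied while activation is
// still pending. Returns Ok(true) if the VM started, Ok(false) if
// startup was aborted by a stop request.
async fn start_vm(
    activate_backends: impl Future<Output = anyhow::Result<()>>,
    requests: &mut mpsc::Receiver<ExternalRequest>,
) -> anyhow::Result<bool> {
    let mut activation = pin!(activate_backends);
    loop {
        tokio::select! {
            result = &mut activation => {
                result?;          // all backends are up
                return Ok(true);  // startup can finish normally
            }
            Some(req) = requests.recv() => match req {
                ExternalRequest::ReconfigureCrucibleVolume(vcr) => {
                    // Applying the replacement VCR here is what breaks the
                    // livelock: it can unblock the pending activation.
                    apply_vcr_replacement(vcr).await?;
                }
                // Returning drops `activation`, cancelling the in-flight
                // Crucible activation future; safe because no vCPUs have
                // started, so no I/O is outstanding.
                ExternalRequest::Stop => return Ok(false),
            },
        }
    }
}

async fn apply_vcr_replacement(_vcr: String) -> anyhow::Result<()> {
    Ok(()) // stand-in for the real Crucible VCR-replacement call
}
```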

Add PHD coverage of the new behaviors:

  • Modify the PHD VCR replacement smoke test to check that start requests with a bad set of Crucible targets can be unblocked by replacing a bad target with a good target.
  • Add a server state machine test that checks that a VM that is blocked waiting for Crucible activation can be explicitly stopped.

To assist with this, add a feature to PHD Crucible disks that allows a test to specify that a disk's generated VCRs should contain an invalid downstairs IP address. Starting a VM with a disk configured this way will cause activation to block until the disk's VCR is replaced with a corrected VCR.
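As a rough sketch of that affordance (the type and field names here are hypothetical, not PHD's real API):

```rust
use std::net::{IpAddr, Ipv6Addr, SocketAddr};

// Hypothetical PHD-style disk config: when the flag is set, generated
// VCRs contain one unreachable downstairs target.
struct CrucibleDisk {
    downstairs_addrs: Vec<SocketAddr>,
    use_invalid_downstairs_addr: bool,
}

impl CrucibleDisk {
    fn vcr_targets(&self) -> Vec<SocketAddr> {
        let mut targets = self.downstairs_addrs.clone();
        if self.use_invalid_downstairs_addr {
            // Point one target at an address no downstairs listens on;
            // activation will block until a corrected VCR replaces it.
            if let Some(target) = targets.first_mut() {
                *target = SocketAddr::new(IpAddr::V6(Ipv6Addr::LOCALHOST), 0);
            }
        }
        targets
    }
}
```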

Tests: cargo test, PHD w/Alpine and Debian 11 guests.

Fixes #841.

@gjcolombo (Author) commented:

@hawkw Thanks for the review! Everything should be addressed in 0c7a689 (except for the "wait for five seconds to make sure we don't start running" bit; I'll think a bit more about whether there's some other signal we could look at, but this mostly seems like one of those cases where we kind of want to prove a negative...).

@hawkw (Member) left a comment:

New changes look great! I imagine you might want to wait for an approval from @leftwo as well?

Base automatically changed from gjcolombo/phd-vcr-smoke-test to master February 27, 2025 00:18
Modify the state driver's VM startup procedure to allow the driver to
process Crucible volume configuration changes while block backends are
being activated. This fixes a livelock that occurs when starting a VM
with a Crucible VCR that points to an unavailable downstairs: the
unavailable downstairs prevents Crucible activation from proceeding;
Nexus sends a corrected VCR that, if applied, would allow the upstairs
to activate; but the state driver never applies the new VCR because it's
blocked trying to activate using the broken VCR.

Modify the PHD VCR replacement smoke test so that it checks this
behavior. Add an affordance to PHD Crucible disks that allows a test to
specify that the disk's generated VCRs should contain an invalid
downstairs IP. Start a VM with a disk configured this way, then replace
the broken VCR with a corrected VCR and verify that the VM boots
normally.
@gjcolombo gjcolombo force-pushed the gjcolombo/vcr-replace-during-activate branch from 0c7a689 to e794e68 on February 27, 2025 00:21
let downstairs_addrs = self
.downstairs_instances
.iter()
.map(|ds| ds.vcr_address())
Review comment (Contributor):

Nit: I'm surprised Clippy hasn't suggested

Suggested change:
-    .map(|ds| ds.vcr_address())
+    .map(Downstairs::vcr_address)

since it usually prefers point-free style.

@gjcolombo (Author) replied:

Fixed in 65f862f.

Add a PHD test case for this. This also revealed a bug in the request
queue: if a VM is stuck waiting to activate a Crucible volume, and a
stop request arrives, the VCR replacement that might have fixed it will
be rejected, because the queue rejects replacement requests that arrive
after a stop request is queued. Fix this by keeping the same VCR
replacement request disposition when a stop request is received.
@gjcolombo gjcolombo requested review from hawkw and mkeeter March 4, 2025 01:34
@gjcolombo (Author) commented:

@mkeeter: Thanks for the detailed review! I think I've addressed all your feedback in the latest push.

@hawkw: There are enough changes in the new commits that I'd appreciate another review from you when you have a moment. (In particular I've changed things so that Propolis now reports that a VM is "Starting" and not "Creating" before the component start sequence actually begins. IIRC, we discussed this in DMs recently and decided this would be OK to do, since Omicron collapses these two states into a single "Starting" VMM state anyway.)

@gjcolombo (Author) commented:

Ran PHD, but forgot to run cargo test before pushing. I'll fix the request queue unit test to match the new expected behavior and push that tomorrow.

@mkeeter (Contributor) commented Mar 4, 2025:

@gjcolombo the new changes look pretty good! I'm now wondering if self_request is necessary at all, since it only ever stores a single ExternalRequest::Stop – could we just push that to the normal external_requests queue instead?

@gjcolombo (Author) commented:

> I'm now wondering if self_request is necessary at all, since it only ever stores a single ExternalRequest::Stop – could we just push that to the normal external_requests queue instead?

Interesting question. I think this actually illuminates a subtle problem with the queue dispositions we expose with this change:

  1. VM begins to start
  2. External requestor asks to stop the VM; this sets the queue disposition to ignore new stop requests (for idempotency), reject reboot/migration requests (since the VM will stop before these are processed), and allow VCR change requests (new in this PR)
  3. start_vm pops the stop request and sets stopped_while_starting
  4. Block backends finish starting up
  5. start_vm calls publish_steady_state(Running); this sets the queue dispositions to appropriate dispositions for a running VM (reboot/migration out requests are allowed again; VCR change requests are still allowed; stop requests are enqueued, not ignored)
  6. the state driver processes its buffered stop request; this corrects the queue dispositions again (to reject everything but stop requests, which are ignored)

So the answer to your question is "we could do that," at least insofar as the request queue logic will allow a new stop request to be enqueued if we do that after marking the VM as Running in step 5. But that disposition change is itself a problem--the goal is to reject requests that won't be handled because the VM is going to stop first, and setting the "now running" dispositions defeats that goal by accepting them anyway.
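To make that window concrete, here is the step-5 transition added to the illustrative RequestQueue sketch from the commit message above (still hypothetical names): between steps 5 and 6 the queue briefly advertises running-VM dispositions even though a stop is already buffered.

```rust
impl RequestQueue {
    // Step 5: publish_steady_state(Running) re-opens the queue as though
    // the VM will keep running...
    fn on_running(&mut self) {
        self.stop = Disposition::Enqueue;
        self.reboot = Disposition::Enqueue;
        self.vcr_replace = Disposition::Enqueue;
    }
    // ...but a stop request was already buffered in step 3, so a reboot
    // or migration accepted in this window will never be serviced. Step 6
    // restores the stopping dispositions only once the buffered stop is
    // finally processed.
}
```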

I'm going to mull this over a little bit more--the more I look at the request queue's complexity, the less I like it, and the more I want to try to see how much of it is really essential.

@gjcolombo (Author) commented:

> I'm going to mull this over a little bit more--the more I look at the request queue's complexity, the less I like it, and the more I want to try to see how much of it is really essential.

So this blew up. (Shame I don't have a SoundCloud.) I've described this problem in more detail and proposed a fix in #879.

My current plan is to put up a request queue refactoring PR that will fix #879, then amend this PR to take advantage of it. I have a working version of this PR that uses the improved request queue API to ditch the "self request" queue (the improved API lets the state driver pop VCR replacement requests without popping or reordering "state change" requests like stop requests). Assuming the fix for #879 is acceptable, I'll merge it and then push the necessary amendments to this PR.

@gjcolombo gjcolombo changed the base branch from master to gjcolombo/refactor-server-request-queue March 7, 2025 00:44
@gjcolombo gjcolombo changed the title from "server: allow VCR replacement during block backend start" to "server: improve behavior of starting VMs that are waiting for Crucible activation" Mar 7, 2025
@gjcolombo (Author) commented:

I've opened #880 to deal with the request queue improvements I discussed above and have changed this PR's merge base to that branch.

I also added commit 0fa8e37, which changes what happens when the state driver gets a stop request while block backends are still activating: instead of buffering the request, the driver simply returns and drops the block backend startup future. The only block backend that has an actually-asynchronous startup procedure is Crucible, and its only async call is to activate, so I believe this should be safe. I've added a warning label to the Backend trait describing this. (Note that, in any case, the "abort" path can only be reached if the VM's vCPUs haven't started yet, so there should be no outstanding I/O that could be damaged by unceremoniously dropping the future and the backends.)

Doing this eliminates the need for a "self-request" queue in the state driver, so the state driver logic turns out to be a bit simpler.
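For a sense of what that warning label might look like (the trait name and signature here are assumptions, not Propolis's exact API):

```rust
// Hypothetical sketch of a block backend trait carrying a cancel-safety
// warning; the real Propolis trait differs in its details.
#[async_trait::async_trait]
trait BlockBackend {
    /// Starts this backend.
    ///
    /// # Cancel safety
    ///
    /// The VM state driver may drop the future returned by this method
    /// if the VM is asked to stop before its vCPUs start. Implementations
    /// must leave the backend in a consistent state if startup is
    /// cancelled at any await point (for Crucible, the only await is the
    /// activation call itself).
    async fn start(&self) -> anyhow::Result<()>;
}
```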

Successfully merging this pull request may close these issues:

  • want Crucible VCR replacement to be able to proceed before a VM has fully started (#841)