examples: Have UFFD handler kill Firecracker should it die
If the UFFD handler exits abnormally for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).
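
The "get peer creds" step mentioned above can be sketched in isolation as follows. This is only an illustration, not the code from this commit: it declares `getsockopt` by hand instead of using the `libc` crate, and the `UCred` struct, the `peer_credentials` and `socketpair_peer_pid` helpers, and the hard-coded constant values (Linux on x86_64/aarch64) are assumptions made for the example. It needs a toolchain that accepts `unsafe extern` blocks.

```rust
use std::ffi::c_void;
use std::io;
use std::mem::size_of;
use std::os::unix::io::AsRawFd;
use std::os::unix::net::UnixStream;

// Assumption: Linux values on x86_64 and aarch64. Other architectures
// (and other operating systems) use different numbers.
const SOL_SOCKET: i32 = 1;
const SO_PEERCRED: i32 = 17;

/// Hypothetical mirror of the kernel's `struct ucred` (pid, uid, gid).
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
struct UCred {
    pid: i32,
    uid: u32,
    gid: u32,
}

unsafe extern "C" {
    // Hand-written declaration of the C library's getsockopt(2) wrapper.
    fn getsockopt(
        sockfd: i32,
        level: i32,
        optname: i32,
        optval: *mut c_void,
        optlen: *mut u32,
    ) -> i32;
}

/// Asks the kernel for the credentials of the process on the other end
/// of a connected Unix domain socket.
fn peer_credentials(stream: &UnixStream) -> io::Result<UCred> {
    let mut creds = UCred::default();
    let mut len = size_of::<UCred>() as u32;
    // SAFETY: `creds` and `len` are valid for writes, and `len` holds the
    // size of the buffer that `creds` provides.
    let ret = unsafe {
        getsockopt(
            stream.as_raw_fd(),
            SOL_SOCKET,
            SO_PEERCRED,
            &mut creds as *mut UCred as *mut c_void,
            &mut len,
        )
    };
    if ret != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(creds)
}

/// Demo helper: both ends of a socket pair belong to the calling process,
/// so the reported peer PID is our own PID.
fn socketpair_peer_pid() -> io::Result<i32> {
    let (ours, _theirs) = UnixStream::pair()?;
    Ok(peer_credentials(&ours)?.pid)
}
```

In the handler, the stream would be the accepted connection from Firecracker, so the returned PID is the one that later receives SIGKILL.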

While we're at it, add a hint to the generic "process not found" error
message indicating that Firecracker may have died, and that the cause
could be the UFFD handler crashing. For example, in firecracker-microvm#4601
the mystery hang was caused by the UFFD handler crashing, but we were
stumped by it for over half a year. Let's avoid that going forward.

We can't enable this by default because it interferes with unit tests
and with the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.
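
The opt-in hook can be sketched with the standard library alone. The panic hook is process-wide state, so a helper that installs it is something callers have to request explicitly. In this illustration the `on_panic` closure is a hypothetical stand-in for the `libc::kill(pid, libc::SIGKILL)` call, and `install_chained_panic_hook` and `demo_hook_runs_on_panic` are names invented for the example.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Installs a panic hook that runs `on_panic` first and then delegates to
/// whatever hook was installed before. `on_panic` is a hypothetical
/// stand-in for SIGKILL-ing the peer process.
fn install_chained_panic_hook<F>(on_panic: F)
where
    F: Fn() + Send + Sync + 'static,
{
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(move |panic_info| {
        on_panic();
        // Keep the usual panic message by calling the earlier hook.
        previous_hook(panic_info);
    }));
}

/// Demo helper: installs the hook, triggers a panic, and reports whether
/// the custom action ran.
fn demo_hook_runs_on_panic() -> bool {
    let fired = Arc::new(AtomicBool::new(false));
    let fired_in_hook = Arc::clone(&fired);
    install_chained_panic_hook(move || fired_in_hook.store(true, Ordering::SeqCst));

    // Simulate the handler crashing, catching the unwind so the demo
    // itself keeps running.
    let result: Result<(), _> = panic::catch_unwind(|| {
        panic!("simulated UFFD handler crash");
    });

    // Drop the custom hook again so later panics use the default one.
    let _ = panic::take_hook();

    result.is_err() && fired.load(Ordering::SeqCst)
}
```

Calling `demo_hook_runs_on_panic()` from `main` prints the usual panic message to stderr, because the earlier hook still runs after the custom action.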

Signed-off-by: Patrick Roy <[email protected]>
roypat committed Jan 10, 2025
1 parent 2919a3a commit ca9bbcd
Showing 4 changed files with 42 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/firecracker/examples/uffd/fault_all_handler.rs
@@ -24,6 +24,7 @@ fn main() {
    let (stream, _) = listener.accept().expect("Cannot listen on UDS socket");

    let mut runtime = Runtime::new(stream, file);
    runtime.install_panic_hook();
    runtime.run(|uffd_handler: &mut UffdHandler| {
        // Read an event from the userfaultfd.
        let event = uffd_handler
37 changes: 37 additions & 0 deletions src/firecracker/examples/uffd/uffd_utils.rs
@@ -208,6 +208,43 @@ impl Runtime {
        }
    }

    fn peer_process_credentials(&self) -> libc::ucred {
        let mut creds: libc::ucred = libc::ucred {
            pid: 0,
            gid: 0,
            uid: 0,
        };
        let mut creds_size = size_of::<libc::ucred>() as u32;
        let ret = unsafe {
            libc::getsockopt(
                self.stream.as_raw_fd(),
                libc::SOL_SOCKET,
                libc::SO_PEERCRED,
                &mut creds as *mut _ as *mut _,
                &mut creds_size as *mut libc::socklen_t,
            )
        };
        if ret != 0 {
            panic!("Failed to get peer process credentials");
        }
        creds
    }

    pub fn install_panic_hook(&self) {
        let peer_creds = self.peer_process_credentials();

        let default_panic_hook = std::panic::take_hook();
        std::panic::set_hook(Box::new(move |panic_info| {
            let r = unsafe { libc::kill(peer_creds.pid, libc::SIGKILL) };

            if r != 0 {
                eprintln!("Failed to kill Firecracker process from panic hook");
            }

            default_panic_hook(panic_info);
        }));
    }

    /// Polls the `UnixStream` and UFFD fds in a loop.
    /// When stream is polled, new uffd is retrieved.
    /// When uffd is polled, page fault is handled by
1 change: 1 addition & 0 deletions src/firecracker/examples/uffd/valid_handler.rs
@@ -24,6 +24,7 @@ fn main() {
    let (stream, _) = listener.accept().expect("Cannot listen on UDS socket");

    let mut runtime = Runtime::new(stream, file);
    runtime.install_panic_hook();
    runtime.run(|uffd_handler: &mut UffdHandler| {
        // Read an event from the userfaultfd.
        let event = uffd_handler
3 changes: 3 additions & 0 deletions tests/framework/microvm.py
@@ -310,6 +310,9 @@ def kill(self):
            if self.screen_pid:
                os.kill(self.screen_pid, signal.SIGKILL)
        except:
            LOG.error(
                "Failed to kill Firecracker Process. Did it already die (or did the UFFD handler process die and take it down)?"
            )
            LOG.error(self.log_data)
            raise
