-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Hyperlight KVM guest debugging using gdb #111
base: main
Are you sure you want to change the base?
Conversation
be21b85
to
ca61def
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super excited to see this landing. Nice work! My feedback mostly consists of nits and questions.
4a44b2e
to
16492de
Compare
Signed-off-by: Doru Blânzeanu <[email protected]>
…en gdb feature is on Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- this type will be used by the gdb and the hypervisor handler to send requests and receive responses Signed-off-by: Doru Blânzeanu <[email protected]>
- the target implements the traits to provide callbacks for the gdb commands Signed-off-by: Doru Blânzeanu <[email protected]>
- adds a specific handler for the vcpu exit debug that waits for debug messages and processes them Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- also adds handling of gdb client disconnect by resuming execution Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
2a36f06
to
d0db26e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!! Looked briefly though everything and I will give a more in-depth review tomorrow
@@ -136,6 +139,9 @@ impl SandboxConfiguration { | |||
pub const MIN_KERNEL_STACK_SIZE: usize = 0x1000; | |||
/// The default value for kernel stack size | |||
pub const DEFAULT_KERNEL_STACK_SIZE: usize = Self::MIN_KERNEL_STACK_SIZE; | |||
#[cfg(gdb)] | |||
/// The minimum value for debug port | |||
pub const MIN_GUEST_DEBUG_PORT: u16 = 9000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason the port must be >=9000? Otherwise the name could be DEFAULT_GUEST_DEBUG_PORT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to have this constraint to not interfere with any other services that might be running on the machine, but I thing the MIN should be around 1024.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @ludfjig that an arbitrary minimum of 9000 here is not great. You probably don't need to do anything special to prevent ports below 1024, since Linux won't allow binding those without CAP_NET_BIND_SERVICE (by default granted only to root processes) anyway.
regs.regs[15] = read_regs.r15; | ||
regs.rip = read_regs.rip; | ||
regs.eflags = u32::try_from(read_regs.rflags) | ||
.expect("Couldn't convert rflags from u64 to u32"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should return an error instead of panic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've missed this, I'll change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the extra bits in rflags
vs eflags
are currently reserved, but going by the general x86 conventions, I would expect eflags
to always refer to the bottom half of rflags
, so maybe instead of doing a checked conversion here it's semantically cleaner to just truncate?
- improve documentation - improve gdb errors by adding a specific error variant for when the rflags conversion from u64 to u32 fails - fix clippy issue on windows Signed-off-by: Doru Blânzeanu <[email protected]>
- the user experience is not great as the fields and methods related to gdb port are not accessible unless the gdb feature is turned on. - if one runs the build command in the cli and provides the --features gdb the IDE will not know that the feature is enabled and will show errors as it doesn't know about the fields/methods Signed-off-by: Doru Blânzeanu <[email protected]>
- when using visual studio code, sometimes it requests to read from unexpected addresses which can cause an error on the hypervisor side. This fix signals this to the gdb thread which marks it as a non-fatal error Signed-off-by: Doru Blânzeanu <[email protected]>
17a5d14
to
98ac60e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome. I tried it out and it works great. Left some more comments, feel free to dismiss if you disagree with any of them. I realize some of them might be hard to implement as well. GREAT WORK!!
#[instrument(skip_all, parent = Span::current(), level= "Trace")] | ||
/// Sets the configuration for the guest debug | ||
pub fn set_guest_debug_port(&mut self, port: u16) { | ||
self.guest_debug_port = Some(max(port, Self::MIN_GUEST_DEBUG_PORT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called this function but forgot to enable the gdb feature. The code ran fine, but didn't wait for gdb to connect, and I got confused. In my opinion, maybe we should feature-gate this function, or return error somewhere (maybe not in this func) if a user is trying to use a sandbox with this setting set, without using the gdb feature. Maybe others have better ideas. Maybe some tests for this could be good to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the previous intention, however, I disabled it with the last commit.
The reasoning behind was that the rust-analyzer didn't know about the feature being enabled and it showed errors in IDE as it doesn't know the methods.
I am not sure which is the best approach, you might be right with being confusing when it doesn't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to enable the feature for rust-analyzer in .vscode/settings.json
|
||
### End to end example | ||
|
||
Using the [Sandbox configuration](#sandbox-configuration) above to configure the [hello-world](https://github.com/hyperlight-dev/hyperlight/blob/main/src/hyperlight_host/examples/hello-world/main.rs) example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful if we created a brand new example for this instead of using hello-world. I read through this doc but missed the part where it said "Using the sandbox configuration above", so I was confused why it wasn't working. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I can add another example
```bash | ||
# Terminal 2 | ||
$ cat .gdbinit | ||
file file src/tests/rust_guests/bin/debug/simpleguest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file file typo?
@@ -128,6 +130,8 @@ kvm = ["dep:kvm-bindings", "dep:kvm-ioctls"] | |||
mshv2 = ["dep:mshv-bindings2", "dep:mshv-ioctls2"] | |||
mshv3 = ["dep:mshv-bindings3", "dep:mshv-ioctls3"] | |||
inprocess = [] | |||
# This enables compilation of gdb stub for easy debug in the guest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop the first part of this comment, since we probably don't care too much about what compilation it enables, but rather that it allows for easy gdb debugging of a guest.
self.dbg_cfg.arch.debugreg = [0; 8]; | ||
for (k, addr) in addrs.iter().enumerate() { | ||
self.dbg_cfg.arch.debugreg[k] = *addr; | ||
self.dbg_cfg.arch.debugreg[7] |= 1 << (k * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a comment about what these bit operations does/means?
if self.debug.is_some() { | ||
log::debug!("Setting entrypoint bp {:X}", self.entrypoint); | ||
let mut entrypoint_debug = KvmDebug::new(); | ||
entrypoint_debug.hw_breakpoints.push(self.entrypoint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make an method on KvmDebug for adding a breakpoint that does error checking (>4 breapoints), instead of manually modifying the vec? It seems fragile to do the error checking during the set_breakpoints
, and to expose the inner vec like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'll change it 👍
let addr = self.translate_gva(addr)?; | ||
|
||
if let Some(debug) = self.debug.as_mut() { | ||
if debug.hw_breakpoints.contains(&addr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the .contains
is unnecessary given that you use .position
later anyway. You might be able to simplify this whole block using hw_breakpoints.retain(|a| a != addrv)
or something along those lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion
} | ||
} | ||
|
||
if conn.peek().map(|b| b.is_some()).unwrap_or(false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I really understand what this is doing. Maybe a comment here could help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment
/// Enumerates the possible responses that a hypervisor can provide to a debugger | ||
#[derive(Debug)] | ||
pub enum DebugResponse { | ||
AddHwBreakpoint(bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need all of these responses, or would it be sufficient to combine some of them into something like Success()? I looked briefly and I didn't see AddHwBreakpoint
and AddSwBreakpoint
being used for anything for example. ( can see DebugResponse::Step | DebugResponse::Continue | DebugResponse::DisableDebug are being used though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need stems from the return value the add_sw_breakpoint
and the others expect.
An example signature is:
fn add_sw_breakpoint(
&mut self,
addr: <Self::Arch as Arch>::Usize,
_kind: <Self::Arch as Arch>::BreakpointKind,
) -> TargetResult<bool, Self> {
and it expects true/false on the success path signaling whether the breakpoint is active, true being it was just added, false being it was already active and the new call didn't actually do anything.
I could put all of them together to make a generic response that returns Result<bool, Error>
but I am not sure it would make it look a lot better getting rid of only 3 variants. Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the extra semantic information of the extra variants is probably useful
enable pretty-printer | ||
layout src | ||
|
||
$ gdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This markdown is great. One thing this gdb feature introduces is some complexity regarding threading. Would you be able to write down (or maybe create a diagram) for how the threads interact with each other, for example the handler thread and so on? It's a bit confusing to me which thread waits for what thread, etc :D. A simple list about what happens in what order when you create a sandbox with gdb enabled would be sufficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good idea, I thought about adding a diagram but I was afraid it would get desynchronized with the code when there would be a change as the documentation usually doesn't get updated unless it is close to the code being changed.
I'll work on a diagram to help with the understanding of the inner workings of the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few minor issues, but overall this is great. I don't know why the spelling issues are not getting picked up by the spell check in GH actions
|
||
The Hyperlight `gdb` feature enables: | ||
|
||
1. KVM guest debugging: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call out more clearly that this only works on KVM for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to advertise it as only for KVM for now
## How it works | ||
|
||
The gdb feature is designed to work like a Request - Response protocol between | ||
a thread that accepts commands from a gdb cliend and the hypervisor handler over |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a thread that accepts commands from a gdb cliend and the hypervisor handler over | |
a thread that accepts commands from a gdb client and the hypervisor handler over |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird I haven't spotted (I enabled a couple of local tools for spelling) it and neither did the typo checker.
layout src | ||
|
||
$ gdb | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful to have a walkthrough of how to set up debugging using vscode with a few screenshots etc. ?
@@ -89,6 +89,7 @@ fn main() -> Result<()> { | |||
// Essentially the kvm and mshv features are ignored on windows as long as you use #[cfg(kvm)] and not #[cfg(feature = "kvm")]. | |||
// You should never use #[cfg(feature = "kvm")] or #[cfg(feature = "mshv")] in the codebase. | |||
cfg_aliases::cfg_aliases! { | |||
gdb: { all(feature = "gdb", debug_assertions, target_os = "linux") }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since right now we only support debugging with kvm should we be explicit about that here? (and maybe fail with a compilation error if kvm is not set and debug is)
pub rflags: u64, | ||
} | ||
|
||
/// Defines the possible reasons for which a vCPU ce be stopped when debugging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Defines the possible reasons for which a vCPU ce be stopped when debugging | |
/// Defines the possible reasons for which a vCPU can be stopped when debugging |
|
||
/// Software Breakpoint size in memory | ||
pub const SW_BP_SIZE: usize = 1; | ||
/// Software Breakpoinnt opcode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Software Breakpoinnt opcode | |
/// Software Breakpoint opcode |
@@ -44,3 +46,56 @@ pub(crate) fn mem_access_handler_wrapper( | |||
let mem_access_hdl = MemAccessHandler::from(mem_access_func); | |||
Arc::new(Mutex::new(mem_access_hdl)) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why these functions are not gated by the gdb feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how this looks overall! I've left a few small things in the code comments. I also have a couple of nits about the commit organization:
Commit message for 579b324: s/tread/thread/
Commits b175f41, af0a52a, 6cc6184, and 98ac60e: please convert
each one to a series of fixup! commits to the original commits
introducing code, so that we can autosquash before merging, or just
rebase the changes into the original commits now if nobody else
objects to that.
@@ -597,11 +597,15 @@ impl HypervisorHandler { | |||
/// and still have to receive after sorting that out without sending | |||
/// an extra message. | |||
pub(crate) fn try_receive_handler_msg(&self) -> Result<()> { | |||
match self | |||
#[cfg(gdb)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe include a comment here mentioning why this is? (I presume it's because the thread may pause for a long time waiting for gdb commands?)
@@ -136,6 +139,9 @@ impl SandboxConfiguration { | |||
pub const MIN_KERNEL_STACK_SIZE: usize = 0x1000; | |||
/// The default value for kernel stack size | |||
pub const DEFAULT_KERNEL_STACK_SIZE: usize = Self::MIN_KERNEL_STACK_SIZE; | |||
#[cfg(gdb)] | |||
/// The minimum value for debug port | |||
pub const MIN_GUEST_DEBUG_PORT: u16 = 9000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @ludfjig that an arbitrary minimum of 9000 here is not great. You probably don't need to do anything special to prevent ports below 1024, since Linux won't allow binding those without CAP_NET_BIND_SERVICE (by default granted only to root processes) anyway.
@@ -36,6 +36,13 @@ use crate::sandbox_state::sandbox::EvolvableSandbox; | |||
use crate::sandbox_state::transition::Noop; | |||
use crate::{log_build_details, log_then_return, new_error, MultiUseSandbox, Result}; | |||
|
|||
#[cfg(gdb)] | |||
/// Used for passing debug configuration to a sandbox | |||
pub struct DebugInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this struct? If the goal is to make it easy to pass more information that's needed for the debug configuration in the future, should the SandboxConfiguration
directly contain one of these, instead of a raw port?
} | ||
} | ||
|
||
fn on_interrupt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly confused here---it looks to me like this is reporting to the gdbstub state machine that the thread has stopped, but isn't actually stopping it? We could accomplish actually stopping it by sending a signal to the handler thread to kick the vcpu, or report an error if we don't want to support it?
regs.regs[15] = read_regs.r15; | ||
regs.rip = read_regs.rip; | ||
regs.eflags = u32::try_from(read_regs.rflags) | ||
.expect("Couldn't convert rflags from u64 to u32"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the extra bits in rflags
vs eflags
are currently reserved, but going by the general x86 conventions, I would expect eflags
to always refer to the bottom half of rflags
, so maybe instead of doing a checked conversion here it's semantically cleaner to just truncate?
|
||
#[instrument(skip_all, parent = Span::current(), level= "Trace")] | ||
pub(crate) fn dbg_mem_access_handler_wrapper( | ||
mut wrapper: MemMgrWrapper<HostSharedMemory>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of cloning the hshm wraper 3 times to allow it to be closed over by 3 independent closures, why not define a struct that contains a single hshm wrapper and implements the handler trait directly, or perhaps do away with the trait entirely and pass the wrapper directly to the kvm/gdb thread?
/// Enumerates the possible responses that a hypervisor can provide to a debugger | ||
#[derive(Debug)] | ||
pub enum DebugResponse { | ||
AddHwBreakpoint(bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the extra semantic information of the extra variants is probably useful
@@ -83,6 +83,11 @@ impl Target for HyperlightSandboxTarget { | |||
type Arch = GdbTargetArch; | |||
type Error = GdbTargetError; | |||
|
|||
#[inline(always)] | |||
fn guard_rail_implicit_sw_breakpoints(&self) -> bool { | |||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gdbstub docs seem to imply that this should only be true when we don't implement our own support for sw breakpoints, since this lets gdb do it by itself?
addr | ||
}; | ||
|
||
let mut save_data = [0; SW_BP_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make sure that when resuming after a software breakpoint we put the saved instruction back, single step, and then replace it with brk again? Or does the gdb frontend guarantee it will call remove/add_breakpoint in the correct sequence for that, in which case we should probably have a note about that?
The current implementation supports:
The overall architecture is designed to work like a
Request - Response
protocol from the gdb thread to the hypervisor handler over aDebugCommChannel
.gdbstub
to handle the communication with the gdb client