gdbremote: Initial (and minimal) support for remote debugging #444

daniel-thompson · 2024-10-12T09:46:25Z

Both usage and limitations are described in docs/advanced_usage.rst.

Testing has been fairly modest but covers pretty much all the new code paths:

the reported register values were compared between gdb and drgn
x0 (argc) and x1 (argv) were checked and the pointers chased to verify that argv[0] contains the right value

At present I have not implemented any new automatic tests. A mock gdbserver that looks up replies from a Python dictionary should be fairly easy to put together, but I wanted to make what I've got available for code review first.
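
As a rough illustration of the mock gdbserver idea (not the tests that were eventually added to this PR), a dictionary-backed responder might look like the sketch below. The framing (`$payload#checksum` with a two-hex-digit modulo-256 checksum, `+` as the acknowledgement, and an empty reply for unsupported packets) comes from the gdb remote serial protocol. The `REPLIES` table, its canned values, and the function names are hypothetical, and the sketch ignores escaping and run-length encoding.

```python
def checksum(payload: bytes) -> bytes:
    """Two lowercase hex digits: sum of the payload bytes modulo 256."""
    return b"%02x" % (sum(payload) % 256)


def frame(payload: bytes) -> bytes:
    """Wrap a payload as $payload#checksum."""
    return b"$" + payload + b"#" + checksum(payload)


def unframe(packet: bytes) -> bytes:
    """Strip the framing from a packet and verify its checksum."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if packet[-2:].lower() != checksum(payload):
        raise ValueError("bad checksum")
    return payload


# Hypothetical canned replies, keyed by request payload.
REPLIES = {
    b"?": b"S05",             # stop reason: signal 5
    b"m1000,4": b"deadbeef",  # four bytes of memory at 0x1000
}


def mock_reply(packet: bytes, replies=REPLIES) -> bytes:
    """Acknowledge the request, then answer from the dictionary.

    Unknown requests get an empty reply, which is how a stub signals
    that it does not support a packet.
    """
    payload = unframe(packet)
    return b"+" + frame(replies.get(payload, b""))
```

A test could then feed `mock_reply(frame(b"?"))` to the client under test and compare the decoded result against the dictionary entry.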

daniel-thompson · 2024-10-14T21:46:59Z

> At present I have not implemented any new automatic tests. A mock gdbserver that looks up replies from a Python dictionary should be fairly easy to put together, but I wanted to make what I've got available for code review first.

The automatic tests are now implemented and are passing the CI tests (took me a couple of goes to get things running well on i686 but it's all good now).

osandov · 2024-10-14T22:17:39Z

This is super cool, thank you! I'm happy with this functionality as a starting point. I only skimmed the code so far, so I'll have to give it a proper review in the next couple of days.

BTW, I have a new guy on my team taking a stab at implementing the vmcoreinfo query packet (https://github.com/osandov/drgn/wiki/gdbstub-protocol-proposal:-linux.vmcoreinfo-query-packet) that will make it possible to support KASLR, so hopefully you'll be seeing patches for that soon.

osandov

I really appreciate this! I left several comments, mostly questions to help think about how to extend this further.

osandov · 2024-10-16T21:07:02Z

libdrgn/arch_aarch64.c

+	// gdbremote uses the same binary format as struct user_pt_reg
+	// so we can just reuse that code.


Is that true in general or only for some architectures?

I have to admit I'm not entirely sure. kgdb uses lookup tables to convert from gdb format to struct (plain, ordinary, not-user) pt_reg. Hopefully we do that for a reason although I haven't, as yet, gone digging to find out why.
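
For reference when comparing register dumps by hand: a `g` packet reply carries the raw register bytes, hex-encoded in target byte order. The sketch below decodes the leading general-purpose registers assuming 64-bit little-endian values laid out back to back. That layout and the helper name `decode_gprs` are assumptions for illustration only; the sketch does not answer whether other architectures match their `pt_regs` layouts, and it does not handle registers reported as unavailable.

```python
import struct


def decode_gprs(hex_payload: str, count: int) -> list:
    """Decode the first `count` 64-bit little-endian registers.

    Assumes the registers are packed back to back at the start of
    the reply, which is an assumption of this sketch.
    """
    raw = bytes.fromhex(hex_payload)
    if len(raw) < 8 * count:
        raise ValueError("reply too short for %d registers" % count)
    return list(struct.unpack_from(f"<{count}Q", raw))
```

With such a helper, the first two values would correspond to x0 (argc) and x1 (argv) in the manual check described in the PR description.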

osandov · 2024-10-16T21:52:57Z

_drgn.pyi

+        """
+        Set the program to the specified elffile and connect to a gdbserver.
+
+        :param conn: gdb connection string (e.g. localhost:2345)


We'll likely also want to support serial devices and Unix domain sockets natively in the future. This interface should still be sufficient, since we could differentiate by:

If conn starts with / or ., it must be a path, and we can check whether the path is a socket or a character device.

If conn looks like a host:port, assume it is a TCP connection. If a user has a path that looks like a host:port, they can prefix it with ./ to disambiguate it.

Otherwise, assume it is a path.

(Nothing for you to act on here, I'm just thinking ahead.)
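
The disambiguation rules above could be sketched roughly as follows. This is only an illustration of the proposed heuristic, not drgn code: the function names, the returned labels, and the exact `host:port` pattern are assumptions, and bracketed IPv6 addresses are deliberately not handled.

```python
import os
import re
import stat

# Hypothetical pattern: a host with no '/' or ':' followed by a numeric port.
_HOST_PORT = re.compile(r"[^/:\s]+:\d+")


def classify_conn(conn: str) -> str:
    """Return "path" or "tcp" for a connection string."""
    if conn.startswith(("/", ".")):
        return "path"   # explicit path, e.g. ./localhost:2345
    if _HOST_PORT.fullmatch(conn):
        return "tcp"    # looks like host:port
    return "path"       # anything else is treated as a path


def classify_path(path: str) -> str:
    """Distinguish a Unix domain socket from a character device."""
    mode = os.stat(path).st_mode
    if stat.S_ISSOCK(mode):
        return "unix-socket"
    if stat.S_ISCHR(mode):
        return "chardev"
    raise ValueError(f"{path!r} is neither a socket nor a character device")
```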

We'd have to double check but gdb's heuristic looks even simpler than that!

libdrgn/platform.h

libdrgn/program.c

osandov · 2024-10-18T20:11:17Z

libdrgn/program.c

+	prog->main_thread =
+	    drgn_thread_set_search(&prog->thread_set, &thread.tid).entry;


I don't think the thread set is the best fit for this, since the available threads are dynamic based on whatever the remote server tells us, right?

Another thing I'm curious about, does the remote protocol have a notion of a main or current thread? From what I can tell, you're free to start and stop threads at will via protocol commands, right?

(I'm assuming you just did it this way to make it easy for the minimal initial support, so I'm mainly asking to think about how the drgn API may need to be extended to represent remote protocol concepts.)

> I don't think the thread set is the best fit for this, since the available threads are dynamic based on whatever the remote server tells us, right?

I think it would only be dynamic if we put the remote side into non-stop mode. Otherwise, whenever we are talking to the remote, the whole system is stopped. In that case I figured using the thread_set would allow us to cache things (which will probably be desirable on a 115200 baud link). Having said that, if we needed to cache more aggressively (including aggressive laziness) we would probably need a different approach.

> Another thing I'm curious about, does the remote protocol have a notion of a main or current thread? From what I can tell, you're free to start and stop threads at will via protocol commands, right?

Likewise, that can happen in non-stop mode but I'd suggest the initial development focus should be for all-stop mode, if only because that "feels" much like a core dump, only with specialist memory and register options.
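
To put the 115200 baud remark in perspective, here is a back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on the wire per character) and a hex-encoded memory reply (two characters per byte plus four framing characters), and pairs that estimate with a deliberately naive read cache that is only valid while the target stays stopped. `reply_seconds`, `CachedReader`, and `fetch` are hypothetical names, not drgn APIs.

```python
def reply_seconds(nbytes: int, baud: int = 115200, bits_per_char: int = 10) -> float:
    """Estimate the wire time for a hex-encoded reply of nbytes of memory."""
    wire_chars = 2 * nbytes + 4  # hex digits plus '$', '#', two checksum digits
    return wire_chars * bits_per_char / baud


class CachedReader:
    """Memoize reads while the target is stopped (all-stop mode)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable (addr, size) -> bytes; hits the link
        self._cache = {}
        self.fetches = 0      # number of reads that actually hit the link

    def read(self, addr: int, size: int) -> bytes:
        key = (addr, size)
        if key not in self._cache:
            self._cache[key] = self._fetch(addr, size)
            self.fetches += 1
        return self._cache[key]

    def invalidate(self) -> None:
        """Drop everything, e.g. once the target has been resumed."""
        self._cache.clear()
```

Under these assumptions a 4 KiB read costs most of a second of wire time, which is why repeated reads of the same data are worth avoiding on a serial link.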

osandov · 2024-10-18T20:11:51Z

libdrgn/program.c

@@ -1131,6 +1199,9 @@ drgn_thread_iterator_destroy(struct drgn_thread_iterator *it)
 		if (it->prog->flags & DRGN_PROGRAM_IS_LINUX_KERNEL) {
 			drgn_object_deinit(&it->entry.object);
 			linux_helper_task_iterator_deinit(&it->task_iter);
+		} else if (it->prog->flags & DRGN_PROGRAM_IS_GDBREMOTE) {


The various combinations of flags are starting to get out of hand, but I'll leave that as a followup for me to clean up.

libdrgn/gdbremote.c

Introduce the basic infrastructure to communicate with remote debuggers using the gdbremote protocol. See docs/advanced_usage.rst for both usage instructions and a summary of the current limitations.

Testing has been fairly modest but does cover pretty much all the new code paths:

1. the reported register values were compared between gdb and drgn
2. frame pointer based (fallback) stack tracing
3. x0 (argc) and x1 (argv) were checked and the pointers chased to verify that argv[0] contains the right value

Signed-off-by: Daniel Thompson <[email protected]>

daniel-thompson · 2024-10-29T21:30:37Z

Thanks for the review. I think I've fixed all the actionable bits. Let me know if there is anything more!

brenns10 · 2024-10-29T22:34:51Z

Hi @daniel-thompson, I'm excited to see this after our discussion at LPC!

I have yet to dive too far into the code. I actually wanted to take a "testing-first" approach here, so I cherry-picked your changes onto the main branch and tested it out. I tried it on a rather ambitious first test case: the drgn vmtest kernel running within GDB.

I launched it like so:

python3.13 -m vmtest.vm --qemu-options '-gdb tcp::4000'

And then tried to attach to it with:

$ python3.13 -m drgn --gdbremote localhost:4000 -s build/vmtest/x86_64/kernel-6.12.0-rc5-vmtest34.1default/build/vmlinux
drgn 0.0.29+24.g2a8eff01 (using Python 3.13.0, elfutils 0.190, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import FaultError, NULL, Object, alignof, cast, container_of, execscript, implicit_convert, offsetof, reinterpret, sizeof, stack_trace
>>> from drgn.helpers.common import *
>>> prog.read(0, 64)
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    prog.read(0, 64)
    ~~~~~~~~~^^^^^^^
Exception: cannot read past end of gdbremote packet
>>>

I understand that KASLR is not supported, which is totally fine for now, but I think I expected to see the ability to read from arbitrary virtual memory addresses (or physical, which I also tested to similar results).

I also went ahead and exited my drgn, only to find that the QEMU instance was still hung. I assume the VM was still halted?

I'll definitely take a look into the code soon as well, though I'm not nearly the reviewer that @osandov is. But I wanted to share the above so you could try it out or point out the error in my ways :)

daniel-thompson · 2024-10-30T08:46:29Z

> I understand that KASLR is not supported, which is totally fine for now, but I think I expected to see the ability to read from arbitrary virtual memory addresses (or physical, which I also tested to similar results).

I'll take a look. I suspect what you are seeing is a memory fault that hasn't been translated well (since we don't yet decode error packets). Whatever it turns out to be, adding error packet handling was next on my list anyway. I'm deliberately keeping each step as small as the next (useful) step can possibly be, since I have to work on this in fits and starts.

For physical access you should have received an exception since there is no support for physical memory access... but it shouldn't have been cryptic ("Cannot read from physical memory at 0").
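
To illustrate the suspected failure mode: in the gdb remote protocol a failed memory read is answered with an `E NN` error reply (the letter `E` followed by two hex digits) instead of hex-encoded data, so a client that does not decode error replies ends up treating three characters as a truncated data payload. The sketch below shows one plausible way to tell the two apart. It is not the drgn implementation, and `RemoteError` and `decode_memory_reply` are hypothetical names.

```python
class RemoteError(Exception):
    """Raised when the stub answers with an E NN error reply."""

    def __init__(self, code: int):
        super().__init__(f"remote stub reported error {code:#04x}")
        self.code = code


def decode_memory_reply(payload: bytes) -> bytes:
    """Decode the payload of a reply to an `m addr,length` packet.

    Hex-encoded data always has an even length, so a three-character
    payload starting with 'E' cannot be mistaken for memory contents.
    """
    if len(payload) == 3 and payload[:1] == b"E":
        raise RemoteError(int(payload[1:].decode("ascii"), 16))
    return bytes.fromhex(payload.decode("ascii"))
```

With a check like this, reading an unmapped address could surface as a fault carrying the stub's error code rather than as a packet-length complaint.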

> I also went ahead and exited my drgn, only to find that the QEMU instance was still hung. I assume the VM was still halted?

drgn doesn't issue a detach packet on exit, meaning QEMU should remain in the same run state it was in when we connected (drgn also won't stop the target if it is running). Adding detach support would be trivial, but it sits outside the minimum useful initial set I was aiming for (and IMHO it makes more sense once there is support for stopping the target during connect).
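
For context, detaching in the gdb remote protocol is a single `D` packet, which the stub normally answers with `OK`. A minimal sketch of what a detach-on-exit step might send is shown below. `send_detach` and `is_ok_reply` are hypothetical helpers, not part of drgn, and the sketch skips acknowledgement handling and retransmission.

```python
import socket


def frame(payload: bytes) -> bytes:
    """Wrap a payload as $payload#checksum (checksum is the byte sum modulo 256)."""
    return b"$" + payload + b"#" + b"%02x" % (sum(payload) % 256)


def send_detach(sock: socket.socket) -> None:
    """Send the detach packet; the stub decides what state to resume in."""
    sock.sendall(frame(b"D"))


def is_ok_reply(data: bytes) -> bool:
    """Check for an OK reply, tolerating a leading '+' acknowledgement."""
    return data.lstrip(b"+") == frame(b"OK")
```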

brenns10 · 2024-10-30T16:06:09Z

Sorry, you're right, I can read valid virtual memory addresses with no issues. And if I boot with nokaslr then drgn reads variables with no issues! Plus I get that using the kernel as a test is a bit silly given that so far there's no linux-specific support. If I had read the code first I would have seen that.

I do still see the behavior that as soon as drgn attaches to QEMU's GDB, the VM seems to be halted and it doesn't respond to any inputs. Even after I exit drgn. So I think QEMU may be automatically stopping the guest when a connection is accepted.

daniel-thompson force-pushed the gdbremote branch 5 times, most recently from b81f461 to 2928020, on October 14, 2024 21:17

daniel-thompson force-pushed the gdbremote branch from 2928020 to b5a2b83 on October 15, 2024 21:23

osandov reviewed Oct 18, 2024

daniel-thompson force-pushed the gdbremote branch from b5a2b83 to cbb6112 on October 29, 2024 20:59
