GdbServer: Implement new netstream that can be interrupted #4223

Sonicadvance1 · 2024-12-18T21:47:29Z

A major limitation of iostream is that you can't have reads or writes
with a safe interrupt. Instead rewrite the interface with Linux ppoll so
that these can be safely interrupted with a signal and return early.

neobrain · 2024-12-23T09:05:39Z

Not sure timeouts are the way to go here, but I may be swayed by seeing a PR for the corresponding gdb stub changes.

We briefly talked about this but it would still be good to write down the actual scenario that can trigger problems otherwise.

Sonicadvance1 · 2025-01-08T20:48:55Z

> Not sure timeouts are the way to go here, but I may be swayed by seeing a PR for the corresponding gdb stub changes.
>
> We briefly talked about this but it would still be good to write down the actual scenario that can trigger problems otherwise.

Instead of using a timeout, I have removed the timeout capability and left the remaining ppoll implementation. This allows us to safely get interrupted with a signal, returning EINTR, and passing that up as no data.
This will let me instead signal to the gdbserver thread that a thread has some messages to be sent.

neobrain

Added some preliminary comments.

> This allows us to safely get interrupted with a signal, returning EINTR, and passing that up as no data.

Is the occurrence of a signal here a future implementation choice or something that somehow happens during regular GDB use? Do we need any special code to handle this signal or do send/recv simply return an error when a signal fires?

As before, could you sketch out the actual scenario that triggers problems if we don't support interruptible netstreams?

neobrain · 2025-01-09T09:49:32Z

Source/Tools/LinuxEmulation/LinuxSyscalls/NetStream.h

+  struct ReturnGet {
+    char data;
+    bool Hangup;
+  };
+  std::optional<ReturnGet> get();


This would be better modeled using an std::variant with a monostate:

struct ReturnGet : std::variant<...> { bool ShouldHangup() { ... } ... };

You'll have to excuse me for not being as good at C++ as you, I don't know how to do what you want here.

I guess something like I pushed? I've never used std::variant like this.

Yes, looks good.

Now that the code is easier to follow, I'm noticing this however: Is there any difference between the {true} and {false} configurations? {false} indicates that no Hangup was received but no data can be received either, which effectively means we can't progress. Is there any benefit in handling this differently than a hangup?

If there's no difference, the entire return value can actually be condensed down to just a std::optional<char>.

false means there was no data and no hangup, meaning it was interrupted.
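
For readers following this thread, here is a minimal sketch of the three outcomes being discussed (data, hangup, interrupted) modeled as a `std::variant`. The tag types `Interrupted` and `Hangup` and the helper names are assumptions made for illustration; the type actually pushed to the PR may look different.

```cpp
#include <cassert>
#include <variant>

// Hypothetical tag types, named here only for illustration.
struct Interrupted {};  // the wait returned early: no data and no hangup
struct Hangup {};       // the peer closed the connection

using ReturnGetBase = std::variant<Interrupted, Hangup, char>;

// Thin wrapper so call sites read as intent rather than as variant indices.
struct ReturnGet : ReturnGetBase {
  using ReturnGetBase::ReturnGetBase;  // inherit the variant constructors

  bool HasData() const { return std::holds_alternative<char>(*this); }
  bool HasHangup() const { return std::holds_alternative<Hangup>(*this); }
  bool WasInterrupted() const { return std::holds_alternative<Interrupted>(*this); }

  char GetData() const {
    assert(HasData());
    return std::get<char>(*this);
  }
};
```

With three distinct alternatives, the "interrupted" case stays separate from a hangup, which is the distinction a plain `std::optional<char>` would lose.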


Sonicadvance1 · 2025-01-10T16:54:15Z

> Added some preliminary comments.
>
> > This allows us to safely get interrupted with a signal, returning EINTR, and passing that up as no data.
>
> Is the occurrence of a signal here a future implementation choice or something that somehow happens during regular GDB use? Do we need any special code to handle this signal or do send/recv simply return an error when a signal fires?
>
> As before, could you sketch out the actual scenario that triggers problems if we don't support interruptible netstreams?

The signal is a future implementation choice, because we need to send packets from different threads in a sorted fashion and only when gdbserver is in a "running" state. So threads will queue packets in a FIFO, and signal the gdbserver thread.

Netstream doesn't need anything special to handle the signal in this case, because this will cause ppoll to exit early with EINTR, and then we can start consuming the packet queue.

If we can't interrupt the netstream then the gdbserver just...can't handle queued packets because it will forever be waiting to receive with an infinite timeout? And we can't be sending the packets from other threads because of strict ordering guarantees that gdb actually requires, which is one of the reasons why gdbserver in FEX has historically been a buggy mess.
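
The behaviour described above can be sketched roughly as follows: a blocking single-byte read built on `ppoll`, where a signal arriving during the wait surfaces as "no data" rather than as a failure. `GetByte`, `ReadStatus` and `ReadResult` are hypothetical names for this illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Hypothetical result type for illustration; not the PR's actual interface.
enum class ReadStatus { Data, Interrupted, Hangup };

struct ReadResult {
  ReadStatus status;
  char data;
};

// Waits for one byte on fd. A signal that arrives while blocked in ppoll
// is reported as Interrupted ("no data") instead of as an error.
ReadResult GetByte(int fd) {
  assert(fd >= 0);
  pollfd pfd {};
  pfd.fd = fd;
  pfd.events = POLLIN;

  // nullptr timeout: wait indefinitely. nullptr sigmask: keep the current mask.
  if (ppoll(&pfd, 1, nullptr, nullptr) == -1) {
    return {errno == EINTR ? ReadStatus::Interrupted : ReadStatus::Hangup, 0};
  }

  if (pfd.revents & POLLIN) {
    char c = 0;
    ssize_t n = read(fd, &c, 1);
    if (n == 1) {
      return {ReadStatus::Data, c};
    }
    if (n == -1 && errno == EINTR) {
      return {ReadStatus::Interrupted, 0};
    }
  }

  // POLLHUP/POLLERR, end of stream, or any other failure: treat as hangup.
  return {ReadStatus::Hangup, 0};
}
```

The sketch passes a null signal mask for brevity. The fourth argument of `ppoll` is a signal mask that is installed only for the duration of the wait, so a caller could keep the wakeup signal blocked elsewhere and unblock it only inside the call; whether the PR does this is not stated in the thread.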

neobrain · 2025-01-13T13:46:07Z

Source/Tools/LinuxEmulation/LinuxSyscalls/GdbServer.cpp

-      char escaped;
-      stream >> escaped;
-      packet.push_back(escaped ^ 0x20);
+      auto escaped = CommsStream.get();


Isn't there a potential race condition here? An interrupt exactly after } would be caught here but silently be skipped.

Now it will just permanently hang trying to receive more data, won't it?

Unless a hangup occurs, similar to previous behaviour.

Can you fix it? Not sure it makes sense to add the capability for interruptions and then only use it half of the time...

There's no fix in this case. We need to wait until gdb has finished sending the message, otherwise the command stream is borked.

Is this a scenario that can happen in practice or must other FEX code somehow ensure no interruptions in this specific part?

How do you currently plan to deal with it?

gdb is likely to have already sent the message and it's in our socket's buffer. The extreme case will be gdb has crashed which we notice as a hangup and early terminate.

When interrupted, it will continue waiting for its next message from gdb and handle whatever the interruption was about afterwards. There's nothing more to do.

I'm not seeing it, which I think relates back to my unanswered questions at the end of #4223 (comment). I need to see an actual rundown of the problem scenario(s) you're addressing. It's unclear where interruptions happen and why, and which elements of this PR are mandated by GDB and which ones are pending on implementation choices.

Normally I just wouldn't expect async-aware code to be able to enter a permanently hanging state. It looks like a design flaw in the current implementation, but I can't pin it down more specifically without more context.
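
To make the escape-sequence concern in this thread concrete: in the GDB remote protocol, `}` marks the next byte as escaped, and the original byte is recovered by XOR with 0x20 (as in the removed `escaped ^ 0x20` line). One possible way to make an interruption between `}` and the following byte harmless is to keep the "escape pending" flag in the decoder rather than on the stack. This is only an illustrative sketch with a hypothetical `PacketBodyDecoder` class; it is not what the PR implements.

```cpp
#include <cassert>
#include <string>

// Hypothetical resumable decoder for the bytes between '$' and '#'.
// Its state survives between calls, so the caller may stop feeding bytes
// at any point (for example after an interrupted read) and resume later.
class PacketBodyDecoder {
public:
  // Feeds one received byte. Returns true once the terminating '#' is seen.
  bool Feed(char c) {
    assert(!done_);
    if (escape_pending_) {
      // The byte after '}' is transmitted XORed with 0x20.
      body_.push_back(static_cast<char>(c ^ 0x20));
      escape_pending_ = false;
      return false;
    }
    if (c == '}') {
      escape_pending_ = true;
      return false;
    }
    if (c == '#') {
      done_ = true;
      return true;
    }
    body_.push_back(c);
    return false;
  }

  bool EscapePending() const { return escape_pending_; }
  const std::string& Body() const { return body_; }

private:
  std::string body_;
  bool escape_pending_ = false;
  bool done_ = false;
};
```

With this shape, a caller that gets "interrupted" right after feeding `}` can go and service its queue, then continue feeding bytes without losing the escape. Whether servicing the queue in the middle of a packet is acceptable is exactly the open question in the discussion above.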

neobrain · 2025-01-13T13:48:15Z

Source/Tools/LinuxEmulation/LinuxSyscalls/GdbServer.cpp

      break;
    }
    case '#': // end of packet
    {
      char hexString[3] = {0, 0, 0};
-      stream.read(hexString, 2);
+      CommsStream.read(hexString, 2);


Similarly to the above, this doesn't seem to handle hangup events properly.
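
As an illustration of what handling a hangup while reading the two checksum characters could look like, here is a small sketch of a fixed-size read that tells the caller whether it completed, was interrupted, or hit end of stream. `ReadExact` and `ReadExactStatus` are hypothetical names and the sketch uses plain `read` on a file descriptor, not the PR's `CommsStream` interface.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical status type for illustration only.
enum class ReadExactStatus { Complete, Interrupted, Hangup };

// Reads until `size` bytes are in `buffer`. `received` is owned by the
// caller, so after Interrupted the same call can be retried and continues
// where it stopped instead of discarding partial data.
ReadExactStatus ReadExact(int fd, char* buffer, size_t size, size_t& received) {
  assert(received <= size);
  while (received < size) {
    ssize_t n = read(fd, buffer + received, size - received);
    if (n > 0) {
      received += static_cast<size_t>(n);
    } else if (n == -1 && errno == EINTR) {
      return ReadExactStatus::Interrupted;
    } else {
      // n == 0 is end of stream; any other error is treated the same here.
      return ReadExactStatus::Hangup;
    }
  }
  return ReadExactStatus::Complete;
}
```

A caller would check the status before interpreting `hexString`, so a connection that drops after `#` is not parsed as a checksum of zeros.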

neobrain · 2025-01-13T14:01:35Z

Source/Tools/LinuxEmulation/LinuxSyscalls/NetStream.h

+  struct ReturnGet {
+    char data;
+    bool Hangup;
+  };
+  std::optional<ReturnGet> get();


Yes, looks good.

Now that the code is easier to follow, I'm noticing this however: Is there any difference between the {true} and {false} configurations? {false} indicates that no Hangup was received but no data can be received either, which effectively means we can't progress. Is there any benefit in handling this differently than a hangup?

If there's no difference, the entire return value can actually be condensed down to just a std::optional<char>.

neobrain · 2025-01-13T14:15:59Z

> The signal is a future implementation choice, because we need to send packets from different threads in a sorted fashion and only when gdbserver is in a "running" state. So threads will queue packets in a FIFO, and signal the gdbserver thread.

Considering we're already using poll, what's the benefit of using a signal over adding e.g. an eventfd to the fd list?

> Netstream doesn't need anything special to handle the signal in this case, because this will cause ppoll to exit early with EINTR, and then we can start consuming the packet queue.

What happens if the interrupt is sent while outside of poll?

> If we can't interrupt the netstream then the gdbserver just...can't handle queued packets because it will forever be waiting to receive with an infinite timeout? And we can't be sending the packets from other threads because of strict ordering guarantees that gdb actually requires, which is one of the reasons why gdbserver in FEX has historically been a buggy mess.

There seem to be a lot of assumptions about common knowledge here. What kind of packet queuing would leave us waiting indefinitely? What ordering guarantees does gdb require? I know the answer is probably "lots of guarantees", but it would be good to see literally any concrete example of this. What I'm looking for is something like "guest app is running 2 threads; GDB user presses ctrl+c in gdb shell while simultaneously X is happening on the guest; Y must happen now, but it can't because FEX's GDB code is sending data that the GDB client does not expect".
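
For context on the eventfd alternative raised in this comment, the following is a rough sketch of waking a polling thread through an extra file descriptor instead of a signal. `WaitForWork`, `NotifyQueue` and `WakeReason` are hypothetical names; nothing here is taken from the PR or from #4275.

```cpp
#include <cassert>
#include <cstdint>
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Hypothetical wake reasons for illustration.
enum class WakeReason { SocketReadable, QueueSignalled, Hangup, Error };

// Blocks until either the socket has data or another thread has signalled
// the eventfd. The eventfd is checked first so queued work is not starved.
WakeReason WaitForWork(int socket_fd, int wake_fd) {
  assert(socket_fd >= 0 && wake_fd >= 0);
  pollfd fds[2] {};
  fds[0].fd = wake_fd;
  fds[0].events = POLLIN;
  fds[1].fd = socket_fd;
  fds[1].events = POLLIN;

  if (poll(fds, 2, -1) < 0) {
    return WakeReason::Error;  // includes EINTR; a real caller might retry
  }
  if (fds[0].revents & POLLIN) {
    uint64_t count = 0;
    // Reading an eventfd returns its counter and resets it to zero.
    if (read(wake_fd, &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count))) {
      return WakeReason::Error;
    }
    return WakeReason::QueueSignalled;
  }
  if (fds[1].revents & POLLIN) {
    return WakeReason::SocketReadable;
  }
  return WakeReason::Hangup;
}

// Called by a producer thread after it has queued a packet.
bool NotifyQueue(int wake_fd) {
  const uint64_t one = 1;
  return write(wake_fd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));
}
```

Because the eventfd counter persists until it is read, a notification posted while the consumer is outside `poll` is still observed on the next call. With a signal, the same guarantee needs the signal to be kept blocked outside the wait.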

neobrain

This would be easier to review if the actual interruption logic were illustrated in a followup PR already. It's difficult to see if signals are indeed the way forward here without that, but at least most of this code should probably still apply when other wakeup mechanisms are used...

neobrain · 2025-01-13T14:23:40Z

Source/Tools/LinuxEmulation/LinuxSyscalls/GdbServer.cpp

      }
    }

+    if (c.HasHangup()) {


The control flow has become weird here (maybe since the latest changes?).

Why do we skip the InvalidateSocket call when we're receiving a Hangup?

Why is there still an outer while (!CoreShuttingDown.load()) { loop? I don't see how that outer loop would ever run more than one iteration, since we reach this line only when a Hangup is received or CoreShuttingDown==true.

Yep, that was a bug introduced due to the refactoring request to delete an outer loop. Added the loop back because it's required to allow gdb connections to disconnect and reconnect at a later time.

There are three levels of nested loops now, two of them on the same variable...

while (!CoreShuttingDown.load()) {
  // ...
  while (!CoreShuttingDown.load()) {
    // ...
    while ((c = CommsStream.get()).HasData()) {

Sonicadvance1 · 2025-01-13T15:22:52Z

> This would be easier to review if the actual interruption logic were illustrated in a followup PR already. It's difficult to see if signals are indeed the way forward here without that, but at least most of this code should probably still apply when other wakeup mechanisms are used...

#4275 There you go.

neobrain · 2025-01-20T10:19:55Z

> #4275 There you go.

What am I looking at? It's hard to make out the relevant change among 14 commits titled "More gdbserver" and a 1000-line diff. Is it just the last patch that's relevant or is there more?

Also I was hoping for something that explains your design decisions on using signals (as opposed to other interruption mechanisms), not just a dump of your WIP branch.

Sonicadvance1 force-pushed the netstream_timeout branch from cf4f99f to 03762db Compare December 20, 2024 01:40

Sonicadvance1 force-pushed the netstream_timeout branch from 03762db to 4b7b295 Compare January 8, 2025 20:45

Sonicadvance1 changed the title ~~GdbServer: Implement new netstream that can have timeouts~~ GdbServer: Implement new netstream that can be interrupted Jan 8, 2025

neobrain reviewed Jan 9, 2025

GdbServer: Implement new netstream that can be interrupted

de5cdae

A major limitation of iostream is that you can't have reads or writes with a safe interrupt. Instead rewrite the interface with Linux ppoll so that these can be safely interrupted with a signal and return early.

Sonicadvance1 force-pushed the netstream_timeout branch from 4b7b295 to de5cdae Compare January 10, 2025 16:48

Netstream: Use a std::variant

95ee828

neobrain reviewed Jan 13, 2025

review

facec98

Sonicadvance1 force-pushed the netstream_timeout branch from efb44d7 to facec98 Compare January 13, 2025 16:59
