-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GdbServer: Implement new netstream that can be interrupted #4223
base: main
Are you sure you want to change the base?
Conversation
cf4f99f
to
03762db
Compare
Not sure timeouts are the way to go here, but I may be swayed by seeing a PR for the corresponding gdb stub changes. We briefly talked about this but it would still be good to write down the actual scenario that can trigger problems otherwise. |
03762db
to
4b7b295
Compare
Instead of using a timeout, I have removed the timeout capability and left the remaining ppoll implementation. This allows us to safely get interrupted with a signal, returning EINTR, and passing that up as no data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some preliminary comments.
This allows us to safely get interrupted with a signal, returning EINTR, and passing that up as no data.
Is the occurrence of a signal here a future implementation choice or something that somehow happens during regular GDB use? Do we need any special code to handle this signal or do send/recv simply return an error when a signal fires?
As before, could you sketch out the actual scenario that triggers problems if we don't support interruptible netstreams?
struct ReturnGet { | ||
char data; | ||
bool Hangup; | ||
}; | ||
std::optional<ReturnGet> get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better modeled using an std::variant
with a monostate:
struct ReturnGet : std::variant<...> {
bool ShouldHangup() { ... }
...
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to excuse me for not being as good at C++ as you, I don't know how to do what you want here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess something like I pushed? I've never used std::variant like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks good.
Now that the code is easier to follow, I'm noticing this however: Is there any difference between the {true}
and {false}
configurations? {false}
indicates no a Hangup was received but no data can be received either, which effectively means we can't progress. Is there any benefit in handling this differently than a hangup?
If there's no difference, the entire return value can actually be condensed down to just a std::optional<char>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false means there was no data and no hangup, meaning it was interrupted.
A major limitation of iostream is that you can't have reads or writes with a safe interrupt. Instead rewrite the interface with Linux ppoll so that these can be safely interrupted with a signal and return early.
4b7b295
to
de5cdae
Compare
The signal is a future implementation choice, because we need to send packets from different threads in a sorted fashion and only when gdbserver is in a "running" state. So threads will queue packets in fifo, and signal the gdbserver thread. Netstream doesn't need anything special to handle the signal in this case, because this will cause ppoll to early exit with EINTR, and then we can start consuming the packet queue. If we can't interrupt the netstream then the gdbserver just...can't handle queued packets because it will forever be waiting to receive with an infinite timeout? And we can't be sending the packets from other threads because of strict ordering guarantees that gdb actually requires, which is one of the reasons why gdbserver in FEX has historically been a buggy mess. |
char escaped; | ||
stream >> escaped; | ||
packet.push_back(escaped ^ 0x20); | ||
auto escaped = CommsStream.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a potential race condition here? An interrupt exactly after }
would be caught here but silently be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it will just permanently hang trying to receive more data, won't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless a hangup occurs, similar to previous behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix it? Not sure it makes sense to add the capability for interruptions and then only use it half of the time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no fix in this case. We need to wait until gdb has finished sending the message, otherwise the command stream is borked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a scenario that can happen in practice or must other FEX code somehow ensure no interruptions in this specific part?
How do you currently plan to deal with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gdb is likely to have already sent the message and its in our socket's buffer. The extreme case will be gdb has crashed which we notice as a hangup and early terminate.
When interrupted, it will continue waiting for its next message from gdb and handle whatever the interruption was about afterwards. There's nothing more to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing it, which I think relates back to my unanswered questions at the end of #4223 (comment) . I need to see an actual rundown of the problem scenario(s) you're addressing. It's unclear where interruptions happen and why, and which elements of this PR are mandated by GDB and which ones are pending on implementation choices.
Normally I just wouldn't expect async-aware code to be able to enter a permanently hanging state. It looks like a design flaw in the current implementation, but I can't pin it down more specifically without more context.
break; | ||
} | ||
case '#': // end of packet | ||
{ | ||
char hexString[3] = {0, 0, 0}; | ||
stream.read(hexString, 2); | ||
CommsStream.read(hexString, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the above, this doesn't seem to handle hangup events properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
struct ReturnGet { | ||
char data; | ||
bool Hangup; | ||
}; | ||
std::optional<ReturnGet> get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks good.
Now that the code is easier to follow, I'm noticing this however: Is there any difference between the {true}
and {false}
configurations? {false}
indicates no a Hangup was received but no data can be received either, which effectively means we can't progress. Is there any benefit in handling this differently than a hangup?
If there's no difference, the entire return value can actually be condensed down to just a std::optional<char>
.
Considering we're already using
What happens if the interrupt is sent while outside of
There seem to be a lot of assumptions about common knowledge here. What kind of packet queuing would leave us waiting indefinitely? What ordering guarantees does gdb require? I know the answer is probably "lots of guarantees", but it would be good to see literally any concrete example of this. What I'm looking for is something like "guest app is running 2 threads; GDB user presses ctrl+c in gdb shell while simultaneously X is happening on the guest; Y must happen now, but it can't because FEX's GDB code is sending data that the GDB client does not expect". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be easier to review if the actual interruption logic were illustrated in a followup PR already. It's difficult to see if signals are indeed the way forward here without that, but at least probably most of this code should apply when using other wakeup mechanisms are used...
} | ||
} | ||
|
||
if (c.HasHangup()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The control flow has become weird here (maybe since the latest changes?).
Why do we skip the InvalidateSocket
call when we're receiving a Hangup?
Why is there still an outer while (!CoreShuttingDown.load()) {
loop? I don't see how that outer loop would ever run more than one iteration, since we reach this line only when a Hangup
is received or CoreShuttingDown==true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that was a bug introduced due to the refactoring request to delete an outer loop. Added the loop back because it's required to allow gdb connections to disconnect and reconnect at a later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's three levels of nesting loops now, two of them on the same variable...
while (!CoreShuttingDown.load()) {
// ...
while (!CoreShuttingDown.load()) {
// ...
while ((c = CommsStream.get()).HasData()) {
#4275 There you go. |
efb44d7
to
facec98
Compare
What am I looking at? It's hard to make out the relevant change among 14 commits titled "More gdbserver" and a 1000 lines diff. Is it just the last patch that's relevant or is there more? Also I was hoping for something that explains your design decisions on using signals (as opposed to other interruption mechanisms), not just a dump of your WIP branch. |
A major limitation of iostream is that you can't have reads or writes
with a safe interrupt. Instead rewrite the interface with Linux ppoll so
that these can be safely interrupted with a signal and return early.