Address the telnet console death bug. #2347
Conversation
…re stuck on drain then the buffer isn't getting emptied. Five seconds after drain() blocks, an exception will be thrown and the client will be removed from the connection table, so it will no longer be a problem.
Oops, just noticed it's a 10 second timeout and not 5.
One additional note about the fix: I switched to looking up items by key instead of by value, and to iterating over a list of the keys, so that we can safely remove items from the dictionary while traversing it.
@spikefishjohn so merging this pull request will completely fix the console freeze bug when using telnet?
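For context, a tiny illustration of why iterating over a copy of the keys matters; this is a generic Python example, not code taken from the patch:

```python
# Deleting from a dict while iterating over it directly raises
# "RuntimeError: dictionary changed size during iteration".
connections = {"client-a": object(), "client-b": object()}

# Iterating over a snapshot of the keys avoids that, so stalled
# clients can be dropped from the table mid-loop.
for key in list(connections.keys()):
    if key == "client-b":          # pretend this client timed out
        del connections[key]

print(connections)                 # only 'client-a' remains
```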
It does for me. We may want to lower the timeout, as 10 seconds feels a little long. That being said, the patch could use additional testing. Do you have a way to replicate the issue, by the way?

The way I recreate it is to clear the firewall's connection table for the telnet session (I'm accessing GNS3 through a firewall, to be clear). This causes the firewall to silently drop the TCP session. Then open a second telnet session and do something that creates a large amount of output on the console; I use find / on a Linux node, for example. Without the patch the console will output data for about 10 seconds or so and then lock up. You'll be able to open new sessions, get connected and even send data, but you won't see anything happening on the console because the echo of your data never gets sent back to you. With the patch there will be a pause while GNS3 tries to communicate with the dead telnet session; after 10 seconds it gives up, deletes that session, and then session 2 starts working again, as does any new session.

Also note that the next version of GNS3 will have its TCP keepalive timers lowered so that it sends keepalives more often than every 2 hours. This will prevent a firewall from killing a session for being idle too long, but it won't fix the broken console issue.
I can only replicate this by leaving the telnet console session open for a while (no activity for several hours, I think) until I receive a "connection refused" message.
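For reference, this is roughly what lowering TCP keepalive timers looks like at the socket level on Linux. The option names are standard Python socket constants; the intervals are placeholders, not the values GNS3 will actually ship with:

```python
import socket

def enable_fast_keepalive(sock: socket.socket) -> None:
    """Send TCP keepalive probes far more often than the ~2 hour default.

    The intervals below are illustrative only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs: start probing after 60s of idle time,
    # probe every 10s, and give up after 5 unanswered probes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```

Keepalives only detect a silently dropped session after the probes go unanswered, which is why they help with idle timeouts but not with the frozen console itself.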
Hmmm, connection refused. OK, I can't say for sure this will address your issue, only because I don't know what happens if the telnet console is left in a bad state for long enough. I personally have never seen it throw "connection refused", which would mean the listening socket closed. That being said, if the GNS3 node keeps trying to push data out and can't write it, at some point I would expect the service to do something bad (like crash), so it's possible. If you can give me more details I'll see if I can replicate it, or by all means give the update a try and report back. For me to replicate it, can you tell me how you're accessing the telnet console, network topology wise? To give you an idea of the bullet points I'm looking for.
Sorry I got a bit confused. I have two different issues related to telnet console connections.
I have two different setups for GNS3.
The first issue, the telnet console freeze, happens only with the remote server setup, not when running both the server and the client on the same machine.
Ah OK. For sure, this will fix issue number 1; I'll bet my burrito on it. Please give the patch a try to see if I owe you one. Issue 2 sounds like a completely different thing, like a timing issue where it hasn't waited for the device to return to running, or maybe for the console to have started. I would make a new bug report for that.
@grossmj does the 3.0 branch need this patch applied, or should the other pull request using the telnetlib3 rewrite in theory fix this annoying issue?
The patch will be merged into 3.0, and the other pull request using telnetlib3 should make it there as well once it has been sufficiently tested.
EDIT: This is completely wrong. Doh! I need to stop the late night bug research.

I did notice something: this calls except if it fails, which calls close(), which then does this:

```python
async def close(self):
    for writer, connection in self._connections.items():
        try:
            writer.write_eof()
            await writer.drain()
            writer.close()
            # await writer.wait_closed()  # this doesn't work in Python 3.6
        except (AttributeError, ConnectionError):
            continue
```

So my theory is the drain() call needs a wait_for as well. I'm going to see if I can recreate the problem and then take a swing at it again.
Add wait_for for the drain() call with a 5 second timeout. If we're stuck on drain then the buffer isn't getting emptied. Five seconds after drain() blocks, an exception will be thrown and the client will be removed from the connection table, so it will no longer be a problem. There will be a pause in console output during this 5 second wait; after that the console will return to normal.
When the exception is triggered, the following debug message will be logged.
2024-02-03 03:07:15 DEBUG telnet_server.py:309 Timeout while sending data to client: ('REMOTE_IP', REMOTE_PORT), closing and removing from connection table.
It just dawned on me that peerinfo() is a list, so if you want that reformatted it would be easy enough to do.
Fixes Issue 2344
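Putting the pieces together, here is a minimal sketch of the send path described above. The function name, connection-table shape and logger setup are illustrative, not the actual telnet_server.py code:

```python
import asyncio
import logging

log = logging.getLogger(__name__)

async def send_to_clients(connections: dict, data: bytes, timeout: float = 5.0):
    """Write data to every connected telnet client.

    A client whose send buffer never drains (e.g. its TCP session was
    silently dropped by a firewall) is logged, closed and removed so it
    cannot stall the other consoles.
    """
    # Iterate over a snapshot of the keys so stalled clients can be
    # deleted from the table while it is being traversed.
    for writer in list(connections.keys()):
        writer.write(data)
        try:
            await asyncio.wait_for(writer.drain(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            peer = writer.get_extra_info("peername")
            log.debug("Timeout while sending data to client: %s, closing and "
                      "removing from connection table.", peer)
            writer.close()
            del connections[writer]
```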