-
Notifications
You must be signed in to change notification settings - Fork 341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Make remote reads asynchronous #3633
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Not sure whether any of my comments has anything to do with the lab error ...
Updated, but still having intermittent IPC problems that I can't get to the bottom of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, not sure what's wrong. Two things you could try:
- Straighten out the trigger mess by using a dedicated handler for that.
- Drop the "in a loop" commits in case we're overlooking some way a loop makes sense.
@@ -1404,7 +1469,7 @@ add_tls_to_mainloop(lrmd_t *lrmd, bool do_handshake) | |||
native->server, native->port); | |||
|
|||
struct mainloop_fd_callbacks tls_fd_callbacks = { | |||
.dispatch = lrmd_tls_dispatch, | |||
.dispatch = lrmd_tls_dispatch_async, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notify trigger just after this (which reuses lrmd_tls_dispatch()
as its handler) is interesting.
lrmd_server_send_notify()
sends messages with type "notify", and read_remote_reply()
appends those to the pending_notify
list and sets the trigger (if they arrive while trying to read a reply).
Using lrmd_tls_dispatch()
as the trigger handler means it will process the pending_notify
list, and then try to read another message. (And that read will be blocking. With this current PR, lrmd_tls_dispatch_async()
will never see any pending notifies since it's not set as the trigger handler.)
It seems to me that something like process_pending_notifies()
should be the trigger handler. I don't know why we'd want to try to read another message at that point; the dispatch function should be called separately if there's any data on the socket.
The trigger handler should return G_SOURCE_CONTINUE
(TRUE) or G_SOURCE_REMOVE
(FALSE), but since we're currently overloading lrmd_tls_dispatch()
as the handler, we fudge that. The socket handler should return a negative value to drop the connection or anything else to continue. Presumably returning 1 or -1 works the same as G_SOURCE_CONTINUE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to address these issues, but it doesn't seem to fix the problem. I'm not exactly sure on the return values here, but I think I got them right.
I tried this yesterday and it doesn't seem to make a difference. |
Well that's actually reassuring if not helpful :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest changes look good
* dispatch function means keep the mainloop source, and as a | ||
* trigger dispatch function, 0 means remove the trigger from the | ||
* mainloop while 1 means keep it (and job completed) | ||
* \return -1 on error to remove the trigger from the mainloop, or 0 otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/trigger/source/
Instead of calling pcmk__read_remote_message, which blocks, use pcmk__read_available_remote_data instead. This function will attempt to read a message but if the other side of the connection isn't ready or doesn't give us the complete message, we'll go back to the main loop and try again later instead of block. Additionally, clean up the code formatting a little bit and make the error messages more helpful. Related T855
… loop. There can only ever be one message in the buffer, so the loop here will always return NULL the second time. Simplifying this code doesn't actually change anything, but I think it makes it easier to understand what is going on.
This splits the process_pending_notify part out of lrmd_tls_dispatch. The latter function was really doing two things at once, which makes it confusing. Instead split the two event sources into two different handlers.
This is just like lrmd_tls_dispatch except that it uses pcmk__read_available_remote_data to read data. Related T855
Get rid of the received variable and just make this a couple if blocks. It's a little easier to follow this way.
Rebased on main, no actual changes. |
I've been working through the comments in T855 and everything's been pretty easy up to the lrmd_tls_dispatch patch. When running the RemoteMigrate test in cts-lab, I'm seeing this:
And then the errors go on from there. It looks like we notice there's going to be more to the message, but then never get back around to reading the next chunk and then somewhere, an empty message is being read. I would have expected to see "BEFORE REMOTE MSG" in the log if it were coming from lrmd_tls_dispatch_async. I could use a pointer on how I might further debug this.