-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libsubprocess: support flow control on writes via credits #6353
libsubprocess: support flow control on writes via credits #6353
Conversation
In that PR I proposed that an {
"type":"add-credit",
"channels":{
"stdin":42
}
} while I think you are using {
"type":"add-credit",
"channel":{
"stream":"stdin",
"bytes":42
}
} The former is fewer bytes on the wire and leaves the door open to combine messages for multiple streams, should that be useful. What issue did you run into? BTW the main use case I was thinking of for combining messages was at initialization (server sends |
Lemme see if I can refactor it that way. |
Ahhh tried to refactor and realized why it initially didn't work out. The
i.e. only one stream and "bytes" is reported at a time. For remote subprocesses I just report the initial credits that the local subprocess reports. Could have a special case flag where local subprocess doesn't report initial credits and remote subprocess could "collate" them in one RPC instead? |
It's a bit unfortunate if we have to wait for the reactor loop to tell us the initial buffer size before sending it to the remote user since we should already know what it is and that adds latency. Still, is there a reason not to support multiple streams in the protocol even if this server doesn't use it yet? There are other subprocess servers ( |
Perhaps some special casing should be done here. Let me think about it.
Fair enough. |
d29b711
to
eb6e6e5
Compare
added tweaks to |
Ooh nice, does it work for an even smaller buffer like 4K? |
Yup, seems to work! |
eb6e6e5
to
4f7aa82
Compare
re-pushed, fixing up a lot of things based on comments above. Most notably, the remote subprocess sends out a single "initial credits" RPC back to the client for all channels. Could use some tests that have more than just the single stdin channel. That's a TODO. But I think I can split out an obvious commit from here. |
4f7aa82
to
8825537
Compare
Is this ready for another look? I wasn't sure if you were still working on it. |
8825537
to
0bdbf8e
Compare
0bdbf8e
to
b4e4e26
Compare
Good timing! I was cleaning it up a bit and think it's ready for a review. Removed WIP. Edit: Side note, some updates to RFC coming too. |
b4e4e26
to
938f560
Compare
re-pushed, changing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd better get your feedback on my one comment so far before I proceed, as addressing it would change things a bit.
938f560
to
8717bce
Compare
re-pushed, re-working the PR per discussion above, all "space" changes are now via the fbuf notify callback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good!
Here's a batch of comments. I just reviewed code not tests.
Still hate that flag :-)
8717bce
to
c6b593e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - couple comments inline.
The big one is why is the initial on_credit callback still special cased with its own watcher?
9fbd8d7
to
ec06861
Compare
re-pushed per comments / discussion above. Moving the "initial credits" to the fbuf watcher wasn't as ugly as I thought it would be. The main trickiness is that we have to "special case" the first "initial credits" callback at both the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Here are some comments just reviewing the code, not the tests. All my comments are fairly superficial. Great work here.
I'll try the flux-exec changes on top of this on my test cluster and see how it goes moving some serious data and watching the message traces.
f791d0e
to
34c52df
Compare
Problem: A comment for fbuf_write_watcher_create() indicates that a FLUX_POLLOUT event can occur "fd closed". This comment is lacking and can be ambiguous. Update the comment for clarity.
Problem: libsubprocess iostress tests only support "API" or "direct" writes in tests. Currently a simple boolean is used in the tests to determine which write to use. But in the near future other options may be available. Refactor the iostress tests to use an enum rather than boolean to indicate how subprocess writes should happen.
Problem: In libsubprocess iostress tests, stdin input for tests is generated in a prep watcher. However, in future tests the input will be needed in other callbacks. Generate the stdin input earlier during test launch time. It will be available for all future callbacks.
Problem: an implementation of credit-based flow control will require the underlying buffer implementation to provide notifications for any change in free buffer space, not just full/empty transitions. Call the notifier callback on any change to the amount of free buffer space. The only current users of the notifiers are the ev_fbuf_read and ev_fbuf_write classes, which need not change. Although their notifiers may be called more frequently, they simply start watchers if the buffer is non-empty or non-full (respectively), so the extra calls are merely fast no-ops. Update fbuf unit test.
Problem: subprocess channel writers have no way to know how much remote buffer space is available, and writing too much may overflow the remote buffer and cause data loss. Support a new on_credit callback that will inform the user that space is available in the write buffer. In order to support this, the fbuf write watcher callback behavior has been altered. It now also issues a callback when there is a change in the amount of buffer space available. This includes an initial callback indicating all of the buffer space is available. Adjust unit tests accordingly. Fixes flux-framework#6291
Problem: There is no coverage for the new libsubprocess on_credit callback. Add unit tests in libsubprocess/test/stdio.c and libsubprocess/test/iostress.c.
34c52df
to
02689dd
Compare
re-pushed, fixing up from comments above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is working as advertised! Awesome.
Also, since nothing really uses the flow control yet I don't think this could have much effect on critical stuff like stdin handling in the shell or PMI.
I say let's merge!
Thank you for all the iterations!
sounds good, setting MWP |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6353 +/- ##
==========================================
- Coverage 83.52% 83.50% -0.02%
==========================================
Files 525 525
Lines 87825 87900 +75
==========================================
+ Hits 73353 73405 +52
- Misses 14472 14495 +23
|
Per flux-framework/rfc#427 and discussion in #4572.
Some minor tweaks to the
add-credit
RPC. I think the design in the RFC assumed multiple channels would return credits at the same time, but I think that is impractical, so it's just credits for each channel.Haven't tested that this works by adding support in
flux-exec
yet. Just thought I'd post this initial work. May split off the cleanup into another PR.