-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flux-exec: use subprocess credits to avoid overflowing stdin buffers #6370
Conversation
ad1e566
to
aaa7f50
Compare
removed WIP, I think the tweak to longer out, a |
5b7f812
to
ee93c43
Compare
ee93c43
to
b13ee16
Compare
a8256a5
to
5da21f7
Compare
as an aside, I looked into using "negative credits" to get the |
I noticed two problems here:
Also I worry a bit about about iterating over all subprocs to find the minimum each time on_credit is called (think el cap size). On the first problem, we could make stdin flow control optional, and then turn it off if the service is sdexec (for now). That would also allow us to more carefully evaluate the performance vs what we had before. On the performance concern, maybe subprocesses could go in a zlistx ordered by credit count. When on_credit changes the credit value of one proc, call |
FWIW this adds an option to enable/disable flow control and disables flow control by default if the service is sdexec: 6ede342 |
Thinking about this, this probably would be faster, but it appears to be a slightly faster O(n) vs a slightly slower O(n) (i.e. the reorders should average n/2 iterations of the list vs n for min-credits, and when Per my comment above, maybe we want to go down the Edit: as I am skimming the ccan heap data structure, I think this is a good move, i'm gonna give it a whirl Edit2: Oh crap, I cannot remove an element from the ccan heap ... |
Since we only reorder after adding credits (not when subtracting), it
_seems_ like the list should be O(1) for the common case where the same
number of credits is being added across many subprocesses. IOW there will
be a bunch of clients that all end up with the same credit value at the end
of the list, so if we tell `zlistx_reorder()` to search for the new
position at the tail of the list after an add-credit response, it is likely
to be added as the last item, O(1).
I'm thinking that there won't be a lot of chaos in the way the server
responds since it's same server and same consumer process running in
parallel, and each parallel write sends the same number of bytes to all
ranks.
…On Mon, Oct 28, 2024 at 11:26 AM Al Chu ***@***.***> wrote:
On the performance concern, maybe subprocesses could go in a zlistx
ordered by credit count. When on_credit changes the credit value of one
proc, call zlistx_reorder with low_value=false. When it changes for all
after a write, the sort order shouldn't change since all procs' credit
changes by the same amount. Call zlistx_first() to find the minimum value.
Failed/exited procs could be removed from the list (triggering a check for
the new minimum).
Thinking about this, this probably would be faster, but it appears to be a
slightly faster O(n) vs a slightly slower O(n) (i.e. the reorders should
average n/2 iterations of the list vs n for min-credits, and when stdin_cb
is called, we don't iterate the credits list).
Per my comment above, maybe we want to go down the min-heap data
structure path? It seems ccan has a heap data structure. So now we're
guaranteed O(log n) for every min-credits reorder.
—
Reply to this email directly, view it on GitHub
<#6370 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABJPWYHA2FTIEEJWKDH5LLZ5Z6WPAVCNFSM6AAAAABP6NFSE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINBSGMZDEMRQGI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Ahhh, I think you're right. I got the |
Skimming through the code, we don't handle this generically for |
9c0eefa
to
57671eb
Compare
re-pushed with fixes per discussion above, and added a tweaked version of the No tests at the moment. Wondering if we could use a similar hidden option to create a very small buffer and run something simliar to the original |
What was tweaked? I'm not spotting it.
Sounds good! Note there is a hidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple quick initial comments
void subprocess_destroy (void *arg) | ||
void subprocess_destroy (void **arg) | ||
{ | ||
flux_subprocess_t *p = arg; | ||
flux_subprocess_t *p = *arg; | ||
flux_subprocess_destroy (p); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set *arg = NULL
after free.
src/cmd/flux-exec.c
Outdated
if (!(handle = zlistx_add_end (subprocess_credits, credits))) | ||
log_err_exit ("zlistx_add_end"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh I would have thought you would leave the credits in the subprocess aux container like you had before and put the subprocess in the zlistx, not put credits in the zlistx and the list handle in the subprocess aux container.
The only worry I have about caching list handles is that one has to remember the undocumented fact that the handles remain valid on a zlistx_reorder()
but are invalidated by zlistx_sort()
. What if someone who doesn't remember this comes along and changes something and needs to sort the list? The result might be a hard to detect bug.
If you want to keep it this way better add a comment like
Beware: calling
zlistx_sort()
onsubprocess_credits
invalidates cached list handles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh I would have thought you would leave the credits in the subprocess aux container like you had before and put the subprocess in the zlistx, not the put credits in the zlistx and the list handle in the subprocess aux container.
Ya know, originally I stored the integer credits in the zlistx
but disliked the fact I also had to store the "handle" as an aux as well. It's sort of fallout from the fact that most uses of zlistx
have a struct that contains the item + handle in one handy location. But I didn't think of storing the subprocess pointer in the "credits" list. Let me try it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized this is the first go and "like you had it before" was just how I imagined it :-)
Sorry, only minor tweaks (fixed a typo, fix conflicts), nothing major.
Oh sweet! Didn't see that. |
57671eb
to
de3c4af
Compare
re-pushed, tweaked things by putting the subprocess pointer into the "credits" list. Wrapping all of the "get handle" and "get credits" pointers into functions and hiding all those details makes it not as annoying as I thought it would be. Also added a test. Piping a 50 meg file via a 4096 stdin buffer. If flow control wasn't working, would definitely overflow. Hopefully this is not too large of a test for the CI. We'll see. Could scale it down if it turns out to be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
Maybe it would be good to add a variant of the test you added where one remote proc exits immediately but the other procs are checked to ensure they still get all the data?
src/cmd/flux-exec.c
Outdated
if (!(credits = calloc (1, sizeof (int)))) | ||
log_err_exit ("calloc"); | ||
if (!zlistx_add_end (subprocesses, p)) | ||
log_err_exit ("zlistx_add_end"); | ||
if (!(handle = zlistx_add_end (subprocess_credits, p))) | ||
log_err_exit ("zlistx_add_end"); | ||
if (flux_subprocess_aux_set (p, | ||
"credits", | ||
credits, | ||
(flux_free_f) free) < 0) | ||
log_err_exit ("flux_subprocess_aux_set"); | ||
if (flux_subprocess_aux_set (p, | ||
"credits_handle", | ||
handle, | ||
NULL) < 0) | ||
log_err_exit ("flux_subprocess_aux_set"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: put credits and handle in a struct and alloc the struct. Then you have only one aux_set/aux_get to do to access, and fewer memory allocs.
src/cmd/flux-exec.c
Outdated
if (!p) | ||
return 0; | ||
credits = flux_subprocess_aux_get (p, "credits"); | ||
assert (credits); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the assertions that the aux items exist are particularly helpful since we treat failing to set one as a fatal error during initialization.
t/t0005-exec.t
Outdated
cat 50Mfile | flux exec -r 0 --setopt=stdin_BUFSIZE=4096 sed -n "w 50Mcpy" && | ||
diff 50Mfile 50Mcpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use cmp
instead of diff
since the 50Mfile may not be text.
Maybe dd of=50Mcpy
would be clearer than the sed
call?
src/cmd/flux-exec.c
Outdated
int subprocess_credits_compare (const void *item1, const void *item2) | ||
{ | ||
flux_subprocess_t *p1 = (flux_subprocess_t *) item1; | ||
flux_subprocess_t *p2 = (flux_subprocess_t *) item2; | ||
int *credits1 = flux_subprocess_aux_get (p1, "credits"); | ||
int *credits2 = flux_subprocess_aux_get (p2, "credits"); | ||
return (*credits1 - *credits2); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't strictly follow instructions of returning -1, 0, or 1 from the zlistx docs:
// Set a user-defined comparator for zlistx_find and zlistx_sort; the method
// must return -1, 0, or 1 depending on whether item1 is less than, equal to,
// or greater than, item2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh, good catch. I was actually copying from another place in flux-core
. Perhaps we should audit and correct that elsewhere.
de3c4af
to
897f83b
Compare
re-pushed, fixing per comments above and adding some flux-exec to > 1 rank tests, and per comment above, one test the subprocess exits early. It's a good test, caught a corner case! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor comments but I think this looks pretty good!
src/cmd/flux-exec.c
Outdated
else | ||
flux_watcher_stop (stdin_w); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this else clause is needed since this only adds credits, so the current min_credits
is never reduced over what might have come before when the watcher was started?
|
||
if (!(ptr = fbuf_read (fb, -1, &lenp))) | ||
if (!(ptr = fbuf_read (fb, min_credits, &lenp))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a yes but are we certain min_credits
is always > 0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmmm. It should be impossible, discounting unknown or future bugs. Think we should assert check just in case/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe nothing is needed, but I suppose it could just stop the watcher and return if credits is zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds like a decent idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wait! i think it is possible to be 0, although extremely unlikely, when the stream is closed. should add some smarts in there for that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i had to think about it a bit. I think it is possible min_credits could be zero if the remote subprocess has a fatal error (like libsubprocess ENOMEM kinda thing), so we're at 0 credits locally and never get credits back. But since this is only in an error scenario, I don't think it matters. And since the stdin_w
is already stopped, it doesn't matter in another way :-)
Although, this did make me think of a corner case to fix. If I call subprocess_remove_credits()
, it's possible that min_credits
may no longer be zero. So I should re-check and start the stdin_w
if necessary.
Edit: actually i don't need to make a change, the code actually does this already by chance
src/cmd/flux-exec.c
Outdated
if (updated_credits == 0) | ||
flux_watcher_stop (stdin_w); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just not have subprocess_update_credits()
return anything and make a call to subprocess_min_credits()
once after the loop and stop the watcher there if zero, since that's now a cheapish operation.
re-pushed per comments above, only minor tweaks
|
39041e4
to
a21ae85
Compare
LGTM, although this comment is not really applicable anymore and could be deleted /* if one subprocess has no more credits, stop stdin watcher */ |
thanks ... and i see I accidentally added a "fdsafsdf" commit while I was fixing the other issue, oops! I'll re-push and add MWP |
a21ae85
to
54c2a46
Compare
Problem: One test in t0005-exec.t uses spaces instead of a tab for indentation. Fix tabbing for consistency.
Problem: In the near future additional code flux-exec will use the zlistx_t data structure. Some other code in flux-exec already uses the zhashx_t data structure. One lingering data structure still uses the zlist_t data structure. For consistency going forward, convert the lingering use of zlist_t to zlistx_t.
54c2a46
to
aa98560
Compare
it appears the system coverage builder has failed on the 60m timeout 3 times in a row. Unclear to me if this is due to the extra tests, which are "beefier" with all of the 50 meg file copies. Could be slowing things down a lot there or eating up too much disk. I'm going to temporarily cut those down to 5 megs, see if that makes a difference (locally running tests, 5 megs is still enough to trigger buffer overflow on a 4K buffer without flow control). Edit: locally even 1 meg is enough to trigger the overflow. |
Only tests with Also, just noticed the commit message |
Didn't think of that, should try locally in docker.
Sounds good |
hmm, ok atleast got a clue now (I'm not sure why the timeout message is before the last test message, possible output race)
Update: The test immediately after test 11 in t2409-sdexec.t is Update2: reproduces in docker, so that's step 1 :-) |
doh, took me like 3 seconds to realize the bug
will push a fix. Also, I'll keep the 5M file size instead of the 50M file size, since it still reproduces reliably with 5M and makes things run faster. |
Problem: libsubprocess now supports stdin flow control via credits, but that is not used in flux-exec. Support credits and flow control in flux-exec to avoid overflowing the stdin buffer. Fixes flux-framework#4572
Problem: sdexec does not yet support flow control Disable flow control if the service is set to sdexec. Add a --stdin-flow=on|off hidden option to force it either way.
Problem: There is no coverage to ensure that stdin flow control truly works with flux-exec. Add a test to t0005-exec.t that sends a 5 meg file while only using a 4K buffer. A buffer overflow would almost certainly happen if flow control was not working.
81cdb5d
to
4aea6bb
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6370 +/- ##
==========================================
+ Coverage 83.57% 83.60% +0.02%
==========================================
Files 524 524
Lines 87634 87700 +66
==========================================
+ Hits 73244 73320 +76
+ Misses 14390 14380 -10
|
built on top of #6353, splitting it off since it's a separate issue