-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RQ2 minimal functionality and bug fix to reapi match allocate #1101
RQ2 minimal functionality and bug fix to reapi match allocate #1101
Conversation
7de27dd
to
09c7f15
Compare
@vsoch do you mind taking a look at #1098 and this commit . It changes the behavior of the return value for What are your thoughts on this? Should |
Sure! So that function is used in the go bindings here: flux-sched/resource/reapi/bindings/go/src/fluxcli/reapi_cli.go Lines 102 to 109 in aae6506
flux-sched/resource/reapi/bindings/go/src/fluxcli/reapi_cli.go Lines 40 to 44 in aae6506
Can you link me to the failing test? Why would it be correct that match allocate fails (returns non zero?)
I don't know the details of traverser_run - if it fails does that mean the match allocate has failed too? In that case why would we want to hide the error from the Go bindings? Looking closer - it looks like you are adding an rc2 to the function, and returning the sum of rc1 and rc2. Can we make a small table of the possible return values given this addition, and the action (if it's considered a failure or not) for each? I'm guessing if rc is -1 and rc2 is 0 we'd get a value of -1, and maybe that doesn't indicate an actual error has occurred (and we just need to account for that case in the Go bindings)? |
It should fail if you try a match allocate and no resources are available for that allocate. In this case, nothing is actually scheduled or reserved. In any case, whether we determine this from return codes or from some other means, it would be important to know if a match was successful or not. Note that a
traverser_run is a pass through function for
rc = 0 on success or -1 on failure and has to do more infrastructure types of things, memory error, not able to get time of day, etc. Failure indicates that the process is partially completed and sets
I guess it depends on what you consider an error, is it an error for the function to try the match allocate and not do anything if the allocate is not possible? Or is that expected behavior and the results should be checked by some other means? |
I don't think I know enough about flux-sched to address your questions, going to ping @milroy here. I can help with any changes to the Go that are warranted when it's decided how to handle it. |
2b794f5
to
9b80bc3
Compare
@milroy this should be ready for a review! As discussed, I changed the go tests to set orelse_reserve to true and updated the expected output accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM generally, some small comments. And I'm sure you'll still want Dan to review.
std::cerr << optarg << std::endl; | ||
usage (1); | ||
} | ||
graph_format_to_ext (format, token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified a bit? It looks like you take the string, turn it into a format object, and then turn the format object back into the same string. I don't see graph_format_to_ext()
and string_to_graph_format()
being used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was part of the functionality for writing out graphs of different formats from resource query. I had started to fill in the functionality entirely, but decided to add that in later. Must not have removed it all yet. I can take this out and save it for later. Thanks for catching it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I remember now, this does convert it from string to format object and back to string. This was beneficial in original rq because it stored the format object along the way and this only required a single string comparison. Since we don't have the functionality to use the format object, it doesn't really make a lot of sense to do it this way right now, but this does in its current form accomplish something. boost::iequals
is case insensitive, so this does convert the string to lowercase.
I am happy to change it to something that's a bit more clear or keep it as is, though we will want to store that format object at some point
9b80bc3
to
47f7979
Compare
Approving reviews have been dismissed because this pull request
was updated.
After discussing the changes with @milroy, we found that changing the interface for reapi_clit_t::match_allocate has bigger implications and the return behavior should be left as is. A non 0 return code represents an error when running the code rather than a failure to find a match. |
2a6a3ca
to
5147da0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is looking really good and just requires a few minor changes.
|
||
out: | ||
rq->incr_job_counter (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current behavior in resource-query
increments the counter if print_schedule_info
is called:
flux-sched/resource/utilities/command.cpp
Line 236 in 0b70113
print_schedule_info (ctx, out, jobid, |
gettimeofday
fails here, or here- The
match
subcommand isn't recognized - The writer fails to output
- The satisfiability check is performed
I think placing rq->incr_job_counter ();
before out:
will generate the desired behavior or enable it to be generated when new rq2
features are added:
rq->incr_job_counter (); | |
rq->incr_job_counter (); | |
out: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out! I think this change will mean that we will not increment when a match is not found. I can add the increment into the block for that check though
5147da0
to
c374426
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the added check for the return code from rq->traverser_run
will break the future implementation of match_multi
.
2556828
to
b87592b
Compare
Problem: reapi_cli_t::match_allocate does not return or track whether a match was successful or not. It also sets the job info with an unset value for overhead. Check return code from traverser run to determine if a match is successful. If a match is not successful, increase the job counter and return 0. errno is set with the correct error value to be handled by the calling function. Move code blocks that set overhead value ov above blocks where job info is created.
Problem: current info function only provides a subset of fields from job_info_t. Add an overloaded info function that mutates job parameter to job_info_t value corresponding to jobid.
Problem: new resource query and potentially other tools using this api may need access to the traverser pre and post order counts. Accessors for these fields do not exist. Create accessors for these fields in reapi_cli_t and resource_query_t.
Problem: resource query does not utilize reapi. This means that new functionality must be implemented in both places and that reapi is not tested with automated ci tests and code coverage. Create a new resource query that uses reapi.
Problem: rq2 needs tests to verify that it can be a drop in replacement for rq1 Add some simple tests for current functionality. Generate the expected files using rq1
Problem: a recent fix to reapi changed the output of the golang bindings tests. This changed a value that was once static to a value that changes with every run. Use a sed command to remove this value from the output of the test. Edited tests to use orelse_reserve flag.
b87592b
to
e4f265d
Compare
@milroy I added in the most recent changes requested, should be ready now! |
Thanks for the great work! The PR looks good and is ready to merge. Go ahead and set MWP. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #1101 +/- ##
========================================
- Coverage 71.8% 70.1% -1.7%
========================================
Files 88 90 +2
Lines 11562 12261 +699
========================================
+ Hits 8309 8603 +294
- Misses 3253 3658 +405
|
This MR includes changes to the reapi to support baseline rq2 functionality and some basic tests to test out rq2.
rq2 in this MR can handle all of the cli arguments, though the write_to_graph function is not yet implemented so the
-g
and-o
flags do not yet do anything. rq2 only implements 3 commands,help
,quit
, andmatch
. Formatch
only the sub commandsallocate
andallocate_orelse_reserve
are currently supported. This MR introduces a lot of infrastructure around rq2 and should set the stage for future development in this effort.The test expected output was generated with resource-query and should be a good measure of whether rq2 is behaving properly.