Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Experimental: Add JobList constraint parser and support constraint query strings in flux jobs -f #5711

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 29, 2024

This is an experimental PR that adds a special JobListConstraintParser class to support a specialized constraint parser syntax for JobList meant to be used with flux jobs -f, --filter. This is built on top of @chu11's PR #5656.

At this point, this is just a proof-of-concept of how this could work to extend the -f, --filter option of flux jobs to support generalized constraint queries while being backwards compatible.

The JobListConstraintParser class is a two pass parser. The first pass converts tokens without an operator to states and/or results bitmasks, using or to join these together. The first pass can also be used to do other convenience conversions (like host: or hosts: to hostlist:, or as is done in this prototype @dt converted to (not (t_cleanup:<dt or t_run:>dt)) to find jobs that were running at a given time). The second pass converts the now modified query string to a JSON constraint object suitable for the job-list service.

Finally, a constraint parameter is added to the Python job listing interfaces, and flux job list -f, --filter is converted to pass the option argument as a constraint.

e.g.:

ƒ(s=5,d=1,builddir) grondo@pi3:~/git/flux-core.git$ flux jobs -f host:pi0
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ4RtqYfo5 grondo   hostname   CD      1      1   0.259s pi0
   ƒ4Rtgek71 grondo   hostname   CD      1      1   0.274s pi0
ƒ(s=5,d=1,builddir) grondo@pi3:~/git/flux-core.git$ flux jobs -f host:pi[2-3]
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ4Rts2f5R grondo   hostname   CD      1      1   0.303s pi2
   ƒ4Rtm6hx3 grondo   hostname   CD      1      1   0.304s pi2
   ƒ4RtnahEQ grondo   hostname   CD      1      1   0.240s pi3
   ƒ4RtfAkpf grondo   hostname   CD      1      1   0.235s pi3

after running some sleep jobs:

ƒ(s=5,d=1,builddir) grondo@pi3:~/git/flux-core.git$ flux jobs -f 'host:pi0 running'
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ6gveP5Us grondo   sleep       R      1      1   3.658s pi0
   ƒ6gvYT8MV grondo   sleep       R      1      1   3.673s pi0
ƒ(s=5,d=1,builddir) grondo@pi3:~/git/flux-core.git$ flux jobs -f 'host:pi0 completed'
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ4RtqYfo5 grondo   hostname   CD      1      1   0.259s pi0
   ƒ4Rtgek71 grondo   hostname   CD      1      1   0.274s pi0

@grondo
Copy link
Contributor Author

grondo commented Jan 30, 2024

We should decide if adding the constraint query support to -f, --filter is the right approach. It is more powerful, and flux-jobs(1) automatically gets support for the match operators supported by job-list, but on the other hand it is slightly less convenient in the common case, and the command already has option-based constraints like --since, --user, --name and --queue - so maybe the more consistent thing is to add --hosts and/or --ranks as @chu11 had initially proposed. For a more flexible and powerful interface, users could use flux-pgrep(1) which can parse the constraint query as free arguments which is slightly more convenient.

@chu11
Copy link
Member

chu11 commented Jan 30, 2024

We should decide if adding the constraint query support to -f, --filter is the right approach. It is more powerful, and flux-jobs(1) automatically gets support for the match operators supported by job-list, but on the other hand it is slightly less convenient in the common case, and the command already has option-based constraints like --since, --user, --name and --queue - so maybe the more consistent thing is to add --hosts and/or --ranks as @chu11 had initially proposed.

IMO I think the the advanced query via --filter is definitely a go no matter what. It provides the advanced query that the command line options can't provide. Like hypothetically user:42 and states:active OR user:43 and states:inactive, etc.

I think it's more of an open debate if --hosts and/or --ranks should be done. Will those be common / popular enough to warrant am option on the command line? I'm not sure.

likewise, with the advanced filter/query in place, dunno if we could view some current command line options worthy of retirement (will I guess hidden)? I think --name, --user, and --queue are popular enough keepers. --since is more debatable?

@grondo
Copy link
Contributor Author

grondo commented Jan 30, 2024

IMO I think the the advanced query via --filter is definitely a go no matter what. It provides the advanced query that the command line options can't provide. Like hypothetically user:42 and states:active OR user:43 and states:inactive, etc.

Agreed. However, we could provide this feature on another command (e.g. flux pgrep) and keep the flux jobs interface simple. However, we could also do both, that's fine with me.

@grondo
Copy link
Contributor Author

grondo commented Jan 30, 2024

The issue with supporting the query via an option is that quoting can get annoying. Note that in this WIP PR, I've allowed multiple -f, --filter options to flux-jobs(1) supported with multiple use combined with and, e.g.

$ flux jobs -f host:pi4 -f completed
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ5hsApJ31 grondo   hostname   CD      1      1   0.205s pi4
$ flux jobs -f host:pi4 -f active
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ5YsGD86K grondo   sleep       R      1      1   56.27s pi4

@chu11
Copy link
Member

chu11 commented Jan 30, 2024

Agreed. However, we could provide this feature on another command (e.g. flux pgrep) and keep the flux jobs interface simple. However, we could also do both, that's fine with me.

Ahhh I see your point now. Hmmm. I think there is value in flux pgrep (or another tool) without all of those extra filter options possibly adding complexity / corner cases as a result.

One long term thought, what if inactive jobs are stored in a db long term. flux pgrep certainly could be used, although it feels not like the right tool (only going by name and nothing else, like sacct vs flux pgrep ... maybe it would just take time to sink in).

@grondo
Copy link
Contributor Author

grondo commented Jan 30, 2024

what if inactive jobs are stored in a db long term.

Yes, my hope is hat we can create a constraint->sql converter, but I haven't looked into it so that is a real concern. That might be an argument for keeping this kind of generic query syntax out of flux-jobs.

flux pgrep certainly could be used, although it feels not like the right tool (only going by name and nothing else, like sacct vs flux pgrep

Yes, I definitely see your point (though I wouldn't use sacct as a good example since it doesn't really have anything to do with querying jobs either). On the other hand, nothing is more annoying to users than having a proliferation of commands that do almost the same thing. flux-pgrep would at least mirror pgrep in that it would "look up jobs based on name and other attributes" (literally the description of pgrep)

Problem: The parse_datetime() utility function is statically configured
to assume future dates when a term like "Friday" is given. That is,
instead of giving the date for the previous Friday, the function will
return then next Friday instead. This behavior should be configurable.

Add an assumeFuture parameter to the function which defaults to True.
If set to False, then parse_datetime() will assume dates in the past
instead.
Problem: Values are always returned as strings by the ConstraintParser
parser, but there are times when another type may be more
suitable. Additionally, there is currently no way to reduce a set
of values to a single element if this is required or beneficial in
the result.

Add a convert_values mapping to the ConstraintParser class. If an
operator is in this dictionary, then the values of a term are passed
to the provided callable, which should return a new list of values
as a result.
Problem: flux_job_strtostate(3) and flux_job_strtoresult(3) are not
exposed in the Python API.

Add strtostate() and strtoresult() to flux.job.info.
Problem: There is no user friendly way to create constraint objects
that can be used with the job-list service.

Add a JobListConstraintParser class which can be used to create
RFC 31 constraing objects suitable for sending to the job-list service.
Problem: The flux-jobs(1) -f, --filter option is being updated to
take a query string, but there is no "filter" based argparse action
to handle this case.

Add flux.util.FilterActionConcatenate which concatenates multiple
--filter strings together, separated by space.
Problem: There is no way to provide an arbitrary constraint to
flux-jobs(1).

Update the `-f, --filter` option of flux-jobs(1) to take a query string
that will be passed to the JobList constraint parameter instead of
the filter paramter.
@grondo
Copy link
Contributor Author

grondo commented Aug 12, 2024

I've rebased this PR now that #6209 went in.

Copy link

codecov bot commented Sep 16, 2024

Codecov Report

Attention: Patch coverage is 65.51724% with 50 lines in your changes missing coverage. Please review.

Project coverage is 83.26%. Comparing base (d7bd7d4) to head (635a4cc).
Report is 180 commits behind head on master.

Files with missing lines Patch % Lines
src/bindings/python/flux/job/list.py 58.77% 47 Missing ⚠️
src/bindings/python/flux/util.py 72.72% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5711      +/-   ##
==========================================
- Coverage   83.32%   83.26%   -0.07%     
==========================================
  Files         521      521              
  Lines       85203    85338     +135     
==========================================
+ Hits        70994    71054      +60     
- Misses      14209    14284      +75     
Files with missing lines Coverage Δ
src/bindings/python/flux/constraint/parser.py 94.35% <100.00%> (+0.11%) ⬆️
src/bindings/python/flux/job/info.py 93.88% <100.00%> (+0.12%) ⬆️
src/cmd/flux-jobs.py 96.49% <100.00%> (ø)
src/bindings/python/flux/util.py 95.63% <72.72%> (-0.41%) ⬇️
src/bindings/python/flux/job/list.py 77.55% <58.77%> (-19.42%) ⬇️

... and 13 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants