diff --git a/README.md b/README.md
index 31338456..4fee37b5 100644
--- a/README.md
+++ b/README.md
@@ -52,6 +52,7 @@ Table of Contents
 - [40/Fluxion Resource Set Extension](spec_40.rst)
 - [41/Job Information Service](spec_41.rst)
 - [42/Subprocess Server Protocol](spec_42.rst)
+- [43/Job List Service](spec_43.rst)
 
 Build Instructions
 ------------------
diff --git a/index.rst b/index.rst
index d1fa2de5..f52e527a 100644
--- a/index.rst
+++ b/index.rst
@@ -278,6 +278,12 @@ information for guest users.
 The subprocess server protocol is used for execution, monitoring, and
 standard I/O management of remote processes.
 
+:doc:`spec_43`
+~~~~~~~~~~~~~~
+
+The Flux Job List Service provides summary information for jobs in the
+system. It provides read-only access. Several ways to find and filter
+jobs are also supported.
 
 .. Each file must appear in a toctree
 .. toctree::
@@ -323,3 +329,4 @@ standard I/O management of remote processes.
    spec_40
    spec_41
    spec_42
+   spec_43
diff --git a/spec_43.rst b/spec_43.rst
new file mode 100644
index 00000000..264bbb65
--- /dev/null
+++ b/spec_43.rst
@@ -0,0 +1,379 @@
+.. github display
+   GitHub is NOT the preferred viewer for this file. Please visit
+   https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_43.html
+
+43/Job List Service
+###################
+
+The Flux Job List Service provides summary information for jobs in the
+system. It provides read-only access. Several ways to find and filter
+jobs are also supported.
+
+.. list-table::
+   :widths: 25 75
+
+   * - **Name**
+     - github.com/flux-framework/rfc/spec_43.rst
+   * - **Editor**
+     - Albert Chu
+   * - **State**
+     - raw
+
+Language
+********
+
+.. include:: common/language.rst
+
+Related Standards
+*****************
+
+- :doc:`spec_18`
+- :doc:`spec_20`
+- :doc:`spec_21`
+- :doc:`spec_25`
+- :doc:`spec_26`
+- :doc:`spec_27`
+- :doc:`spec_29`
+- :doc:`spec_31`
+- :doc:`spec_41`
+
+Background
+**********
+
+Users are interested in seeing jobs that have been submitted to the
+scheduler. Some reasons include:
+
+- See which jobs are pending, running, or inactive
+- See what jobs are running on specific nodes
+- Get general information about a job, such as a job's exit code
+- See the order in which jobs were submitted
+- See how many jobs are pending in the queue before a specific one
+
+While the job info service described in RFC41 is capable of providing job owners information about their
+own jobs, it has several limitations:
+
+- job information may not be easily parsed or collated from multiple sources into one easily parsable format
+- information from multiple jobs is not collated into a simple-to-parse list
+- information about non-owned jobs is not available
+
+Goals
+*****
+
+- Provide read-only access to non-sensitive information for all jobs.
+
+- Hide the complexity of parsing or collating data from multiple sources for commonly accessed information.
+
+- Provide ways to find and/or filter jobs callers are interested in.
+
+Implementation
+**************
+
+The job list service SHALL provide callers the ability to read job information via identifier keys, which will be called *attributes*. See `Job Attributes` below for details.
+
+The job list service SHALL provide an RFC31 constraint syntax for filtering jobs. See `Constraint Operators` below for details.
+
+Job Attributes
+==============
+
+Job information is defined by the *attribute* keys listed below.
+
+.. list-table::
+   :header-rows: 1
+
+   * - Attribute
+     - Description
+     - Value Encoding
+   * - id
+     - job id
+     - integer
+   * - userid
+     - userid of job submitter
+     - integer
+   * - urgency
+     - job urgency
+     - integer
+   * - priority
+     - job priority
+     - integer
+   * - t_submit
+     - time job was submitted
+     - real
+   * - t_depend
+     - time job entered depend state
+     - real
+   * - t_run
+     - time job entered run state
+     - real
+   * - t_cleanup
+     - time job entered cleanup state
+     - real
+   * - t_inactive
+     - time job entered inactive state
+     - real
+   * - state
+     - current job state
+     - integer
+   * - name
+     - job name
+     - string
+   * - cwd
+     - job current working directory
+     - string
+   * - queue
+     - job queue
+     - string
+   * - project
+     - job project
+     - string
+   * - bank
+     - job bank
+     - string
+   * - ntasks
+     - job task count
+     - integer
+   * - ncores
+     - job core count
+     - integer
+   * - nnodes
+     - job node count
+     - integer
+   * - ranks
+     - ranks a job ran on
+     - integer
+   * - nodelist
+     - nodes a job ran on, may accept RFC29 Hostlist
+     - string
+   * - duration
+     - job duration in seconds
+     - real
+   * - expiration
+     - time job was marked to expire
+     - real
+   * - success
+     - true if job was successful
+     - boolean
+   * - result
+     - integer indicating job success or failure type
+     - integer
+   * - waitstatus
+     - status of job as returned by waitpid(2)
+     - integer
+   * - exception_occurred
+     - true if exception occurred
+     - boolean
+   * - exception_type
+     - if exception occurred, exception type
+     - string
+   * - exception_severity
+     - if exception occurred, exception severity
+     - integer
+   * - exception_note
+     - if exception occurred, exception note
+     - string
+   * - annotations
+     - annotations as described in RFC27
+     - object
+   * - dependencies
+     - current job dependencies
+     - array of string
+
+Job attributes SHALL be returned via an object where the keys are the requested job attributes. The values are the attribute values, each encoded as described in the above list.
+
+The *attribute* *id* SHALL always be returned for each job. Every other attribute is optional.
+
+Not all job attributes are available for every job. Many attributes are dependent on job state, job submission information, system configuration, or other conditions. For example:
+
+- a job that is pending (i.e. not yet running) does not yet have any resources to run on. Therefore, *ranks* or *nodelist* cannot yet be set. Similarly, attributes such as *success* or *result* cannot yet be determined. A timestamp like *t_run* does not yet have a value.
+- a job submitted without dependencies will never have *dependencies* set
+- a job cannot belong to a *queue* on a system without a job queue
+- *exception_type* will only exist if *exception_occurred* is true
+
+If an *attribute* has not been set for a job, it SHALL NOT be included in the returned data object.
+
+
+Constraint Operators
+====================
+
+Using the constraint syntax described by RFC31, jobs can be filtered
+based on the following constraint operators.
+
+.. list-table::
+   :header-rows: 1
+
+   * - Operator
+     - Values
+     - Value Encoding
+     - Description
+   * - userid
+     - one or more userids
+     - integer
+     - match jobs submitted by userids
+   * - name
+     - one or more job names
+     - string
+     - match jobs with job names
+   * - queue
+     - one or more queues
+     - string
+     - match jobs submitted to job queues
+   * - states
+     - one or more job states (bitmask of multiple states also allowed)
+     - string or integer
+     - match jobs in job states
+   * - results
+     - one or more job results (bitmask of multiple results also allowed)
+     - string or integer
+     - match jobs with job results
+   * - t_submit, t_depend, t_run, t_cleanup, t_inactive
+     - one timestamp prefixed with ">", "<", ">=", or "<="
+     - string
+     - match jobs if the respective timestamp is greater than, less than, greater than or equal, or less than or equal to specified value
+   * - not
+     - one constraint object
+     - constraint object
+     - Logical negation of constraint object
+   * - or
+     - one or more constraint objects
+     - constraint object
+     - Logical or of constraint object(s)
+   * - and
+     - one or more constraint objects
+     - constraint object
+     - Logical and of constraint object(s)
+
+The following are several constraint examples.
+
+Filter jobs that belong to userid 42 or 43.
+
+.. code:: json
+
+   { "userid": [ 42, 43 ] }
+
+Filter jobs that were not submitted to job queue "foobar".
+
+.. code:: json
+
+   { "not": [ { "queue": [ "foobar" ] } ] }
+
+Filter jobs that are pending.
+
+.. code:: json
+
+   { "states": [ "depend", "priority", "sched" ] }
+
+Filter jobs that belong to userid 42 and were submitted after January 1, 2000.
+
+.. code:: json
+
+   { "and": [ { "userid": [ 42 ] }, { "t_submit": [ ">946713600.0" ] } ] }
+
+List
+====
+
+The :program:`job-list.list` RPC fetches a list of jobs.
+
+The list of jobs SHALL be returned in the following order:
+
+- pending jobs
+- running jobs
+- inactive jobs
+
+Pending jobs are returned ordered by priority (higher priority first),
+running jobs ordered by start time (most recent first), and inactive
+jobs ordered by completion (most recently finished first).
+
+The RPC payloads are defined as follows:
+
+.. object:: job-list.list request
+
+   The request SHALL consist of a JSON object with the following keys:
+
+   .. object:: max_entries
+
+      (*integer*, REQUIRED) Indicate the maximum number of entries to be
+      returned. Specify 0 for no limit.
+
+   .. object:: attrs
+
+      (*array of string*, REQUIRED) List of attributes to return. The
+      special job attribute *all* SHALL allow a caller to request all job
+      attributes for a job.
+
+   .. object:: since
+
+      (*real*, OPTIONAL) Limit output to jobs that have been active
+      since a given time.
+
+   .. object:: constraint
+
+      (*object*, OPTIONAL) Limit output to jobs that match a constraint
+      object as described in RFC31. See `Constraint Operators` for
+      legal job list constraint operators. If not specified, match all
+      jobs.
+
+.. object:: job-list.list response
+
+   The response SHALL consist of a JSON object with the following keys:
+
+   .. object:: jobs
+
+      (*array of objects*, REQUIRED) A list of the jobs returned from
+      the request. Each object will contain job information as described in
+      `Job Attributes`.
+
+List ID
+=======
+
+The :program:`job-list.list-id` RPC fetches job attributes for a specific job ID.
+
+The RPC payloads are defined as follows:
+
+.. object:: job-list.list-id request
+
+   The request SHALL consist of a JSON object with the following keys:
+
+   .. object:: id
+
+      (*integer*, REQUIRED) The job id.
+
+   .. object:: attrs
+
+      (*array of string*, REQUIRED) List of attributes to return. The
+      special job attribute *all* SHALL allow a caller to request all job
+      attributes for a job.
+
+   .. object:: state
+
+      (*integer*, OPTIONAL) A job state to wait for the job to reach
+      before returning job data. This may be useful so that
+      state-specific job attributes will be available before returning.
+
+.. object:: job-list.list-id response
+
+   The response SHALL consist of a JSON object with the following keys:
+
+   .. object:: job
+
+      (*object*, REQUIRED) The job information from the request. The
+      returned object will contain job information as described in
+      `Job Attributes`.
+
+List Attributes
+===============
+
+The :program:`job-list.list-attrs` RPC returns a list of all job attributes
+that can be returned.
+
+The RPC payloads are defined as follows:
+
+.. object:: job-list.list-attrs request
+
+   No keys are recognized for the request.
+
+.. object:: job-list.list-attrs response
+
+   The response SHALL consist of a JSON object with the following keys:
+
+   .. object:: attrs
+
+      (*array of string*, REQUIRED) List of all job attributes that can be returned.
diff --git a/spell.en.pws b/spell.en.pws
index 28e9f384..8148cdb3 100644
--- a/spell.en.pws
+++ b/spell.en.pws
@@ -483,3 +483,7 @@ sdexec
 socketpair
 subprocess
 perilog
+nodelist
+waitstatus
+userids
+parsable