jobstats is a command-line utility that reports Slurm job statistics. It is useful for getting a summary of current jobs, reports on historical jobs, account information, fairshare usage, and more.
jobstats is a wrapper around a few Slurm commands, such as sacct, squeue, and sreport, that retrieve Slurm accounting data.
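As a rough illustration of the wrapper idea, the sketch below builds a sacct command line in Python and runs it with subprocess. The helper names build_sacct_command and run_sacct and the chosen field list are assumptions made for this example, not jobstats' actual code. The sacct options used (-u, -S, -E, -s, -X, -P, --format) are standard sacct flags.

```python
import subprocess

# Hypothetical field list; the fields jobstats actually requests may differ.
FIELDS = "JobID,Partition,NCPUS,NNodes,Submit,Elapsed,Start,End,State"


def build_sacct_command(user, start=None, end=None, state=None):
    """Return the argv list for a sacct query (illustrative helper)."""
    # -X: one line per job allocation, -P: pipe-delimited output
    cmd = ["sacct", "-u", user, "-X", "-P", "--format", FIELDS]
    if start:
        cmd += ["-S", start]  # start of the time window, e.g. 2019-01-01
    if end:
        cmd += ["-E", end]    # end of the time window
    if state:
        cmd += ["-s", state]  # e.g. FAILED, COMPLETED, TIMEOUT, CANCELLED
    return cmd


def run_sacct(cmd):
    """Run the command and return its stdout; needs a working Slurm install."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the command here so the sketch runs without Slurm.
    cmd = build_sacct_command("shahzeb", start="2019-01-01",
                              end="2019-02-01", state="FAILED")
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes the query easy to inspect or test on a machine without Slurm.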
To get started, please clone this repo:
git clone https://github.com/shahzebsiddiqui/jobstats.git
You will need Python 3 to run jobstats.
You may want to add jobstats to your $PATH:
cd jobstats
export PATH=$PWD:$PATH
Run jobstats with the --help option to see the available options:
$ jobstats --help
usage: jobstats [-h] [-u USER] [-S START] [-E END] [-j]
                [--state {COMPLETED,FAILED,TIMEOUT,CANCELLED}] [-a] [-v]

slurm utility for display user job statistics, reporting, and account detail.

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  Select a user
  -S START, --start START
                        Start Date Format: YYYY-MM-DD
  -E END, --end END     End Date Format: YYYY-MM-DD
  -j, --jobsummary      Display job summary for user
  --state {COMPLETED,FAILED,TIMEOUT,CANCELLED}
                        Filter by Job State
  -a, --account         Display information on account shares that user
                        belongs to
  -v, --version         Print version

Developed by Shahzeb Siddiqui <[email protected]>
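For readers curious how such an interface can be declared, here is a minimal argparse sketch that accepts the same options as the help text above. It is a reconstruction from that help output, not the project's source. The function name build_parser is hypothetical, and treating -v as a plain boolean flag is an assumption.

```python
import argparse

STATES = ["COMPLETED", "FAILED", "TIMEOUT", "CANCELLED"]


def build_parser():
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="jobstats",
        description="slurm utility for display user job statistics, "
                    "reporting, and account detail.",
    )
    parser.add_argument("-u", "--user", help="Select a user")
    parser.add_argument("-S", "--start", help="Start Date Format: YYYY-MM-DD")
    parser.add_argument("-E", "--end", help="End Date Format: YYYY-MM-DD")
    parser.add_argument("-j", "--jobsummary", action="store_true",
                        help="Display job summary for user")
    parser.add_argument("--state", choices=STATES, help="Filter by Job State")
    parser.add_argument("-a", "--account", action="store_true",
                        help="Display information on account shares "
                             "that user belongs to")
    # Assumption: modelled as a boolean flag; the real tool may differ.
    parser.add_argument("-v", "--version", action="store_true",
                        help="Print version")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for the demo.
    args = build_parser().parse_args(["-j", "-S", "2019-03-10"])
    print(args)
```

Using choices for --state lets argparse reject any state outside the four listed values before any Slurm command is run.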
By default, jobstats reports the number of completed, failed, cancelled, and timed-out jobs, along with the user's default Slurm account and the list of Slurm accounts the user belongs to:
$ jobstats
User: shahzeb
Default Account: admin
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 38966329
Number of Pending Jobs: 0
Number of Running Jobs: 0
Total Jobs Completed: 1
Total Jobs Completed Successfully: 0
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 27/03/2019 14:39:27

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-26T00:00:00 - 2019-03-26T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
slurm_cl+   shahzeb        Siddiqui           admin       24        0
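The per-state totals in this summary could be derived from sacct output along the following lines. This is only a sketch under stated assumptions: the input is pipe-delimited text as produced by `sacct -X -n -P --format=JobID,State`, the function name count_states is hypothetical, and the sample rows (including the user ID in the CANCELLED entry) are made up for illustration. How jobstats actually computes its totals is not documented here.

```python
from collections import Counter


def count_states(sacct_output):
    """Count jobs per state from pipe-delimited 'JobID|State' lines."""
    counts = Counter()
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 2 or not fields[1].strip():
            continue  # skip blank or malformed lines
        # sacct may report "CANCELLED by <uid>" or a truncated "CANCELLED+";
        # keep only the leading state keyword.
        state = fields[1].split()[0].rstrip("+")
        counts[state] += 1
    return counts


# Illustrative sample input, not real accounting data.
SAMPLE = """\
3558|FAILED
3560|COMPLETED
3561|COMPLETED
18554|CANCELLED by 1001
"""

if __name__ == "__main__":
    totals = count_states(SAMPLE)
    for state in ("COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"):
        print(f"Total Jobs {state.title()}: {totals[state]}")
```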
jobstats also displays running and pending jobs if you have any active jobs when you run the command:
$ jobstats
User: shahzeb
Default Account: admin
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 38960679
Number of Pending Jobs: 0
Number of Running Jobs: 2
Total Jobs Completed: 3
Total Jobs Completed Successfully: 0
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 27/03/2019 14:48:28

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-26T00:00:00 - 2019-03-26T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
slurm_cl+   shahzeb        Siddiqui          hpceng       24        0

Running Jobs
________________________________________________________________________________
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             19705    medium     wrap  shahzeb  R       0:04     13 c[001-013]
             19704      long     wrap  shahzeb  R       0:11     13 c[001-013]

Running + Pending Jobs
________________________________________________________________________________
JOBID  PARTITION  PRIOR  NAME  USER     STATE    TIME  TIME_LIMIT  NODES  CPUS  GRES    START_TIME           NODELIST(REASON)  QOS
19705  medium     10002  wrap  shahzeb  RUNNING  0:04  8:20:00     13     500   (null)  2019-03-27T14:48:24  c[001-013]        normal
19704  long       10002  wrap  shahzeb  RUNNING  0:11  8:20:00     13     500   (null)  2019-03-27T14:48:17  c[001-013]        normal
jobstats can give you a summary of completed jobs. The default time window is the current day, but this can be changed. To see a job summary, use the -j or --jobsummary option:
$ jobstats -j
User: shahzeb
Default Account: admin
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 39028847
Number of Pending Jobs: 0
Number of Running Jobs: 0
Total Jobs Completed: 3
Total Jobs Completed Successfully: 2
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 27/03/2019 14:51:52

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-26T00:00:00 - 2019-03-26T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
slurm_cl+   shahzeb        Siddiqui           admin       24        0

Today Job Summary
________________________________________________________________________________
       JobID  Partition      NCPUS   NNodes              Submit    Elapsed CPUTimeRAW               Start                 End      State
------------ ---------- ---------- -------- ------------------- ---------- ---------- ------------------- ------------------- ----------
       19196        viz          1        1 2019-03-25T14:30:51 2-00:21:00     174060 2019-03-25T14:30:52             Unknown    RUNNING
       19704       long        500       13 2019-03-27T14:48:17   00:01:11      35500 2019-03-27T14:48:17 2019-03-27T14:49:28  COMPLETED
       19705     medium        500       13 2019-03-27T14:48:24   00:01:11      35500 2019-03-27T14:48:24 2019-03-27T14:49:35  COMPLETED
By default the time window is the current day, but you can control it with the --start and --end options, both of which take a date in YYYY-MM-DD format. If --start is specified without --end, the time window ends at the current time.
$ jobstats -j -S 2019-03-10
User: shahzeb
Default Account: admin
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 39028847
Number of Pending Jobs: 0
Number of Running Jobs: 0
Total Jobs Completed: 3
Total Jobs Completed Successfully: 2
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 27/03/2019 14:55:17

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-26T00:00:00 - 2019-03-26T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
slurm_cl+   shahzeb        Siddiqui           admin       24        0

Today Job Summary
________________________________________________________________________________
       JobID  Partition      NCPUS   NNodes              Submit    Elapsed CPUTimeRAW               Start                 End      State
------------ ---------- ---------- -------- ------------------- ---------- ---------- ------------------- ------------------- ----------
       18554        viz          1        1 2019-03-12T00:37:42   01:27:11       5231 2019-03-12T00:37:42 2019-03-12T02:04:53 CANCELLED+
       18555       long         50        2 2019-03-12T01:02:55   00:00:11        550 2019-03-12T01:02:55 2019-03-12T01:03:06  COMPLETED
       18556       long         50        2 2019-03-12T01:09:10   00:01:11       3550 2019-03-12T01:09:11 2019-03-12T01:10:22  COMPLETED
       18557       long         50        2 2019-03-12T01:09:11   00:01:10       3500 2019-03-12T01:09:14 2019-03-12T01:10:24  COMPLETED
       18558       long         50        2 2019-03-12T01:09:12   00:01:10       3500 2019-03-12T01:09:14 2019-03-12T01:10:24  COMPLETED
       18559       long         50        2 2019-03-12T01:09:12   00:01:10       3500 2019-03-12T01:09:14 2019-03-12T01:10:24  COMPLETED
       18560       long         50        2 2019-03-12T01:09:13   00:01:11       3550 2019-03-12T01:09:14 2019-03-12T01:10:25  COMPLETED
       18561       long         50        2 2019-03-12T01:09:13   00:01:10       3500 2019-03-12T01:09:14 2019-03-12T01:10:24  COMPLETED
       18562       long         50        2 2019-03-12T01:09:13   00:01:10       3500 2019-03-12T01:09:14 2019-03-12T01:10:24  COMPLETED
       18563     medium        500       13 2019-03-12T01:09:17   00:01:10      35000 2019-03-12T01:09:18 2019-03-12T01:10:28  COMPLETED
       18564     medium        500       15 2019-03-12T01:09:18   00:01:10      35000 2019-03-12T01:09:18 2019-03-12T01:10:28  COMPLETED
       18565       long        500       13 2019-03-12T01:09:18   00:01:10      35000 2019-03-12T01:10:29 2019-03-12T01:11:39  COMPLETED
       18566     medium        500       13 2019-03-12T01:09:18   00:01:10      35000 2019-03-12T01:10:29 2019-03-12T01:11:39  COMPLETED
       18567       long        500       13 2019-03-12T01:09:19   00:01:10      35000 2019-03-12T01:11:40 2019-03-12T01:12:50  COMPLETED
       18568     medium        500       13 2019-03-12T01:09:19   00:01:13      36500 2019-03-12T01:11:40 2019-03-12T01:12:53  COMPLETED
       18569       long        500       13 2019-03-12T01:09:20   00:01:11      35500 2019-03-12T01:12:50 2019-03-12T01:14:01  COMPLETED
       18570     medium        500       13 2019-03-12T01:09:20   00:01:11      35500 2019-03-12T01:12:53 2019-03-12T01:14:04  COMPLETED
       18571       long        500       13 2019-03-12T01:09:21   00:01:10      35000 2019-03-12T01:14:01 2019-03-12T01:15:11  COMPLETED
       18572       long        500       13 2019-03-12T02:03:48   00:01:04      32000 2019-03-12T02:03:49 2019-03-12T02:04:53 CANCELLED+
       18573     medium        500       13 2019-03-12T02:03:49   00:01:01      30500 2019-03-12T02:03:52 2019-03-12T02:04:53 CANCELLED+
       18574 express,s+        500        1 2019-03-12T02:03:50   00:00:00          0 2019-03-12T02:04:53 2019-03-12T02:04:53 CANCELLED+
       18575 express,s+        500        1 2019-03-12T02:03:51   00:00:00          0 2019-03-12T02:04:53 2019-03-12T02:04:53 CANCELLED+
       18576 express,s+        500        1 2019-03-12T02:03:51   00:00:00          0 2019-03-12T02:04:53 2019-03-12T02:04:53 CANCELLED+
       18577 express,s+        500        1 2019-03-12T02:03:52   00:00:00          0 2019-03-12T02:04:53 2019-03-12T02:04:53 CANCELLED+
       19196        viz          1        1 2019-03-25T14:30:51 2-00:24:25     174265 2019-03-25T14:30:52             Unknown    RUNNING
       19704       long        500       13 2019-03-27T14:48:17   00:01:11      35500 2019-03-27T14:48:17 2019-03-27T14:49:28  COMPLETED
       19705     medium        500       13 2019-03-27T14:48:24   00:01:11      35500 2019-03-27T14:48:24 2019-03-27T14:49:35  COMPLETED
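The time-window defaults described above (current day when nothing is given, current time as the end when only a start date is given) can be sketched as follows. The function name resolve_window and its now parameter are hypothetical and exist only to make the example testable. How jobstats handles an end date without a start date is not documented, so starting the window at the beginning of the current day in that case is an assumption.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # matches the YYYY-MM-DD format of --start/--end


def resolve_window(start=None, end=None, now=None):
    """Return (start, end) datetimes for a job query window."""
    now = now or datetime.now()
    if start is None:
        # Default window: from the beginning of the current day.
        # (Also used when only an end date is given; that is an assumption.)
        start_dt = now.replace(hour=0, minute=0, second=0, microsecond=0)
    else:
        start_dt = datetime.strptime(start, DATE_FORMAT)
    # With no end date, the window ends at the current time.
    end_dt = datetime.strptime(end, DATE_FORMAT) if end else now
    if end_dt < start_dt:
        raise ValueError("end of window is earlier than its start")
    return start_dt, end_dt


if __name__ == "__main__":
    fixed_now = datetime(2019, 3, 27, 14, 55, 17)
    print(resolve_window(now=fixed_now))
    print(resolve_window("2019-03-10", now=fixed_now))
    print(resolve_window("2019-01-01", "2019-01-10", now=fixed_now))
```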
Shown below is a job summary for the time window 2019-01-01 to 2019-01-10:
$ jobstats -j -S 2019-01-01 -E 2019-01-10
User: shahzeb
Default Account: admin
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 39023187
Number of Pending Jobs: 0
Number of Running Jobs: 0
Total Jobs Completed: 3
Total Jobs Completed Successfully: 2
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 27/03/2019 15:01:25

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-26T00:00:00 - 2019-03-26T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
slurm_cl+   shahzeb        Siddiqui           admin       24        0

Today Job Summary
________________________________________________________________________________
       JobID  Partition      NCPUS   NNodes              Submit    Elapsed CPUTimeRAW               Start                 End      State
------------ ---------- ---------- -------- ------------------- ---------- ---------- ------------------- ------------------- ----------
        3558      short          8        8 2019-01-04T16:30:15   00:00:01          8 2019-01-04T16:30:16 2019-01-04T16:30:17     FAILED
        3560    express         20        1 2019-01-05T17:58:40   00:03:21       4020 2019-01-05T17:58:41 2019-01-05T18:02:02  COMPLETED
        3561     medium          3        3 2019-01-05T18:00:28   00:16:41       3003 2019-01-05T18:00:28 2019-01-05T18:17:09  COMPLETED
jobstats can filter historical jobs by any of the following job states:
- FAILED
- COMPLETED
- TIMEOUT
- CANCELLED
This is most useful in combination with the start and end options, as shown in the query below:
$ jobstats --state FAILED -S 2019-01-01 -E 2019-02-01
User: shahzeb
Default Account: admin
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 39017527
Number of Pending Jobs: 0
Number of Running Jobs: 0
Total Jobs Completed: 3
Total Jobs Completed Successfully: 2
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 27/03/2019 15:11:39

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-26T00:00:00 - 2019-03-26T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
slurm_cl+   shahzeb        Siddiqui           admin       24        0

Start Date: 2019-01-01
End Date: 2019-02-01
Job Summary by State: FAILED
________________________________________________________________________________
       JobID      User    JobName  Partition    Account  AllocCPUS ExitCode              Submit    Elapsed               Start                 End      State
------------ --------- ---------- ---------- ---------- ---------- -------- ------------------- ---------- ------------------- ------------------- ----------
        3558   shahzeb   io500.sh      short      admin          8      1:0 2019-01-04T16:30:15   00:00:01 2019-01-04T16:30:16 2019-01-04T16:30:17     FAILED
        4777   shahzeb helloWorl+    express      admin         16    127:0 2019-01-14T14:38:36   00:00:07 2019-01-14T14:38:37 2019-01-14T14:38:44     FAILED
        4778   shahzeb helloWorl+    express      admin         16    127:0 2019-01-14T14:40:05   00:00:01 2019-01-14T14:40:06 2019-01-14T14:40:07     FAILED
        6487   shahzeb interacti+    express      admin          1    127:0 2019-01-22T19:12:44   00:00:10 2019-01-22T19:12:44 2019-01-22T19:12:54     FAILED
        6490   shahzeb interacti+    express      admin          1      2:0 2019-01-22T19:27:55   00:00:09 2019-01-22T19:27:55 2019-01-22T19:28:04     FAILED
        6518   shahzeb   hostname        viz      admin          0      1:0 2019-01-23T14:06:01   00:00:00 2019-01-23T14:06:01 2019-01-23T14:06:01     FAILED
        6519   shahzeb   hostname        viz      admin          0      1:0 2019-01-23T14:06:09   00:00:00 2019-01-23T14:06:09 2019-01-23T14:06:09     FAILED
        6520   shahzeb   hostname        viz      admin          0      1:0 2019-01-23T14:06:25   00:00:00 2019-01-23T14:06:25 2019-01-23T14:06:25     FAILED
        6521   shahzeb   hostname        viz      admin          0      1:0 2019-01-23T14:06:38   00:00:00 2019-01-23T14:06:38 2019-01-23T14:06:38     FAILED
        6527   shahzeb   hostname        viz      admin         30      1:0 2019-01-23T14:10:25   00:00:00 2019-01-23T14:10:25 2019-01-23T14:10:25     FAILED
jobstats defaults to the current user, but you can select a different user with the -u or --user option and combine it with any of the commands shown above.
To see a user's Slurm account associations and fairshare usage, use the -a option:
$ jobstats -a -u user1
User: admin
Default Account: hpceng
User is part of the following slurm accounts ['admin']
User Raw Share: 100
User Raw Usage: 38228406
Number of Pending Jobs: 0
Number of Running Jobs: 0
Total Jobs Completed: 1
Total Jobs Completed Successfully: 0
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0
Today: 28/03/2019 14:45:52

sreport
--------------------------------------------------------------------------------
Top 10 Users 2019-03-27T00:00:00 - 2019-03-27T23:59:59 (86400 secs)
Usage reported in CPU Hours
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account      Used   Energy
--------- --------- --------------- --------------- --------- --------

Shares for Account admin
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
admin                                  200    0.076894   102749687      0.041010   0.690955
admin                     user1        100    0.010985           0      0.000000   1.000000
admin                   shahzeb        100    0.010985    38228406      0.006392   0.668093
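To help read the FairShare column, the sketch below recomputes it from the EffectvUsage and NormShares columns using Slurm's classic fair-share formula, F = 2^(-EffectvUsage / NormShares). It assumes the cluster uses the classic (non Fair Tree) algorithm with the default dampening factor of 1. The three rows in the output above are consistent with that assumption, but other sites may be configured differently. The function name classic_fairshare is hypothetical.

```python
def classic_fairshare(effective_usage, norm_shares):
    """Classic Slurm fair-share factor: 2 ** -(usage / shares)."""
    if norm_shares <= 0:
        return 0.0  # no shares assigned, so no fair-share entitlement
    return 2.0 ** (-effective_usage / norm_shares)


# (label, NormShares, EffectvUsage, FairShare as reported in the table above)
ROWS = [
    ("admin (account)", 0.076894, 0.041010, 0.690955),
    ("user1", 0.010985, 0.000000, 1.000000),
    ("shahzeb", 0.010985, 0.006392, 0.668093),
]

if __name__ == "__main__":
    for label, shares, usage, reported in ROWS:
        computed = classic_fairshare(usage, shares)
        print(f"{label:16s} computed={computed:.6f} reported={reported:.6f}")
```

A value of 1.0 means the user has consumed none of their share, and the value drops towards 0 as usage grows relative to the assigned shares. Small differences between computed and reported values come from rounding in the displayed columns.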