.\" $Id$
.\"
.TH SQLOG 1 "SLURM Query Log"
.SH NAME
sqlog \- SLURM query log utility
.SH SYNOPSIS
.B sqlog
[\fIOPTIONS\fR]...
.SH DESCRIPTION
The \fBsqlog\fR utility provides a single interface to query information
about jobs from the SLURM job log database and/or the current queue
of running jobs.
By default both the current queue of running jobs and the database
of completed jobs are queried, and a limit of 25 results is displayed.
If more results are available in the database or queue, sqlog will
note the fact with an informational message to stderr:
.nf
sqlog: [More results available....]
.fi
This message is suppressed if the \fB--no-header\fR option is provided.
.SH CONFIGURATION
\fBsqlog\fR reads configuration from the \fBsqlog.conf\fR config file
(typically in /etc/slurm). This config file provides information about
the SLURM job log database location and username. In addition,
\fBsqlog\fR reads user defaults and additional output format types
from a ~/.sqlog file if it exists. See the USER CONFIGURATION section
below for more information.
Configuration parameters are applied in the following order, with
later sources overriding earlier ones:
internal defaults, system configuration, user configuration,
and command line.
.SH OPTIONS
.TP
.BI "-h, --help"
Display a summary of the command-line options.
.TP
.BI "-v, --verbose"
Increase debugging verbosity of the program.
.TP
.BI "--dry-run"
Don't actually do anything.
.TP
.BI "-j, --jobids " LIST
Provide a comma-separated list of jobids to include (or exclude if
preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
be specified multiple times.
.TP
.BI "-J, --job-names " LIST
Provide a comma-separated list of job names to include (or exclude if
preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
be specified multiple times.
.TP
.BI "-n, --nodes " LIST
Provide a comma-separated list of nodes or node lists to include
(or exclude if preceded by the \fI-x\fR, \fI--exclude\fR option).
This option may be specified multiple times. Node lists can be
in hostlist format, e.g. host[34-36,67].
.TP
.BI "-p, --partitions " LIST
Provide a comma-separated list of partitions to include (or exclude if
preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
be specified multiple times.
.TP
.BI "-s, --states " LIST
Provide a comma-separated list of job states to include (or exclude if
preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
be specified multiple times. Use \fB--states\fR=list to list the valid
job state keys.
.TP
.BI "-u, --users " LIST
Provide a comma-separated list of users to include (or exclude if
preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
be specified multiple times.
.TP
.BI "--regex"
Enable regular expressions instead of exact matching for the following list
of jobids, job names, partitions, states, and user names. For example:
.nf
sqlog --regex --job-names='^foo-.*'
.fi
.TP
.BI "-x, --exclude"
Exclude the following list of jobids, users, partitions, nodes, job names,
or job states. For example: \fB--exclude --jobids\fR=\fILIST\fR or
\fB-xn\fR \fINODES\fR.
.TP
.BI "-N, --nnodes " N
List jobs that ran on exactly \fIN\fR nodes. \fIN\fR may also have the
form +\fIN\fR, -\fIN\fR, or \fIN\fR..\fIM\fR, to specify a minimum,
maximum, or range of node counts. For examples, see the RANGE OPERATORS
section below.
.TP
.BI "--minnodes " N
Explicitly specify a minimum node count. This is equivalent to using
the option \fB--nnodes\fR=+\fIN\fR.
.TP
.BI "--maxnodes " N
Explicitly specify a maximum node count. This is equivalent to using
the option \fB--nnodes\fR=-\fIN\fR.
.TP
.BI "-C, --ncores " N
List jobs that ran on exactly \fIN\fR cores. \fIN\fR may also have the
form +\fIN\fR, -\fIN\fR, or \fIN\fR..\fIM\fR, to specify a minimum,
maximum, or range of core counts. For examples, see the RANGE OPERATORS
section below.
.TP
.BI "--mincores " N
Explicitly specify a minimum core count. This is equivalent to using
the option \fB--ncores\fR=+\fIN\fR.
.TP
.BI "--maxcores " N
Explicitly specify a maximum core count. This is equivalent to using
the option \fB--ncores\fR=-\fIN\fR.
.TP
.BI "-T, --runtime " DURATION
List jobs that ran or have run for \fIDURATION\fR. \fIDURATION\fR may
have the form DD-HH:MM:SS or DDdHHhMMmSSs, where DD is days, HH
hours, MM minutes, and SS seconds. In the second form, values that
are zero may be left out, e.g. 4h. In the first form, days, hours,
and minutes are optional, e.g. :04 is 4 seconds. \fIDURATION\fR may
be specified as +\fIT\fR, -\fIT\fR or \fIT1\fR..\fIT2\fR to specify
a min, max, or runtime range. For more information, see the RANGE
OPERATORS section below.
.TP
.BI "--mintime " DURATION
List all jobs that ran for at least \fIDURATION\fR.
This is equivalent to specifying \fB--runtime\fR=+\fIDURATION\fR.
.TP
.BI "--maxtime " DURATION
List all jobs that ran for at most \fIDURATION\fR.
This is equivalent to specifying \fB--runtime\fR=-\fIDURATION\fR.
.TP
.BI "-t, --time, --at " TIME
List jobs which were running at a particular date and time.
\fITIME\fR arguments are parsed by the perl \fBDate::Manip\fR(3pm)
package, so many date/time formats are allowed, e.g. 2pm or
"4/14 15:30:00". A window of time may be specified by separating the
start and end of the window with "..", e.g. \fB--time\fR=2pm..3pm.
For more information see the RANGE OPERATORS section below.
.TP
.BI "-S, --start " TIME
List all jobs that started at date and time \fITIME\fR. \fITIME\fR may
have the form +\fIT\fR, -\fIT\fR, or \fIT1\fR..\fIT2\fR to specify a
minimum, maximum, or start time range. See RANGE OPERATORS below
for more information.
.TP
.BI "--start-after " TIME
List all jobs that started after time \fITIME\fR. This is equivalent
to using \fB--start\fR=+\fITIME\fR.
.TP
.BI "--start-before " TIME
List all jobs that started before time \fITIME\fR. This is equivalent
to using \fB--start\fR=-\fITIME\fR.
.TP
.BI "-E, --end " TIME
List all jobs that ended at date and time \fITIME\fR. \fITIME\fR may
have the form +\fIT\fR, -\fIT\fR, or \fIT1\fR..\fIT2\fR to specify a
minimum, maximum, or end time range (see RANGE OPERATORS below for
more information). For running jobs, SLURM uses
an estimated end time, so end times in the future are valid and will
be used. For example, \fB--end\fR=+now lists all currently
running jobs, since they all end in the future.
.TP
.BI "--end-after " TIME
List all jobs that ended (or will end) after time \fITIME\fR. This is
equivalent to using \fB--end\fR=+\fITIME\fR.
.TP
.BI "--end-before " TIME
List all jobs that ended (or will end) before time \fITIME\fR. This is
equivalent to using \fB--end\fR=-\fITIME\fR.
.TP
.BI "-X, --no-running"
Do not query running jobs, i.e. ignore the current queue and only
query the SLURM job log database.
.TP
.BI "--no-db"
Do not query the SLURM job log database, i.e. only query the current
queue.
.TP
.BI "-H, --no-header"
Do not display header rows in output.
.TP
.BI "-o, --format " LIST
Specify a list of format keys to display or a format type, or both
using the form \fITYPE\fR:\fIKEYS\fR... Use \fB--format\fR=list to
list valid keys and types. See OUTPUT FORMAT below for further
information.
.TP
.BI "-P, --sort " LIST
Specify a list of keys on which to sort output. Prepend a '-' to sort
in descending as opposed to ascending order. List valid keys
using \fB--sort\fR=list. The default sort method is '-start'.
.TP
.BI "-L, --limit " N
Limit the number of records to report (Default = 25).
.TP
.BI "-a, --all"
Do not limit the number of returned results. (Return all matching rows).
This is equivalent to \fB--limit\fR=0.
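.PP
As a rough illustration of the two \fIDURATION\fR forms accepted by
\fB--runtime\fR, the following Python sketch converts a duration string
to seconds. It is not taken from \fBsqlog\fR itself: the function name
parse_duration is hypothetical, only the DD-HH:MM:SS and DDdHHhMMmSSs
forms described above are handled (spellings such as 1hr are not), and
any leading range operator is assumed to have been stripped already.
.nf
```python
def parse_duration(text):
    """Return the number of seconds described by a DURATION string."""
    text = text.strip()
    if any(unit in text for unit in "dhms"):
        # Second form: DDdHHhMMmSSs, where zero-valued parts may be left out.
        seconds_per_unit = {"d": 86400, "h": 3600, "m": 60, "s": 1}
        total = 0
        digits = ""
        for ch in text:
            if ch.isdigit():
                digits += ch
            elif ch in seconds_per_unit and digits:
                total += int(digits) * seconds_per_unit[ch]
                digits = ""
            else:
                raise ValueError("bad duration: " + text)
        if digits:
            raise ValueError("digits without a unit: " + text)
        return total
    # First form: DD-HH:MM:SS, where days, hours, and minutes are optional.
    days, _, clock = text.rpartition("-")
    parts = [int(p) if p else 0 for p in clock.split(":")]
    if len(parts) > 3:
        raise ValueError("bad duration: " + text)
    while len(parts) < 3:
        parts.insert(0, 0)  # pad the missing leading fields with zero
    hours, minutes, seconds = parts
    day_count = int(days) if days else 0
    return day_count * 86400 + hours * 3600 + minutes * 60 + seconds
```
.fi
Under these assumptions, parse_duration("4h") and parse_duration(":04")
correspond to the 4 hour and 4 second examples given for \fB--runtime\fR.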
.SH RANGE OPERATORS
\fITIME\fR, \fIDURATION\fR, and numeric arguments may use the
RANGE OPERATORS '+', '-', and '..' to specify a minimum, a maximum,
or a range of values, respectively. Because a TIME may itself begin
with + or - (e.g. '-1hr' means '1 hr ago'), TIME arguments may use
the '@' symbol to escape that leading sign. The \fB--time\fR,
\fB--start\fR, \fB--end\fR, \fB--runtime\fR, \fB--nnodes\fR, and
\fB--ncores\fR options to \fBsqlog\fR all take RANGE OPERATORS.
.TP
Examples
.TP 20
.BI "--nnodes " +8
Jobs that ran with 8 or more nodes.
.TP
.BI "--nnodes " 16..32
Jobs that ran with between 16 and 32 nodes, inclusive.
.TP
.BI "--runtime " -2h
Jobs that ran for 2 hours or less.
.TP
.BI "--runtime " 5m..1hr
Jobs that ran for between 5 minutes and 1 hour, inclusive.
.TP
.BI "--end " 2pm..3pm
Jobs that ended today between 2PM and 3PM, inclusive.
.TP
.BI "--time " 7/17..7/18
Jobs that ran anytime from 12AM, 7/17 to 12AM, 7/18.
.TP
.BI "--time " "+'1 hour ago'"
Jobs that ran in the past hour (1 hour ago or later).
.TP
.BI "--time " "+-1hr (or +@-1hr)"
Same as above.
.TP
.BI "--time " @-1hr
Jobs that were running exactly one hour ago.
.TP
.BI "--time " @-2hr..-1hr
Jobs that were running between 2 hours ago and 1 hour ago.
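.PP
The numeric case of the operators above can be sketched in a few lines
of Python. This is only an illustration of the documented semantics
(bounds are inclusive, as in the examples), not the parser used by
\fBsqlog\fR; the names parse_range and matches are hypothetical, and
\fITIME\fR arguments with the '@' escape are deliberately not covered.
.nf
```python
def parse_range(arg):
    """Return (minimum, maximum) for a numeric range argument.

    None stands for an unbounded side of the range.
    """
    if ".." in arg:
        low, _, high = arg.partition("..")
        return int(low), int(high)      # N..M: inclusive range
    if arg.startswith("+"):
        return int(arg[1:]), None       # +N: N or more
    if arg.startswith("-"):
        return None, int(arg[1:])       # -N: N or fewer
    exact = int(arg)
    return exact, exact                 # N: exactly N


def matches(value, bounds):
    """Report whether value lies within the inclusive bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)
```
.fi
For instance, matches(16, parse_range("16..32")) holds, mirroring the
\fB--nnodes\fR 16..32 example.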
.SH USER CONFIGURATION
When \fBsqlog\fR runs, it first checks for a ~/.sqlog file and
parses it if it exists. At this time, the ~/.sqlog file may be used
to set a new default limit (see \fB--limit\fR) and additional output format
types (see \fB--format\fR). These two configuration parameters take the form:
.TP 20
\fBlimit\fR = \fIN\fR
Set the new default output limit to \fIN\fR.
.TP
\fBformat{\fINAME\fB}\fR = \fILIST...\fR
Create an alias \fINAME\fR for the format list \fILIST\fR.
.PP
For example, the following ~/.sqlog file
.nf
# Sample ~/.sqlog file
limit = 30
format{mine} = long:start,end,jobid,user,state
.fi
would set the default output limit to 30 records and
add a new format type \fImine\fR. The new format type would
be used by specifying
.nf
\fB--format\fR \fImine\fR
.fi
on the command line, which would be equivalent to
.nf
\fB--format\fR long:start,end,jobid,user,state
.fi
Any number of format types may be specified in this way, though
if there are duplicate names, the last one specified will override
all previous types. This also implies that a user can redefine
the default \fBsqlog\fR format types \fIshort\fR, \fIlong\fR,
and \fIfreeform\fR, though this is not recommended.
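.PP
To make the file syntax concrete, here is a small Python sketch that
reads the two parameter forms shown above. It is an illustration only,
not the \fBsqlog\fR parser: the function name parse_user_config is
hypothetical, and it assumes that comments occupy a whole line starting
with '#' and that unrecognized lines can simply be skipped. As in the
description above, a later format definition replaces an earlier one
with the same name.
.nf
```python
def parse_user_config(text):
    """Return (limit, formats) read from the text of a ~/.sqlog file."""
    limit = None
    formats = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                    # blank line or whole-line comment
        key, sep, value = line.partition("=")
        if not sep:
            continue                    # not a "key = value" line
        key = key.strip()
        value = value.strip()
        if key == "limit":
            limit = int(value)
        elif key.startswith("format{") and key.endswith("}"):
            name = key[len("format{"):-1]
            formats[name] = value       # the last definition wins
    return limit, formats
```
.fi
Applied to the sample file above, this would yield a limit of 30 and a
single format type named mine.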
.SH OUTPUT FORMAT
\fBsqlog\fR provides precise control over the output format, which aids with
readability and simplifies parsing via scripts. When parsing output, be sure
to specify each field and the expected order using the -o,--format option.
The built-in formats (short, long, and freeform) may add or reorder fields
over time.
By default, \fBsqlog\fR uses the output format
.nf
short:jobid,partition,name,user,state,start,runtime,ncores,nnodes,nodes
.fi
The \fIshort:\fR preceding the format specification tells \fBsqlog\fR
to use the \fIshort\fR form of each of the format keys. The result
is what you see when running \fBsqlog\fR without the -o, --format
option. All format keys currently available are detailed here. Some
keys have shorter aliases that are provided for convenience. These
are listed alongside the full key name below. Note that all these
keys can also be listed by using \fB--format\fR=list.
.TP 20
.B "jobid | jid"
The SLURM jobid for this job.
.TP
.B "partition | part"
The SLURM partition in which the job ran or is running.
.TP
.B "name"
The name of the job as recorded by SLURM.
.TP
.B "user"
The username of the user running the job.
.TP
.B "state | st"
The current or final state of the job. See JOB STATE CODES
for a description of the two-letter codes that this field
displays by default.
.TP
.B "start"
The start time of the job in the form MM/DD-HH:MM:SS.
.TP
.B "runtime | time"
The total runtime of the job in the form
DD-HH:MM:SS. Leading zero values may be dropped,
for instance 4:30 is 4 minutes 30 seconds.
.TP
.B "ncores | C"
The number of cores allocated to the job.
.TP
.B "nnodes | N"
The number of nodes allocated to the job.
.TP
.B "nodes"
The nodelist that was allocated to the job. Note that for
completing jobs (CG) this nodelist will be restricted to
the currently completing nodes for the job. To see the
full nodelist, restrict \fBsqlog\fR to the database only,
i.e. run with the -X, --no-running option.
.TP
.B "runtime_s | time_s"
The total job runtime in seconds.
.TP
.B "end"
The time at which the job completed in the form
MM/DD-HH:MM:SS.
.TP
.B "longstart"
Date and time the job started in the form
YYYY-MM-DDTHH:MM:SS. This is displayed by
default in the \fIlong\fR format type.
.TP
.B "longend"
Date and time the job ended in the form
YYYY-MM-DDTHH:MM:SS. This is displayed by
default in the \fIlong\fR format type.
.TP
.B "unixstart"
Job start time in seconds since epoch.
.TP
.B "unixend"
Job end time in seconds since epoch.
.PP
A format type may be specified in addition to the format fields. These
change the output width and in some cases the output format of the
fields above. The format type may also be specified alone to the
\fI--format\fR option. For instance \fI--format=long\fR would choose
the default fields configured for the \fIlong\fR format type.
.TP 20
.B "short"
This is the default output type. It uses the format fields:
jobid,part,name,user,state,start,runtime,ncores,nnodes,nodes
.TP
.B "long"
This format type uses longer widths for most fields, and
displays the full job state code by default (e.g.
completing instead of CG). Its default format fields are:
jobid,part,name,user,state,longstart,longend,runtime,ncores,nnodes,nodes
.TP
.B "freeform"
This is a freeform output in which full-width fields are displayed
separated by whitespace. It is useful, for instance, when parsing
\fBsqlog\fR output, because it guarantees that no field is truncated.
It uses the same format fields as the \fBlong\fR format type.
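.PP
The advice above about naming every field explicitly when parsing can
be sketched as follows. This Python fragment is an assumption-laden
illustration, not part of \fBsqlog\fR: build_command and parse_rows are
hypothetical helpers, the output is assumed to come from the
\fIfreeform\fR type with \fB--no-header\fR, and none of the requested
fields is assumed to contain whitespace (a job name with spaces would
break this simple split).
.nf
```python
def build_command(keys):
    """Build an sqlog argument list that requests exactly these keys."""
    return ["sqlog", "--no-header", "--format=freeform:" + ",".join(keys)]


def parse_rows(output, keys):
    """Split whitespace-separated output lines into one dict per job."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue                    # ignore blank lines
        if len(fields) != len(keys):
            raise ValueError("unexpected field count: " + line)
        rows.append(dict(zip(keys, fields)))
    return rows
```
.fi
The argument list from build_command could be handed to a process
runner such as Python's subprocess module, and the captured standard
output passed to parse_rows with the same key list.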
.SH JOB STATE CODES
In normal output, \fBsqlog\fR displays job states as one- or two-letter
abbreviations. Job state codes are fully explained in the
\fBsqueue\fR(1) man page, but the abbreviations are restated here
for completeness.
.TP 20
.B "CA CANCELLED"
Job was cancelled.
.TP
.B "CD COMPLETED"
Job completed normally.
.TP
.B "CG COMPLETING"
Job is in the process of completing.
.TP
.B "F FAILED"
Job terminated abnormally.
.TP
.B "NF NODE_FAIL"
Job terminated due to node failure.
.TP
.B "PD PENDING"
Job is pending allocation.
.TP
.B "R RUNNING"
Job currently has an allocation.
.TP
.B "S SUSPENDED"
Job is suspended.
.TP
.B "TO TIMEOUT"
Job terminated upon reaching its time limit.
.SH EXAMPLES
Display the job or jobs that were running on host55 on July 19 at 4:00PM:
.nf
sqlog --time="July 19, 4pm" --nodes=host55
.fi
Display at most 25 jobs that were running at midnight yesterday:
.nf
sqlog --time=yesterday,midnight
.fi
Display all jobs that failed between 8:00AM and 9:00AM this morning,
sorted by descending end time:
.nf
sqlog --all --end=8am..9am --states=F --sort=-end
.fi
Display all jobs that started today:
.nf
sqlog --start=+midnight --all
.fi
Display all jobs that have run between 3 and 4 hours on the nodes
host30 through host65, and that didn't complete normally:
.nf
sqlog -L 0 -T 3h..4h -n 'host[30-65]' -xs completed
.fi
Display all jobs that were running yesterday on 1000 or more
nodes and completed normally:
.nf
sqlog -t yesterday,12am..12am -s CD -N +1000
.fi
List current queue, sorted by number of nodes (ascending):
.nf
sqlog --all --no-db --sort=nnodes
.fi
List the 10 longest running jobs, and then the 5 oldest jobs:
.nf
sqlog --sort=-runtime --limit=10
sqlog --sort=start --limit=5
.fi
.SH AUTHOR
Written by Adam Moody and Mark Grondona.