forked from hakavlad/nohang
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnohang.conf
332 lines (210 loc) · 8.7 KB
/
nohang.conf
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
This is nohang config file.
Lines starting with #, tabs and spaces are comments.
Lines starting with @ contain optional parameters.
The configuration includes the following sections:
1. Memory levels to respond to as an OOM threat
2. Response on PSI memory metrics
3. The frequency of checking the level of available memory
(and CPU usage)
4. The prevention of killing innocent victims
5. Impact on the badness of processes via matching their
- names,
- cgroups,
- cmdlines and
- UIDs
with regular expressions
6. Customize corrective actions: the execution of a specific command
instead of sending the SIGTERM signal
7. GUI notifications:
- OOM prevention results and
- low memory warnings
8. Output verbosity
9. Misc
Just read the description of the parameters and edit the values.
Please restart the program after editing the config.
Bool values are case sensitive.
#####################################################################
1. Thresholds below which a signal should be sent to the victim
Sets the available memory levels at or below which SIGTERM or SIGKILL
signals are sent. The signal will be sent if MemAvailable and
SwapFree (in /proc/meminfo) at the same time will drop below the
corresponding values. Can be specified in % (percent) and M (MiB).
Valid values are floating-point numbers from the range [0; 100] %.
MemAvailable levels.
mem_min_sigterm = 10 %
mem_min_sigkill = 5 %
SwapFree levels.
swap_min_sigterm = 15 %
swap_min_sigkill = 5 %
Specifying the total share of zram in memory, if exceeded the
corresponding signals are sent. As the share of zram in memory
increases, it may fall responsiveness of the system. 90 % is a
usual hang level, not recommended to set very high.
Can be specified in % and M. Valid values are floating-point
numbers from the range [0; 90] %.
zram_max_sigterm = 50 %
zram_max_sigkill = 55 %
#####################################################################
2. Response on PSI memory metrics (it needs Linux 4.20 and up)
About PSI:
https://facebookmicrosites.github.io/psi/
Disabled by default (ignore_psi = True).
ignore_psi = True
Choose a path to PSI file.
By default it monitors system-wide file: /proc/pressure/memory
You also can set file to monitor one cgroup slice.
For example:
psi_path = /sys/fs/cgroup/unified/user.slice/memory.pressure
psi_path = /sys/fs/cgroup/unified/system.slice/memory.pressure
psi_path = /sys/fs/cgroup/unified/system.slice/foo.service/memory.pressure
Execute the command
find /sys/fs/cgroup -name memory.pressure
to find available memory.pressue files (except /proc/pressure/memory).
(actual for cgroup2)
psi_path = /proc/pressure/memory
Valid psi_metrics are:
some_avg10
some_avg60
some_avg300
full_avg10
full_avg60
full_avg300
some_avg10 is most sensitive.
psi_metrics = some_avg10
sigterm_psi_threshold = 80
sigkill_psi_threshold = 90
psi_post_action_delay = 60
#####################################################################
3. The frequency of checking the amount of available memory
(and CPU usage)
Coefficients that affect the intensity of monitoring. Reducing
the coefficients can reduce CPU usage and increase the periods
between memory checks.
Why three coefficients instead of one? Because the swap fill rate
is usually lower than the RAM fill rate.
It is possible to set a lower intensity of monitoring for swap
without compromising to prevent OOM and thus reduce the CPU load.
Default values are well for desktop. On servers without rapid
fluctuations in memory levels the values can be reduced.
Valid values are positive floating-point numbers.
rate_mem = 4000
rate_swap = 1500
rate_zram = 500
See also https://github.com/rfjakob/earlyoom/issues/61
max_sleep_time = 3
min_sleep_time = 0.1
#####################################################################
4. The prevention of killing innocent victims
Valid values are integers from the range [0; 1000].
min_badness = 20
Valid values are non-negative floating-point numbers.
min_delay_after_sigterm = 0.2
min_delay_after_sigkill = 1
Valid values are True and False.
Values are case sensitive.
decrease_oom_score_adj = False
Valid values are integers from the range [0; 1000].
oom_score_adj_max = 20
#####################################################################
5. Impact on the badness of processes via matching their names,
cmdlines or UIDs with regular expressions using re.search().
See https://en.wikipedia.org/wiki/Regular_expression and
https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions
Enabling this options slows down the search for the victim
because the names, cmdlines or UIDs of all processes
(except init and kthreads) are compared with the
specified regex patterns (in fact slowing down is caused by
reading all /proc/*/cmdline and /proc/*/status files).
Use script `oom-sort` from nohang package to view
names, cmdlines and UIDs of processes.
5.1 Matching process names with RE patterns
Syntax:
@PROCESSNAME_RE badness_adj /// RE_pattern
New badness value will be += badness_adj
It is possible to compare multiple patterns
with different badness_adj values.
Example:
@PROCESSNAME_RE -500 /// ^sshd$
5.2 Matching cmdlines with RE patterns
A good option that allows fine adjustment.
@CMDLINE_RE 300 /// -childID|--type=renderer
@CMDLINE_RE -200 /// ^/usr/lib/virtualbox
5.3 Matching UIDs with RE patterns
The most slow option
@UID_RE -100 /// ^0$
5.4 Matching CGroup-line with RE patterns
@CGROUP_V1_RE -50 /// ^/system.slice
@CGROUP_V1_RE 50 /// foo.service
@CGROUP_V1_RE -50 /// ^/user.slice
@CGROUP_V2_RE 100 /// ^/workload
5.5 Matching realpath with RE patterns
@REALPATH_RE 20 /// ^/usr/bin/foo
5.6 Matching environ with RE patterns
@ENVIRON_RE 100 /// USER=user
Note that you can control badness also via systemd units via OOMScoreAdjust, see
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#OOMScoreAdjust=
#####################################################################
6. Customize corrective actions.
TODO: docs
Syntax:
KEY REGEXP SEPARATOR COMMAND
@SOFT_ACTION_RE_NAME ^foo$ /// kill -SEGV $PID
@SOFT_ACTION_RE_NAME ^bash$ /// kill -9 $PID
@SOFT_ACTION_RE_CGROUP_V1 ^/system.slice/ /// systemctl restart $SERVICE
@SOFT_ACTION_RE_CGROUP_V1 foo.service$ /// systemctl restart $SERVICE
$PID will be replaced by process PID.
$NAME will be replaced by process name.
$SERVICE will be replaced by .service if it exists (overwise it will be relpaced by empty line).
#####################################################################
7. GUI notifications:
- OOM prevention results and
- low memory warnings
gui_notifications = False
Enable GUI notifications about the low level of available memory.
Valid values are True and False.
gui_low_memory_warnings = False
Execute the command instead of sending GUI notifications if the value is
not empty line. For example:
warning_exe = cat /proc/meminfo &
warning_exe =
Can be specified in % (percent) and M (MiB).
Valid values are floating-point numbers from the range [0; 100] %.
mem_min_warnings = 25 %
swap_min_warnings = 50 %
zram_max_warnings = 40 %
Valid values are floating-point numbers from the range [1; 300].
min_time_between_warnings = 15
Ampersands (&) will be replaced with asterisks (*) in process
names and in commands.
#####################################################################
8. Verbosity
Display the configuration when the program starts.
Valid values are True and False.
print_config = False
Print memory check results.
Valid values are True and False.
print_mem_check_results = False
min_mem_report_interval = 60
Print sleep periods between memory checks.
Valid values are True and False.
print_sleep_periods = False
print_total_stat = True
print_proc_table = True
Valid values:
None
cgroup_v1
cgroup_v2
cmdline
environ
realpath
All
extra_table_info = cgroup_v1
print_victim_info = True
max_ancestry_depth = 1
separate_log = False
psi_debug = False
#####################################################################
9. Misc
max_post_sigterm_victim_lifetime = 10
post_kill_exe =
forbid_negative_badness = True