-
-
Notifications
You must be signed in to change notification settings - Fork 726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track Event Loop intervals in dashboard plot #5964
Changes from all commits
c894f40
c272458
953af64
1d421b0
876d860
8fa8c46
d9ebab2
b780de3
55e8090
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3018,6 +3018,59 @@ def update(self): | |
) | ||
|
||
|
||
class EventLoop(DashboardComponent): | ||
"""Event Loop Health""" | ||
|
||
def __init__(self, scheduler, **kwargs): | ||
with log_errors(): | ||
self.scheduler = scheduler | ||
self.source = ColumnDataSource( | ||
{ | ||
"names": ["Scheduler", "Workers"], | ||
"values": [0, 0], | ||
"text": ["0", "0"], | ||
} | ||
) | ||
|
||
self.root = figure( | ||
title="Event Loop Health", | ||
x_range=["Scheduler", "Workers"], | ||
y_range=[ | ||
0, | ||
parse_timedelta(dask.config.get("distributed.admin.tick.interval")) | ||
* 25, | ||
], | ||
tools="", | ||
toolbar_location="above", | ||
**kwargs, | ||
) | ||
self.root.vbar(x="names", top="values", width=0.9, source=self.source) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ideally I'd like to see color as well (yellow if the interval is over a certain threshold, like the worker memory plot). I think 0.1s would be a reasonable threshold; this is asyncio's default There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with the ideal. However, I'd like to get this in and focus elsewhere for now if possible. |
||
|
||
self.root.xaxis.minor_tick_line_alpha = 0 | ||
self.root.ygrid.visible = True | ||
self.root.xgrid.visible = False | ||
|
||
hover = HoverTool(tooltips=[("Interval", "@text s")], mode="vline") | ||
self.root.add_tools(hover) | ||
|
||
@without_property_validation | ||
def update(self): | ||
with log_errors(): | ||
s = self.scheduler | ||
|
||
data = { | ||
"names": ["Scheduler", "Workers"], | ||
"values": [ | ||
s._tick_interval_observed, | ||
sum([w.metrics["event_loop_interval"] for w in s.workers.values()]) | ||
/ (len(s.workers) or 1), | ||
], | ||
} | ||
data["text"] = [format_time(x) for x in data["values"]] | ||
|
||
update(self.source, data) | ||
|
||
|
||
class WorkerTable(DashboardComponent): | ||
"""Status of the current workers | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,7 @@ | |
ClusterMemory, | ||
ComputePerKey, | ||
CurrentLoad, | ||
EventLoop, | ||
MemoryByKey, | ||
Occupancy, | ||
SystemMonitor, | ||
|
@@ -97,6 +98,7 @@ | |
"/individual-compute-time-per-key": individual_doc(ComputePerKey, 500), | ||
"/individual-aggregate-time-per-action": individual_doc(AggregateAction, 500), | ||
"/individual-scheduler-system": individual_doc(SystemMonitor, 500), | ||
"/individual-event-loop": individual_doc(EventLoop, 500), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should the interval for this plot correspond to the tick cycle? Otherwise it will run unnecessarily often. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's pretty cheap. I don't think that this matters much. |
||
"/individual-profile": individual_profile_doc, | ||
"/individual-profile-server": individual_profile_server_doc, | ||
"/individual-gpu-memory": gpu_memory_doc, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a monotonic timer here #4528
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently use metrics.time because of windows support. If we want to switch to something else that's better I'm all in favor, but I hesitate to launch into this immediately. Instead I'd rather that we kept with the standard way of doing things, and then considered switching out to monotonic en-masse. Thoughts?