Update task_proc to better handle killing tasks (#951)
We've seen that task_proc/k8s will sometimes not correctly send events
for pods that we try to kill ourselves (either because the pods are
already gone or because the event is somehow missing data), so this
task_proc version will send synthetic events when we call kill() to
ensure that tron is in the correct state :)

Co-authored-by: Jen Patague <[email protected]>
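
To illustrate the behavior the commit message describes, here is a minimal sketch of a kill() that always emits a synthetic event. The names `ToyEvent`, `ToyExecutor`, and `events` are hypothetical stand-ins, not the task_processing API. The only details taken from this commit are that the synthetic event has `platform_type == "killed"` and carries no raw payload (`raw is None`).

```python
from dataclasses import dataclass
from queue import Queue
from typing import Any, Optional


@dataclass(frozen=True)
class ToyEvent:
    # Hypothetical stand-in for a task_processing event.
    task_id: str
    platform_type: str
    raw: Optional[Any]  # payload from k8s; None marks a synthetic event


class ToyExecutor:
    # Hypothetical executor, not the real task_processing class.
    def __init__(self) -> None:
        self.events: "Queue[ToyEvent]" = Queue()

    def kill(self, task_id: str) -> None:
        # A real executor would ask k8s to delete the pod at this point.
        # Enqueue a synthetic killed event regardless, so the consumer
        # reaches a terminal state even if k8s never sends a usable event.
        self.events.put(ToyEvent(task_id=task_id, platform_type="killed", raw=None))
```

Because the synthetic event is sent unconditionally, the consumer may see two killed events for one task when k8s also reports correctly, which is the case the tron-side change in this commit accounts for.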
nemacysts and jfongatyelp authored May 9, 2024
1 parent 19642d4 commit 81ea462
Showing 2 changed files with 12 additions and 2 deletions.
2 changes: 1 addition & 1 deletion requirements.txt
@@ -80,7 +80,7 @@ setuptools==65.5.1
 six==1.15.0
 sshpubkeys==3.1.0
 stack-data==0.6.2
-task-processing==0.13.0
+task-processing==0.14.0
 traitlets==5.0.0
 Twisted==22.10.0
 typing-extensions==4.5.0
12 changes: 11 additions & 1 deletion tron/kubernetes.py
@@ -404,7 +404,17 @@ def _handle_task_event(self, event: Event) -> None:
             return

         if task_id not in self.tasks.keys():
-            log.warning(f"Got event for unknown task ({task_id} not in {self.tasks.keys()}): {event}")
+            # NOTE: we don't log killed events for tasks we don't know about, as we do some slightly
+            # funky things with these events: namely, we'll send our own synthetic killed event to
+            # work around some weird k8s event behavior we've seen in the past where the coalesced
+            # event that we get in the task_processing watch loop either doesn't have the correct state
+            # or is missing entirely. This is a bit of a hack, I'm sorry :(
+            # That said, without this we'd get somewhat annoying logspam in the tron logs whenever our
+            # workaround logic runs but k8s sends the correct event faster than we can send our synthetic
+            # one and the hackiness of this is somewhat removed by the `event.raw` check - that should only
+            # exclude our synthetic event.
+            if not (event.platform_type == "killed" and event.raw is None):
+                log.warning(f"Got event for unknown task ({task_id} not in {self.tasks.keys()}): {event}")
             return

         task = self.tasks[task_id]
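
The condition added in this hunk can be read as a small predicate. The sketch below isolates it for illustration. `FakeEvent` and `should_warn_about_unknown_task` are hypothetical names that do not exist in tron, and `FakeEvent` models only the two attributes the diff inspects.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FakeEvent:
    # Hypothetical stand-in: only the two attributes the diff checks.
    platform_type: str
    raw: Optional[Any]


def should_warn_about_unknown_task(event: FakeEvent) -> bool:
    # A killed event with no raw payload is treated as the synthetic event
    # sent on kill(); it is expected to arrive after the task is already
    # gone, so it should not produce a warning.
    is_synthetic_kill = event.platform_type == "killed" and event.raw is None
    return not is_synthetic_kill
```

Under these assumptions, a killed event that carries a raw payload still triggers the warning, as does any other event type for an unknown task. Only the synthetic kill is silenced.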