From ba16b1d3881140a5b85bd126cb69d68b8b40284d Mon Sep 17 00:00:00 2001
From: Jim Garlick
Date: Thu, 17 Oct 2024 16:07:38 -0700
Subject: [PATCH] rfc15: describe signal forwarding details

Problem: the IMP kill subcommand is briefly mentioned as the way to
signal guest processes, but this is inadequate in practice.

Now that the IMP lingers, just have it forward signals to the job shell.

In addition, describe a surrogate signal that tells the IMP to do its
best to clean up the entire job container.

See also: flux-framework/flux-core#6011
---
 spec_15.rst | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/spec_15.rst b/spec_15.rst
index d173311..2f41599 100644
--- a/spec_15.rst
+++ b/spec_15.rst
@@ -372,13 +372,25 @@ A multi-user instance of Flux not only requires the ability to execute
 work as a guest user, but it must also have privilege to monitor and kill
 these processes as part of normal resource manager operation.
 
-Signaling and terminating jobs in a multi-user instance
--------------------------------------------------------
+Signal Handling
+---------------
 
-For terminating and signaling processes the IMP SHALL include a ``kill``
-subcommand which, using the process tracking functionality, SHALL allow
-an instance owner to signal or terminate any guest processes including
-ancestors thereof that were started by the owner’s instance.
+The IMP runs with an effective user ID of root and a real user ID of the
+system instance owner; thus, the system instance owner is permitted to signal
+the IMP. In contrast, the system instance owner is not permitted to signal
+guest user processes.
+
+To enable the instance owner to signal guest jobs, the IMP SHALL act
+as a proxy for the job by trapping common signals and forwarding them to
+the job shell.
+
+To enable the instance owner to fully clean up when the job shell is unable
+to do so, the IMP SHALL handle SIGUSR1 as a surrogate for SIGKILL. Upon
+receipt of this signal, the IMP SHOULD deliver SIGKILL to all processes in
+the job's container, including the job shell.
+
+The mechanism by which processes are identified to receive SIGKILL is
+outside the scope of this document.
 
 IMP configuration
 =================