Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the deploy script kill child processes upon exit #36

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bjornfor
Copy link

@bjornfor bjornfor commented Nov 1, 2020

This allows aborting the deploy with Ctrl-C.

Fixes #5.

This allows aborting the deploy with Ctrl-C.

Fixes infinisil#5.
@cole-h
Copy link
Contributor

cole-h commented Nov 1, 2020

Last time I attempted this, @infinisil pointed out that this wouldn't kill remote deploy processes. So, you prevent the local deploy from finishing, but not the remote ones.

https://logs.nix.samueldr.com/nixus/2020-08-26#1598402208-1598402243

@bjornfor
Copy link
Author

bjornfor commented Nov 1, 2020

Ok, hehe, I wondered why nobody had done it if it was this easy.

@infinisil
Copy link
Owner

I guess this is better than nothing though. Most of the time when I need to Ctrl-C, it's during copying-closure time, which this would handle just fine. It's only a problem once it gets to https://github.com/Infinisil/nixus/blob/2127516f68ce6f900b38289985377559e69aab88/modules/deploy.nix#L106, which is the line that starts the remote process

@bjornfor
Copy link
Author

bjornfor commented Nov 20, 2020

I tested a bit more and found that ssh -tt actually fixes terminating remote processes. (I did this in a separate test script, not with nixus.) Does anyone know of a reason not to make nixus use ssh -tt?

Here is the test script:

#!/usr/bin/env bash
set -x

#trap "echo Received signal, exiting; exit" INT TERM
trap "exit" INT TERM
trap "kill 0" EXIT

# Remove the ssh -tt flags and see that when this script gets aborted, the "sleep
# 12m" command gets reparented to PID 1 on srv1 and lives on. With -tt it gets
# properly terminated.
ssh -tt srv1 sleep 12m &
sleep 12m &
wait

@bjornfor
Copy link
Author

AFAICS, there is only one place where we'd need ssh -tt, thats where we switch to the new system:

diff --git a/modules/deploy.nix b/modules/deploy.nix
index 42f7f0f..5f1a9af 100644
--- a/modules/deploy.nix
+++ b/modules/deploy.nix
@@ -103,7 +103,9 @@ let
           privilegeEscalation = builtins.concatStringsSep " " config.privilegeEscalationCommand;
         in lib.dag.entryAnywhere ''
         echo "Triggering system switcher..." >&2
-        id=$(ssh -o BatchMode=yes "$HOST" exec "${switch}/bin/switch" start "${system}")
+        # Use ssh -tt flags to also interrupt/stop the command on the remote
+        # side when the user presses Ctrl-C on the local side.
+        id=$(ssh -tt -o BatchMode=yes "$HOST" exec "${switch}/bin/switch" start "${system}")

         echo "Trying to confirm success..." >&2
         active=1

The other ssh calls are short-lived (like running mkdir or readlink).

@infinisil
Copy link
Owner

I can't find any docs for -tt, there's only -t? Also, Nixus' switch start command itself forks another process (a switch run) into the background, so I don't expect that to work anyways. See https://github.com/Infinisil/nixus/blob/2127516f68ce6f900b38289985377559e69aab88/scripts/switch#L61

@bjornfor
Copy link
Author

I can't find any docs for -tt, there's only -t?

Right, in the docs for -t it says Multiple -t options force tty allocation, even if ssh has no local tty..

Nixus' switch start command itself forks another process (a switch run) into the background, so I don't expect that to work anyways.

Ok. I'm not familiar with that code. I guess at some point you don't want the script to be aborted? Some kind of "atomic section"?

@@ -214,6 +214,8 @@ in {
# TODO: Handle signals to kill the async command
nixus.pkgs.writeScript "deploy" ''
#!${nixus.pkgs.runtimeShell}
trap "echo Received signal, exiting; exit" INT TERM
trap "kill 0" EXIT
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'm not actually sure how this does anything. According to this answer, kill -0 doesn't really do anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is kill 0, not kill -0 -- 0 is the PID (special-cased?).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, yeah, kill 0 is documented as

0      All processes in the current process group are signaled.

Makes sense now

@infinisil infinisil force-pushed the master branch 2 times, most recently from 71ac68e to 0104013 Compare May 26, 2023 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ctrl+c doesn't successfully stop deployment
3 participants