systemd service failure inhibits system configuration activation #1535

PAI5REECHO · 2022-08-14T18:14:29Z

Whenever a nixops deployment is made on a system that has a systemd service in an activating (auto-restart) or failed state, the deployment fails. I don't understand why nixops is designed this way.

test.........> setting up tmpfiles
test.........> the following new units were started: [email protected]
test.........> warning: the following units failed: restic-backups-external.service
test.........> 
test.........> ● test.service - test
test.........>      Loaded: loaded (/etc/systemd/system/test.service; linked; preset: enabled)
test.........>      Active: activating (auto-restart) since Sun 2022-08-14 12:00:08 UTC; 2h 9min ago
test.........> TriggeredBy: ● test.timer
test.........>    Main PID: 8780 (code=exited, status=1/FAILURE)
test.........>         CPU: 512ms
test.........> error: Traceback (most recent call last):
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 906, in worker
    raise Exception(
Exception: unable to activate new configuration (exit code 4)

Traceback (most recent call last):
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/bin/.nixops-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/__main__.py", line 56, in main
    args.op(args)
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/script_defs.py", line 715, in op_deploy
    depl.deploy(
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1365, in deploy
    self.run_with_notify("deploy", lambda: self._deploy(**kwargs))
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1354, in run_with_notify
    f()
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1365, in <lambda>
    self.run_with_notify("deploy", lambda: self._deploy(**kwargs))
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1300, in _deploy
    self.activate_configs(
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 947, in activate_configs
    raise Exception(
Exception: activation of 1 of 1 machines failed (namely on ‘test’)


roberth · 2022-08-15T14:57:08Z

Me neither, if what you're saying is that something was skipped because of the error.

Stopping a deployment halfway is incompatible with declarative deployments that do not specify dependencies (we don't). It is also incompatible with the idea of letting the distributed system converge towards an acceptable (or fully) operational state.
That said, using the deployment process for feedback about the system seems useful. Did your deployment skip anything because of the error? If so, that would be an issue that needs correcting.

Also, we shouldn't emit a stack trace for this type of error, and the log should be clear about what did and did not happen.

TODO

- Check that errors are collected but do not interrupt parallel changes.
- Report such errors with clarity as to what happened. Specifically, answer the question of whether a re-deployment is necessary.
- Do not report a stack trace for expected errors that are handled properly.
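A minimal sketch of what the TODO items above could look like, not the actual nixops code: each machine is activated in parallel, a non-zero exit code is recorded rather than raised, and the summary is a plain message with no traceback. The names `ActivationError`, `Result`, `activate_all`, and `report` are hypothetical, and the activators are stand-in callables that return an exit code. It does not try to decide whether a re-deployment is needed, since the thread leaves that open.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class ActivationError(Exception):
    """Expected failure: activation on one machine returned a non-zero code."""

    def __init__(self, machine: str, exit_code: int) -> None:
        super().__init__(f"activation on '{machine}' exited with code {exit_code}")
        self.machine = machine
        self.exit_code = exit_code


@dataclass
class Result:
    machine: str
    error: Optional[ActivationError] = None


def activate_all(activators: Dict[str, Callable[[], int]]) -> List[Result]:
    """Run every activator in parallel; record failures instead of aborting."""

    def run_one(machine: str) -> Result:
        exit_code = activators[machine]()
        if exit_code != 0:
            # Store the error on the result so other machines keep going.
            return Result(machine, ActivationError(machine, exit_code))
        return Result(machine)

    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input machines.
        return list(pool.map(run_one, activators))


def report(results: List[Result]) -> str:
    """Summarize the outcome as plain text, without a stack trace."""
    failed = [r for r in results if r.error is not None]
    if not failed:
        return f"all {len(results)} machines activated"
    lines = [f"activation failed on {len(failed)} of {len(results)} machines:"]
    lines += [f"  {r.error}" for r in failed]
    return "\n".join(lines)


# Stand-in activators: 'test' mimics the exit code 4 from the log above.
results = activate_all({"test": lambda: 4, "other": lambda: 0})
print(report(results))
```

Keeping the error as data on each result, rather than raising inside the worker, is what lets the remaining machines finish before anything is reported.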

PAI5REECHO · 2022-08-16T06:45:00Z

> Did your deployment skip anything because of the error?

Yes, the system activation fails because of a failing or pending systemd service, so no changes to the system are applied, which is unexpected. Activation shouldn't depend on the health of systemd services.
