nixos-rebuild --use-remote-sudo fails with sudo error #118655
Comments
/cc @AmineChikhaoui @tewfik-ghariani in case you've encountered this while working on NixOps, and @flokli for your work on the oslogin package. |
Does it reliably fail, even after trying multiple times? Do you see an nscd crash in dmesg/the journal?
|
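A minimal sketch of how one might look for such a crash, assuming the kernel or systemd-coredump logs it with the usual "segfault" or "dumped core" wording. The function name `find_nscd_crashes` and the match patterns are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Filter a log stream (read from stdin) down to lines that mention nscd
# together with common crash wording. The patterns are assumptions about
# typical kernel and systemd-coredump messages, not an exhaustive list.
find_nscd_crashes() {
    grep -E 'nscd' | grep -E -i 'segfault|dumped core|core dumped'
}

# Usage (run on the affected machine):
#   journalctl -b --no-pager | find_nscd_crashes
#   dmesg | find_nscd_crashes
```

The function exits non-zero when nothing matches, so it can also be used in a conditional.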
I just tried, and this time it has worked, so no, it doesn't happen reliably. And yes, I've just observed an nscd crash:
This crash seems to happen on every operation, including e.g. |
This is
|
@flokli let me know if the coredump is of any interest, and I can send it to you. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nixos-rebuild-on-gce-vm/12301/5 |
I experience the same issue. Here are my observations.
Then I add myself to trustedUsers:
awk 'NR==3{print " nix.trustedUsers = [\"'$(whoami)'\"];"}1' /etc/nixos/configuration.nix | sudo tee /etc/nixos/configuration.nix
Now I see an nscd crash:
however,
Then I use deploy-rs to deploy my configuration, which is based on nixpkgs-unstable and contains an additional setting
so that the instance does not fail fetching SSH keys, which don't exist on an oslogin-enabled instance. deploy-rs fails with:
No more nscd crashes but there is a proc one:
Interestingly, when I run deploy-rs on a fresh instance it sometimes applies the configuration successfully, but later fails again. |
Ignore the On the failing sudo: There is some instability in nscd when used with |
I am also trying
|
This can be solved by setting |
It just replies (Not using google-oslogin) |
@06kellyjac did you ever solve this? I'm having the same problem |
I haven't looked for a solution very hard. I was hoping deploy-rs might handle it, but I'm not sure: serokell/deploy-rs#78 |
My workaround to
was to use |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/how-to-build-nixos-system-remotely/26188/5 |
So this is still a problem for me, and the Discourse link just above my post seems to indicate that it worked once, so this might be a (very old) regression. It might be that some combination of options causes the problem. Here is what I run:
And yes, it does seem to have something to do with the
and the verbose output says it's set by
Also tried with |
Adding another
However, I'm now getting
The only way I could get around the prompt is to set
security.sudo.extraRules = [
  { users = [ "privileged_user" ];
    commands = [
      { command = "/run/current-system/sw/bin/nix-store";
        options = [ "NOPASSWD" ];
      }
    ];
  }
];
https://discourse.nixos.org/t/dont-prompt-a-user-for-the-sudo-password/9163/3
|
Yeah, I read the manpage and already tried that several times. It didn't work; it just echoed the password back to me. I definitely think there's a bug somewhere in the stream handling. Also encountered a quoting error when trying to pass |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/remote-nixos-rebuild-works-with-build-but-not-with-switch/34741/12 |
There was an issue logged against Ansible where an SSH session would hang when trying to execute a command in a similar way, also worked around by providing the flag: ansible/ansible#66535. There are some additional explainers at https://serverfault.com/a/706543 (mind the comments and the old bugzilla report) that might prove useful in narrowing this down. A lot of issues seem to have come about because of unexpected (additional) requests for sudo auth. Edit: In the SSHOPTS definition
I wonder if it's possible that the ControlPersist duration is being exceeded and that this is not being handled gracefully? The intermittent nature of the failure made me wonder whether it is exposed by a long-running build with a long gap between content updates, similar to this one in Ansible. |
Describe the bug
On a Google Cloud instance with OS Login enabled, running
fails with
There is some strange interaction with nix-copy-closure and SSH session reuse. It seems that if the first connection is established by nix-copy-closure, then some information is lost/not added to the session.
A workaround for this is to prepend the nixos-rebuild invocation with `NIX_SSHOPTS='-o ControlMaster=no'`.
To Reproduce
Steps to reproduce the behavior:
nixos-rebuild as outlined above.
NIX_SSHOPTS='-o ControlMaster=no' nixos-rebuild
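A minimal shell sketch of the ControlMaster workaround described in this issue, wrapped in a helper so that any NIX_SSHOPTS already set are preserved. The function name `rebuild_without_ssh_mux` and the host in the usage comment are hypothetical. Placing the option first is an assumption based on ssh_config(5), which documents that the first obtained value for a parameter is used.

```shell
#!/bin/sh
# Run nixos-rebuild with SSH connection sharing disabled.
# "-o ControlMaster=no" is prepended to whatever NIX_SSHOPTS already holds,
# so that it is the first ControlMaster value ssh sees.
rebuild_without_ssh_mux() {
    NIX_SSHOPTS="-o ControlMaster=no${NIX_SSHOPTS:+ $NIX_SSHOPTS}" \
        nixos-rebuild "$@"
}

# Usage (the host name is a placeholder):
#   rebuild_without_ssh_mux switch --target-host user@gce-instance --use-remote-sudo
```

Disabling connection sharing means every SSH invocation opens its own connection, which is slower but sidesteps the session-reuse interaction suspected above.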
Expected behavior
Switch happens correctly
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
The getent command correctly returns the user:
OS Login uses NSS to make the user available to the system.
Notify maintainers
Metadata
Maintainer information: