-
-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/google-compute-config: fix module #144761
Conversation
pkgs/tools/virtualization/google-compute-guest-agent/default.nix
Outdated
Show resolved
Hide resolved
pkgs/tools/virtualization/google-compute-guest-configs/default.nix
Outdated
Show resolved
Hide resolved
pkgs/tools/virtualization/google-compute-guest-configs/default.nix
Outdated
Show resolved
Hide resolved
pkgs/tools/virtualization/google-compute-guest-configs/default.nix
Outdated
Show resolved
Hide resolved
The guest agent is depending in some bash scripts contained in the google-compute-guest-configs packages to setup the virtio multiqueue support at runtime. We inject those through a wrapper. The oslogin implementation of the guest agent won't work on NixOS as it currently is. On top of that, it's enabled/disabled status exclusively depends on the remote GCE deployment metadata, we can't disable that through the conf file. We add a patch allowing us to disable this feature via the conf file. Co-authored-by: Mark Karpov <[email protected]>
The python scripts we're using have been deprecated in favor of the guest agent, see more infos at: GoogleCloudPlatform/compute-image-packages@276e520#diff-267a2788071ee63df1443363c2ab882e7d321adc77ee53462a920229a962eabbL40 The guest agent onboard all the old python scripts as part of two new go binaries: google_guest_agent and google_metadata_script_runner. Both are using the same /etc/default-instance_configs.cfg configuration file. The configuration file we embed in this module mimicks the setup we had with the python scripts. We'll have two services: - The metadata script service: in charge of running the google compute startup/showdown scripts. - The guest agent: in charge of syncing the guest clock with the host and setup the network interfaces on boot. Co-authored-by: Mark Karpov <[email protected]>
Could you please point me to this Nixpkgs "recommended style" guidelines?
https://discourse.nixos.org/t/rec-env-allocation/15926 . Now you do.
Meaning I'd have to edit that file again and rebase the PR. Meaning spend time on stuff you admit yourself would not lead to any qualitative improvement. I don't understand what you're trying to do here by nitpicking details. This PR makes some technical changes to this module, we did not even discussed yet whether or not this is the way to go for the What's the point of nitpicking style details when the core of the PR has not been discussed yet? What are you trying to achieve here? I think reading https://sage.thesharps.us/2014/09/01/the-gentle-art-of-patch-review/ might help. |
Oh, btw, for the record: the quoting part you're not happy with is actually not coming from me, it's already in Nixpkgs, in one of the derivation packaging the now partly legacy
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Thanks for doing that digging!
Especially considering how hard this is to test automatically, and to avoid introducing regressions without noticing, we should try to send this upstream.
...virtualization/google-compute-guest-agent/0001-oslogin-add-disable-toggle-in-conf-file.patch
Show resolved
Hide resolved
From f318f525149b06365e9f0c7f642f09f401c3e1c1 Mon Sep 17 00:00:00 2001 | ||
From: =?UTF-8?q?F=C3=A9lix=20Baylac-Jacqu=C3=A9?= <[email protected]> | ||
Date: Fri, 5 Nov 2021 15:40:13 +0100 | ||
Subject: [PATCH] oslogin: add disable toggle in conf file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reword this patch - we should send this upstream, so it should be reworded a bit (to explain this is about NixOS, which doesn't want to use the imperative install/uninstall bits) - and maybe add some more comments to their code, so it's clear why the config boolean is read.
In another context, upstream seems to have been open to accept patches, as long as it doesn't impair other users - so a config option disabling the "installation" phases in their daemon might be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NinjaTrappeur are you fine with me trying to upstream this patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 sure!
@@ -76,7 +76,8 @@ func (o *osloginMgr) timeout() bool { | ||
} | ||
|
||
func (o *osloginMgr) disabled(os string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained above, needs to have a comment on why we disable this.
sha256 = "sha256-DHpv6RIXV3Rn/8fuGUWS6nzsEqgyY5+zozsundX9ByM="; | ||
}; | ||
|
||
patches = [ ./0001-oslogin-add-disable-toggle-in-conf-file.patch ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be sent upstream and fetchpatch
'ed from there.
@@ -76,7 +76,8 @@ func (o *osloginMgr) timeout() bool { | ||
} | ||
|
||
func (o *osloginMgr) disabled(os string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained above, needs to have a comment on why we disable this.
sha256 = "sha256-DHpv6RIXV3Rn/8fuGUWS6nzsEqgyY5+zozsundX9ByM="; | ||
}; | ||
|
||
patches = [ ./0001-oslogin-add-disable-toggle-in-conf-file.patch ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be sent upstream and fetchpatch
'ed from there.
should we downgrade the package for now? |
I have a small question/suggestion related to this work: On this line:
google-compute-config.nix instead of google-compute-image.nix ?
I'm new to Nix, so maybe I don't understand something :) |
Also, on GCE, I installed Nixos manually with a ZFS root FS. This prevents me from importing directly Could there be a simple solution to this issue? For now, I just copied the file in my |
We have two options for a quick fix:
I'm unlikely to have the required bandwidth to properly push the patch upstream for the next 2 months. |
Then let's downgrade. I don't want to discredit the work being done in this PR, but we shouldn't merge it in halfway-done - I'd then rather revert the bump to fix things (and backport it to 21.11, too). |
Reverting is fine by me. |
This reverts commit 1748291. This upgrade broke the google-compute-config module. See NixOS#144761 for more info.
environment.etc."sysctl.d/11-gce-network-security.conf".source = "${gce}/sysctl.d/11-gce-network-security.conf"; | ||
|
||
environment.etc."default/instance_configs.cfg".text = '' | ||
[Accounts] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NinjaTrappeur can you compare your configuration here to the one from my PR and tell which other options are a good idea to use? Or to paraphase: which parts of default configuration did you change?
Closing in favor of #150837 |
Motivation for this change
We were relying on some google-provided python script to perform various actions on the GCE guests. These scripts have been decommissioned in favor of a Golang daemon: https://github.com/GoogleCloudPlatform/guest-agent. The #131761 bump broke the google-compute-config NixOS module: the Python script are not longer present in the
$out/bin
directory (more info about that in the upstream commit), hence breaking the NixOS module that was relying on those scripts.Migrating to the new Golang daemon to perform what the scripts used to do. See more details in the individual commit messages.
This module is tightly coupled to GCE, it's sadly hard to write any VM test for it and prevent future similar breakages :( . We tested this PR manually on a GCE test deployment containing some oslogin and metadata scripts.
Things done
sandbox = true
set innix.conf
? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)