nscd, unscd, cron and crond fail to restart #139

Open
zbjornson opened this issue Oct 16, 2021 · 5 comments

@zbjornson

When google-guest-agent starts, it appears to try to restart nscd, unscd, cron and crond, but those units are not present on our servers.

$ uname -a
Linux server-2 5.11.0-1020-gcp #22~20.04.1-Ubuntu SMP Tue Sep 21 10:54:26 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# Ubuntu 20.04 LTS Minimal

$ systemctl status google-guest-agent
● google-guest-agent.service - Google Compute Engine Guest Agent
     Loaded: loaded (/lib/systemd/system/google-guest-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-10-16 18:02:28 UTC; 19min ago
   Main PID: 555 (google_guest_ag)
      Tasks: 12 (limit: 9536)
     Memory: 20.8M
     CGroup: /system.slice/google-guest-agent.service
             └─555 /usr/bin/google_guest_agent

Oct 16 18:02:27 server-2 dhclient[620]: All rights reserved.
Oct 16 18:02:27 server-2 dhclient[620]: For info, please visit https://www.isc.org/software/dhcp/
Oct 16 18:02:27 server-2 dhclient[620]: 
Oct 16 18:02:27 server-2 dhclient[620]: Listening on Socket/ens4
Oct 16 18:02:27 server-2 dhclient[620]: Sending on   Socket/ens4
Oct 16 18:02:28 server-2 systemd[1]: Started Google Compute Engine Guest Agent.
Oct 16 18:02:28 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:28.9221Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart nscd.service: Unit nscd.service not found.
                                                     .
Oct 16 18:02:29 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:29.5818Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart unscd.service: Unit unscd.service not found.
                                                     .
Oct 16 18:02:29 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:29.8194Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart cron.service: Unit cron.service not found.
                                                     .
Oct 16 18:02:29 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:29.8254Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart crond.service: Unit crond.service not found.
                                                     .

Are these benign? If so, can they be downgraded from Errors?

These same error lines appear in #134, but in my case, the service is active/running, not dead.

@andrewhsu

I experience this same issue with COS version 93:

# cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=435e3f6b0837d398051855e22b245142aceb1ec6
VERSION=93
VERSION_ID=93
BUILD_ID=16623.39.6
# journalctl -u google-guest-agent.service -p 3
-- Journal begins at Fri 2021-10-22 22:19:33 UTC, ends at Fri 2021-10-22 22:58:32 UTC. --
Oct 22 22:19:41 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:41.4680Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart nscd.service: Unit nscd.service not found.
                                                           .
Oct 22 22:19:41 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:41.4773Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart unscd.service: Unit unscd.service not found.
                                                           .
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.1232Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart cron.service: Unit cron.service not found.
                                                           .
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.2261Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart crond.service: Unit crond.service not found.
                                                           .
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.2421Z GCEGuestAgent Error oslogin.go:116: Error reloading service: Failed to reload-or-restart ssh.service: Unit ssh.service not found.
                                                           .

@zbjornson
Author

When many servers start or restart at once, we get hundreds of these errors, which end up triggering server alerts. Could someone please let us know whether this is the same as #134 and thus being worked on? (Should I open a GCP Support case?)

@hopkiw
Contributor

hopkiw commented Nov 18, 2021

The background for these log messages: on startup, the guest agent makes configuration changes, then restarts services for the changes to take effect. It logs an error message when a service isn't found, but this is benign.
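
To illustrate the behavior described above, here is a minimal Python sketch of a "restart what exists, ignore what doesn't" loop. It is not the guest agent's actual code (the agent is written in Go, and its internals are not shown in this thread). The function names `try_restart` and `restart_all`, the logger name, and the choice to match on the "not found" text are all assumptions made for illustration. Only `systemctl try-restart` and the unit names come from the logs above.

```python
import logging
import subprocess

log = logging.getLogger("restart_sketch")  # hypothetical logger name


def try_restart(unit, run=subprocess.run):
    """Run `systemctl try-restart <unit>` and classify the outcome.

    Returns "ok", "missing" (unit not installed, treated as benign),
    or "failed" (any other error).
    """
    proc = run(
        ["systemctl", "try-restart", unit],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    stderr = (proc.stderr or "").strip()
    if "not found" in stderr:
        # Assumption: a missing unit means there is nothing to restart,
        # so this is logged at debug level instead of error level.
        log.debug("skipping %s: %s", unit, stderr)
        return "missing"
    log.error("error restarting %s: %s", unit, stderr)
    return "failed"


def restart_all(
    units=("nscd.service", "unscd.service", "cron.service", "crond.service"),
    run=subprocess.run,
):
    """Try to restart every unit in `units`; return a unit -> outcome mapping."""
    return {unit: try_restart(unit, run) for unit in units}
```

The `run` parameter exists only so the sketch can be exercised without touching systemd; on a real host, calling `restart_all()` with the defaults would invoke `systemctl` for each of the four units named in this issue.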

We actually already reduced this extraneous logging in #122, so if you use an updated version of the guest agent, these logs should go away. I think some of our partner distributions, e.g. Ubuntu or COS, have not yet received this change.
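
Until an updated agent reaches a given image, one possible stopgap is to drop these specific messages before they reach alerting. The Python sketch below does that. It assumes the message format shown in the logs in this thread, and the allowlist of units (`BENIGN_UNITS`) and both helper names are hypothetical. The `ssh.service` reload message is deliberately left out of the allowlist, since this thread only names nscd, unscd, cron and crond.

```python
import re

# Units named in this issue; which units are safe to ignore is an assumption
# that should be checked against your own images.
BENIGN_UNITS = {"nscd.service", "unscd.service", "cron.service", "crond.service"}

# Matches the message text shown in the logs above, for example:
# "Failed to try-restart nscd.service: Unit nscd.service not found."
UNIT_NOT_FOUND = re.compile(
    r"Failed to try-restart (?P<unit>[\w@.-]+\.service): "
    r"Unit (?P=unit) not found\."
)


def is_benign_unit_not_found(line):
    """Return True if `line` is a try-restart 'not found' message for an allowlisted unit."""
    match = UNIT_NOT_FOUND.search(line)
    return match is not None and match.group("unit") in BENIGN_UNITS


def actionable_lines(lines):
    """Drop benign messages and their lone "." continuation lines; keep everything else."""
    kept = []
    for line in lines:
        if line.strip() in ("", "."):
            continue  # blank line or the trailing "." continuation seen in the journal
        if is_benign_unit_not_found(line):
            continue
        kept.append(line)
    return kept
```

A filter like this would typically sit between `journalctl -u google-guest-agent.service -p 3` output and whatever raises the alert; anything it does not recognize still passes through unchanged.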

@zbjornson
Author

Thanks @hopkiw. Indeed the latest version available from/for Ubuntu is 20210629.00. Do you know if there's a way to accelerate the release of a new version? (Is that done by Canonical or Google?)

@hopkiw
Contributor

hopkiw commented Nov 23, 2021

Canonical takes updates on a regular cadence, except for critical vulnerabilities, where we will ask them to prioritize an update or patch. I don't know if end users can influence the process, but you might try filing a bug with them.

patelne pushed a commit to patelne/guest-agent that referenced this issue Feb 17, 2022
* add image test task in pipeline

fix bug

* add other image-test job

* make iamge test as task

* using full image name instead family

* add image test task in pipeline

fix bug

* fix centos-7

* fix missing image-test

* add almalinux rocky-linux