Agent and management server communication broken after KVM host reboot #9907
Unanswered
adietrich-ussignal
asked this question in
Q&A
Replies: 1 comment
-
For some additional context, I see SSH logins using the user the host was added with when I attempt to exit maintenance mode. On another cluster that is not set up in a consolidated fashion, I do not see this behavior, which makes me wonder if the agent is handling this communication and CloudStack is not directly SSH-ing to the host as it is with the consolidated deployment. Does anyone know why they appear to be using two different approaches to handling entering and exiting maintenance mode? |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
Advanced Networking
KVM
Consolidated Cluster:
OS / ENVIRONMENT
Ubuntu 22.04
SUMMARY
When performing a host reboot after updating general packages (e.g., kernel updates), the agent fails to return to an "Up" state.
The error observed when attempting to disable maintnenance mode:
The agent logs:
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu".
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from lscpu.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.lang.NullPointerException
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Float.parseFloat(Float.java:455)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:134)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:109)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.io.FileNotFoundException: /sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.open0(Native Method)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.(FileInputStream.java:157)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.(FileInputStream.java:112)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileReader.(FileReader.java:60)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:145)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:114)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from "host capabilities"
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Retrieved value [2100000000] from "host capabilities". This corresponds to a CPU speed of [2100] MHz.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses control group [cgroup2fs].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating the max shares of the host.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max shares of the host is [268800].
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: root : PWD=/ ; USER=root ; COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session closed for user root
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script] (Agent-Handler-1:) (logid:) Execution of process [7107] for command [/bin/bash -c dpkg -l nbdkit ] failed.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script] (Agent-Handler-1:) (logid:) Process [7107] for command [/bin/bash -c dpkg -l nbdkit ] encountered the error: [1].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to create storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 (Filesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing defined storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9, using it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 from libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Set agent id 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Attempting to create storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 (NetworkFilesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Found existing defined storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6, using it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Trying to fetch storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 from libvirt
The management server log:
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: com.cloud.utils.exception.CloudRuntimeException: Could not restart agent on Host {"id":17,"name":"dtw-cm-kvm01-03","type":"Routing","uuid":"2549c995-78ca-411f-91f8-7a4863fe3d15"} due to: Failed to restart cloudstack-agent.service: Interactive authentication required.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: See system logs and 'systemctl status cloudstack-agent.service' for details.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.connectAndRestartAgentOnHost(ResourceManagerImpl.java:2958)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.handleAgentIfNotConnected(ResourceManagerImpl.java:2923)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2893)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2977)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1294)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.sun.proxy.$Proxy200.cancelMaintenance(Unknown Source)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:96)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:172)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:654)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:602)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-12:ctx-d6de7f78 job-720) (logid:cb386b6d) Remove job-720 from job monitoring
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS
Beta Was this translation helpful? Give feedback.
All reactions