Replies: 18 comments
-
@sbrueseke are you addressing (designing/implementing a solution to) this bug?
-
I'm not deep enough into the topic to suggest a solution. We noticed the bug because we were carrying out some tests with Linstor in our lab and suddenly all the KVM hosts restarted. Evaluating the logs, we saw that it was due to the KVMHAMonitor. However, we did not have HA enabled, neither on the cluster nor on the hosts.
-
@sbrueseke, we are also facing a similar issue, but as per our logs (on all KVM hosts) we are only seeing the log lines below -
and not the one which you have mentioned
-
heartbeat[13962]: kvmheartbeat.sh will reboot system because it was unable to write the heartbeat to the storage. I received the same error when my Primary NFS Storage was inaccessible.
-
Hey @sbrueseke, I finally got a chance to look at this again. If you are using NFS, this is how it is supposed to work, I think. If you are using LinStor, see the merged PR for 4.19 (#8670). Is this still an issue?
-
@DaanHoogland now I am confused! Why is KVMHAMonitor rebooting hosts when the host is unable to write a heartbeat file to storage, even though every HA setting is disabled in the UI? Do you know the reason for that?
-
@sbrueseke
-
I know this workaround. My question is whether this setting should be added by the management server instead of manually.
-
@sbrueseke
-
Can you explain the root cause of why KVMHAMonitor needs to reboot the host when a storage is read-only?
-
@sbrueseke there was a lot of discussion about this in the past. Since storage issues rarely happen, it may be OK for users to change the setting in agent.properties.
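For reference, a minimal sketch of that manual workaround on a KVM host (the file path and service name are the usual CloudStack defaults; verify them on your distribution):

```sh
# Turn off the automatic reboot on heartbeat write failure for this host.
echo "reboot.host.and.alert.management.on.heartbeat.timeout=false" \
  >> /etc/cloudstack/agent/agent.properties

# Restart the agent so the new property is picked up.
systemctl restart cloudstack-agent
```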
-
@weizhouapache I would suggest taking another look at this and changing the default. In my opinion, a host should not automatically reboot at all if no HA settings are configured. From what I know, this looks like old behavior left over in the code.
-
I agree with you. @sbrueseke cc @DaanHoogland @rohityadavcloud @andrijapanicsb @GutoVeronezi @wido
-
@sbrueseke, I beg to differ: when storage cannot be reached, all VMs will be running on read-only disks from there on in. This means the host is useless and the VMs need to be rebooted, according to VM-HA. This behaviour was implemented before Host-HA was conceived and is not related to it. Maybe a redesign is in order, but as of now this works as designed.
-
afaik, this has 0 things to do with host HA, VM HA or anything else. The only way (to my knowledge) to disable host reboots is to comment out the line that does an echo into the sysrq trigger under /proc, at the VERY end of the script (that echo triggers a forceful reboot) - just comment out that single echo line and you are good. Storage might be inaccessible (heartbeat fail) but nothing will happen (log messages will say "I'm rebooting" but the script won't do anything, due to the line being commented out).
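For context, the tail of kvmheartbeat.sh looks roughly like the sketch below (paraphrased, not the exact source; check the copy installed on your hosts, typically under /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/, before editing anything):

```sh
# Final lines of kvmheartbeat.sh (approximate): log the failure, flush
# buffers, then force an immediate reboot via the kernel sysrq trigger.
/usr/bin/logger -t heartbeat "kvmheartbeat.sh will reboot system because it was unable to write the heartbeat to the storage."
sync
sleep 5
echo b > /proc/sysrq-trigger   # commenting out this line disables the forced reboot
```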
-
Hi all,
-
@andrijapanicsb you can also disable this behavior with the agent property reboot.host.and.alert.management.on.heartbeat.timeout.
PR #4586 introduced this property with a default of true. A redesign of the feature might be needed; however, we need to discuss it first.
-
CloudStack 4.17.0.1. Recently I also started getting this issue:
This has happened a second time this month - after 2.5 years. My issue is that after the reboot all the VMs were down. In my case I don't see any issue with the NFS server, and it is accessible. Should we treat this as a temporary NFS reachability problem, or is there anything I should be checking related to the NFS host?
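To help rule out a transient NFS outage on the host side, a quick check like the sketch below may be useful (the mount point is illustrative; locate the real pool mount first):

```sh
# Find the NFS mounts that back primary storage.
mount | grep nfs

# Verify the pool mount is still writable - essentially what the
# heartbeat does. Replace <pool-uuid> with the actual mount directory.
touch /mnt/<pool-uuid>/hb-test && rm /mnt/<pool-uuid>/hb-test
```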
-
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
OS / ENVIRONMENT
SUMMARY
Even when HA has been disabled at the cluster and/or host level, KVMHAMonitor gets initialized on the KVM hosts.
It looks like the code does not check whether HA is enabled or not:
https://github.com/apache/cloudstack/blob/8f6721ed4c4e1b31081a951c62ffbe5331cf16d4/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
STEPS TO REPRODUCE
EXPECTED RESULTS
When HA is disabled, I would expect the management server to add the following parameter to the agent.properties file of each host:
reboot.host.and.alert.management.on.heartbeat.timeout=false
or
KVMHAMonitor is not initialized at all.
ACTUAL RESULTS
Nothing happens when disabling HA at the cluster and/or host level. KVMHAMonitor gets initialized and will perform checks on the host. In some situations this leads to an automatic reboot of the host because KVMHAMonitor is not able to write the heartbeat for a pool to primary storage: