Previously we disabled the EDAC kernel modules on our Proxmox systems by unloading them from the kernel and preventing them from loading at boot. This was done on the recommendation of our hardware vendors, because having EDAC loaded can conflict with Dell's and Lenovo's own hardware monitoring implementations through...
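A minimal sketch of one way to do this; the module name sb_edac and the file name disable-edac.conf are only examples, check what lsmod actually shows on your hardware:

    lsmod | grep -i edac                                           # see which EDAC modules are loaded
    modprobe -r sb_edac                                            # unload the driver (name is an example)
    echo "blacklist sb_edac" > /etc/modprobe.d/disable-edac.conf   # keep it from loading at boot
    update-initramfs -u                                            # only needed if the module is loaded from the initramfs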
I will do that. But regardless of what triggers the full cluster reboot, corosync seems to recover fine afterwards, as shown by the output of `corosync-cfgtool -s`, yet `pvecm status` does not show the other nodes. How does the Proxmox software interact with corosync? What needs to happen...
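For context, a rough sketch of the commands involved here: `pvecm status` reads from pmxcfs (the pve-cluster service), which runs on top of corosync, so the two views can disagree.

    corosync-cfgtool -s            # corosync's own view of the knet links
    systemctl status pve-cluster   # pmxcfs, the layer that pvecm status actually queries
    pvecm status                   # membership as pmxcfs/Proxmox sees it
    systemctl restart pve-cluster  # restarting pmxcfs is a common way to make it re-sync once corosync is healthy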
Today the whole cluster went down (all nodes rebooted) right after one node was shut down. Again, the cluster nodes did not rejoin automatically.
I set 'debug: on' in corosync.conf and ran corosync manually in the foreground on two nodes. The logs are attached. We will now proceed to power all machines...
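For anyone reproducing this, a sketch of that setup; only the logging stanza of corosync.conf is shown, and `corosync -f` is one way to run it in the foreground:

    logging {
        debug: on
    }

    systemctl stop corosync    # stop the service on the node first
    corosync -f                # run corosync in the foreground so the debug output goes to the terminal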
Yes, I can enable debug logging on corosync when I run into this issue again. I can also reproduce the issue fairly reliably. The downside is that it takes a lot of time to bring everything back online after such an event, so I prefer to wait until it happens again before turning on the...
Yes, we recently completed a firmware update round to bring the firmware of all system components up to recent versions. It made no difference. Besides, communication between the nodes is working fine even while they can't rejoin the cluster. Since 'corosync-cfgtool -s' shows the nodes...
In this case one or more STP TCNs caused issues on the network, which made all hypervisors in the cluster lose network connectivity long enough to trigger fencing and self-reboots. That started this whole situation. But once those TCNs had been handled and the STP topology was stable again...
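As I understand it, the relevant knob here is corosync's token timeout: a node that misses the token for longer than that is dropped from membership, and on an HA cluster that is what starts the fencing and self-reboot chain. A sketch of where it lives, with illustrative values only (not a recommendation):

    totem {
        version: 2
        token: 3000                              # ms a node may be unreachable before being dropped from membership
        token_retransmits_before_loss_const: 10  # retransmits attempted before the token is considered lost
    }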
Hi,
I'm running a cluster with 28 nodes. After a full cluster reboot (all nodes restarted), none of the nodes will rejoin. This is what each of the nodes sees:
Cluster information
-------------------
Name: pvenl02
Config Version: 57
Transport: knet
Secure auth: on...
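For reference, these are the kinds of checks that apply in this state (a sketch, not an official recovery procedure; `pvecm expected` in particular should be used with care):

    corosync-quorumtool -s                     # is the cluster quorate, how many votes are visible
    journalctl -b -u corosync -u pve-cluster   # what both services logged since this boot
    pvecm expected 1                           # last resort on an isolated node: lower expected votes so /etc/pve becomes writable again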
Thank you Wolfgang, I understand now.
Would it be something Proxmox could consider adding as a feature: the ability to mark a VM as 'local' (it does not need to be migrated when the hypervisor goes down) without it showing up in error state after a hypervisor reboot?
Hi Wolfgang,
Thank you for your insights. If I understand what you wrote correctly, Proxmox will put VMs into error state if it cannot migrate them, even if the HA policy says the VM should not be migrated. Is that correct?
Besides the freeze setting, is there a different way to tell...
Hi,
I'm running a Proxmox 5.4 cluster with 18 nodes.
On multiple occasions I've had VMs show up in 'HA error' state. To fix this, I set the VMs to the 'ignored' state, waited a minute for the error to clear, and then started them. They then start fine, without any problems.
The VM is a member of an HA...
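For reference, that workaround maps to these ha-manager commands (vm:100 is a placeholder ID):

    ha-manager set vm:100 --state ignored   # take the resource out of HA control; the error state clears
    ha-manager set vm:100 --state started   # once the status has settled, hand it back to HA and start it
    ha-manager status                       # verify the resource is no longer reported in error state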