Proxmox becomes unstable when a node has a problem

kiarash

May 2, 2020
We have had several crashes, with data loss at the VM level, when one node appeared to have a power problem.
Rather than isolating that node, Proxmox rebooted all the servers at the same time!
This has resulted in a major disaster for our company.
This is not a support issue tied to subscription level, but a major bug in the way Proxmox works. In a ticket we even asked to pay an expert to look into our setup, but the only response we got was to move to a higher subscription level.
Has anyone had this issue before?
We have now taken that server out completely; it was our Ceph node as well. HA has been disabled, as has IPv6. By the way, Ceph has its own 100 Gbps network, and Corosync is also physically separated.
 
hi,

How many nodes are in the cluster? If the cluster loses quorum while HA is enabled, this can happen: nodes that are no longer quorate fence themselves, which means they reboot.

"but a major bug in the way Proxmox works"
Most of the time when this specific issue happens, it's a cluster with HA enabled losing quorum, so it's more likely a configuration mistake than a bug...
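
To make the quorum point concrete, here is a minimal sketch of the majority rule. It assumes one vote per node and no QDevice or other vote adjustments; the function names are made up for illustration and are not part of any Proxmox or Corosync API.

```python
# Rough illustration of majority-based quorum.
# Assumption: every node has exactly one vote and there is no QDevice.
# All function names here are hypothetical, not a Proxmox/Corosync API.

def quorum_threshold(total_votes: int) -> int:
    """Votes a partition needs to hold a strict majority."""
    return total_votes // 2 + 1


def is_quorate(reachable_votes: int, total_votes: int) -> bool:
    """True if a partition that sees `reachable_votes` votes has quorum."""
    return reachable_votes >= quorum_threshold(total_votes)


def max_tolerated_failures(total_votes: int) -> int:
    """How many votes can disappear before the remainder loses quorum."""
    return total_votes - quorum_threshold(total_votes)


if __name__ == "__main__":
    for nodes in (2, 3, 4, 5):
        print(
            f"{nodes} nodes: need {quorum_threshold(nodes)} votes, "
            f"can lose {max_tolerated_failures(nodes)}"
        )
```

Under these assumptions a 2-node cluster cannot lose any node, and a 4-node cluster tolerates no more failures than a 3-node one. It also shows how a whole cluster can go down together: if the cluster network fails in a way that leaves every node seeing only itself, no partition is quorate, and with HA active each node may fence itself at roughly the same time.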

I suggest you read the following chapters in our documentation:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_ha_manager
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvecm

also see this thread:
https://forum.proxmox.com/threads/random-reboot-of-full-proxmox-cluster.43999/