Proxmox becomes unstable when a node has a problem

kiarash

Member
May 2, 2020
Oman
cloudacropolis.com
We have had several crashes, with data loss at the VM level, when one node appeared to have a power problem.
Rather than isolating that node, Proxmox rebooted all the servers at the same time!
This has resulted in a major disaster for our company.
This is not a support issue tied to a subscription level, but a major bug in the way Proxmox works. In a ticket we even asked to pay an expert to look into our setup, but the only response we got was to move to a higher subscription level.
Has anyone had this issue before?
We have now taken that server out completely. It was also a Ceph node. HA has been disabled, as has IPv6. By the way, Ceph has its own 100 Gbps network, and Corosync is also physically separated.
 
Hi,

How many nodes are in the cluster? If the cluster loses quorum while HA is enabled, this can happen...

"but a major bug in the way proxmox works"
Most of the time when this specific issue happens, it's a cluster with HA enabled losing quorum, so a configuration mistake is more likely...
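
To make the quorum point concrete, here is a rough sketch of the majority arithmetic. It assumes one vote per node and a plain "more than half" rule; the function names are made up for illustration and are not part of any Proxmox or Corosync API. The idea is that a node with active HA services that ends up on the losing side of this check is expected to fence itself, so if every node loses sight of the others at once (for example, a problem on the Corosync network), every node can end up rebooting.

```python
# Illustrative only: majority-quorum arithmetic, assuming one vote per node.
# quorum_threshold and has_quorum are hypothetical helper names.

def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if a partition that sees `reachable_votes` votes holds a majority."""
    return reachable_votes >= quorum_threshold(total_votes)


# How many node failures each cluster size can absorb before losing quorum.
for total in (2, 3, 4, 5):
    tolerated = total - quorum_threshold(total)
    print(f"{total} nodes: quorum at {quorum_threshold(total)} votes, "
          f"can lose {tolerated} node(s)")
```

Under these assumptions a two-node cluster cannot lose any node without losing quorum, and a node that can only see itself never holds a majority in a cluster of two or more, which is why the node count matters here.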

I suggest you read the following chapters in our documentation:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_ha_manager
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvecm

Also see this thread:
https://forum.proxmox.com/threads/random-reboot-of-full-proxmox-cluster.43999/
 
