unexpected restart of all cluster nodes

fabian

Proxmox Staff Member
Reading this thread, but not being experienced in clusters, I'm really worried about a couple of points:
a) fencing should be different, i.e. the Proxmox node finds itself isolated, understands that it has to "suicide", then stops/kills all KVM processes (or LXC or whatever), logs the fact, syncs the local storage (where the logs are located) and then does a clean "reboot" or, if you think that is risky, a "reset".

that's how fencing works: each node has a watchdog controlled by the HA services, and if the watchdog expires, the node kills itself ;) as long as the node is part of the quorate partition, the HA services keep the watchdog from expiring.
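to make that a bit more concrete, here is a rough sketch of the "only pet the watchdog while quorate" pattern (illustration only, not the actual pve-ha-lrm/watchdog-mux code; has_quorum(), the device path and the timings are placeholders):

```python
# Sketch of watchdog-based self-fencing (illustration only).
import os
import time

WATCHDOG_DEV = "/dev/watchdog"  # hardware or softdog device; resets the node if not updated in time
PET_INTERVAL = 10               # seconds between updates, well below the expiry timeout

def has_quorum() -> bool:
    # placeholder: in the real cluster this question is answered by corosync
    raise NotImplementedError

def fencing_loop() -> None:
    wd = os.open(WATCHDOG_DEV, os.O_WRONLY)
    try:
        while True:
            if has_quorum():
                os.write(wd, b"\0")  # "pet" the watchdog: postpone its expiry
            # else: deliberately do NOT pet it; if quorum is not regained before
            # the timeout, the watchdog hard-resets the node (self-fencing).
            time.sleep(PET_INTERVAL)
    finally:
        # whether closing disarms the watchdog depends on the driver's
        # "magic close"/nowayout settings
        os.close(wd)
```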

b) if corosync is separated from the other networks, it can happen that all the other networks (storage and VM) are working fine, but a problem on the corosync network alone can provoke a cluster-wide suicide... that's bad

you have to define certain criteria for "part of the cluster". corosync already does the heavy lifting here, and we require corosync to say "this node is part of the quorate partition of the cluster" anyway for any tasks that modify state to work, so it's a good fit.
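as a sketch of that rule (again illustration only - the real check happens inside pmxcfs, which also turns /etc/pve read-only when the node is not quorate; the corosync-quorumtool parsing below is just a stand-in for that internal check):

```python
# Sketch of "only modify cluster state while quorate" (illustration only).
import subprocess

def node_is_quorate() -> bool:
    # crude stand-in: ask corosync-quorumtool for the quorum summary and look
    # for its "Quorate: Yes" line
    out = subprocess.run(["corosync-quorumtool", "-s"],
                         capture_output=True, text=True).stdout
    return any(line.strip().startswith("Quorate:") and "Yes" in line
               for line in out.splitlines())

def run_state_changing_task(task):
    if not node_is_quorate():
        raise RuntimeError("not in the quorate partition; refusing to modify cluster state")
    return task()  # e.g. write a guest config below /etc/pve
```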

c) why not just have an option, for not really critical setups (i.e. max 10 nodes that can work with the described setup), to consider a node OK as long as it can communicate with its shared cluster storage? Just reserve a "cluster_disk" on that storage with a FS that supports concurrent writes, and have each node rewrite a file with its nodename. If a node can't write there, it has to "commit suicide" (but as in point a)); if it can write, it just has to read all the other nodes' timestamps, and if it finds ones that are older than "n" minutes, it can conclude that that node is out of the cluster and, e.g., start its HA VMs. I'm in a hurry and maybe something more sophisticated has to be thought up, like node_vmid.txt or a sort of "cluster db" like Proxmox already has, or something good enough? Corosync is really overcomplicated and for small setups introduces more problems than it solves, IMHO

that's exactly what we are doing - just replace "shared cluster storage" with "pmxcfs", our FUSE-mounted shared DB backed by corosync, and the rest (writing timestamps, checking which nodes haven't updated theirs and must therefore have already fenced themselves via their watchdog expiring, etc.) is what our HA stack is doing ;)

establishing a consistent view of the world/cluster is not trivial; the 'write and check timestamps' part is just the last piece of the puzzle and not sufficient on its own.
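roughly, that last piece looks like this (sketch only - the real logic lives in the HA manager, the shared store is pmxcfs rather than a plain directory, and SHARED_DIR/FENCE_DELAY are made-up names):

```python
# Sketch of the "publish a timestamp, treat stale nodes as fenced" pattern
# (illustration only; it relies on the watchdog guarantee discussed above).
import json
import os
import time

SHARED_DIR = "/shared/cluster-status"  # stands in for the corosync-backed shared DB
FENCE_DELAY = 120                      # must exceed the watchdog timeout with some margin

def publish_heartbeat(nodename: str) -> None:
    # each quorate node periodically rewrites its own status file
    with open(os.path.join(SHARED_DIR, f"{nodename}.json"), "w") as f:
        json.dump({"node": nodename, "ts": time.time()}, f)

def nodes_presumed_fenced() -> list[str]:
    # a node whose timestamp is older than FENCE_DELAY has not been quorate for
    # at least that long, so its watchdog must already have expired and it is
    # safe to recover its HA resources elsewhere
    now = time.time()
    stale = []
    for entry in os.listdir(SHARED_DIR):
        with open(os.path.join(SHARED_DIR, entry)) as f:
            status = json.load(f)
        if now - status["ts"] > FENCE_DELAY:
            stale.append(status["node"])
    return stale
```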
 
