Hey all,
I have a cluster of 7 SFF PCs running as a PVE HA cluster backed by NFS storage (on an 8th machine, also in the cluster). I'm on "time-of-use" electricity costs, and would love to be able to kill off many of the SFF nodes during "peak" hours to save on energy costs, or if compute power isn't needed for some time.
However, due to quorum rules, I need 5 machines running to maintain quorum (50% + 1 of the 8 votes). Once I shut down more than 3 of the SFF PCs, quorum is lost and all heck breaks loose. So I have a few asks:
- Are there recommendations on how to safely reduce the number of active nodes in a cluster to 50% or fewer? Everything about PVE HA works great for me (shutdown initiates migration to other nodes, etc.), except for this one limitation, which is a blocker of sorts. Options I've seen and would like to consider are the last_man_standing feature of votequorum or manually setting expected_votes, but I haven't seen either discussed much in relation to PVE specifically (or only in posts that are years old).
- Is it expected that everything stops working when quorum is lost? It seems that once quorum is lost, all guests, even those running perfectly fine on healthy nodes, stop working. Even if I pre-migrate all my VMs to the nodes that stay up and then shut down the old, now-empty nodes, everything grinds to a halt. I assume this is expected behavior, but would like to confirm.
- If I'm out of options, is there any other recommendation on how I can achieve this?
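
To make the quorum arithmetic above concrete, here is a minimal Python sketch of the majority rule as I understand it. It assumes one vote per node (8 votes total: 7 SFF PCs plus the NFS machine). The function names are made up for illustration and are not part of any PVE or corosync API. The last function is only a simplified model of what I'd expect last_man_standing to do (recalculate expected_votes after each clean node departure while the cluster is still quorate); it ignores the timing window and auto_tie_breaker.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority: floor(n / 2) + 1."""
    return expected_votes // 2 + 1


def is_quorate(online_votes: int, expected_votes: int) -> bool:
    """True if the online nodes hold a strict majority of expected votes."""
    return online_votes >= quorum_threshold(expected_votes)


def lowest_quorate_count(total_nodes: int) -> int:
    """Simplified model (assumption): nodes leave one at a time, and
    expected_votes is lowered to the online count after each departure,
    as long as the cluster was still quorate at that point."""
    expected = total_nodes
    online = total_nodes
    while online > 1 and is_quorate(online - 1, expected):
        online -= 1
        expected = online  # recalculated after the node has left
    return online


TOTAL_NODES = 8  # assumption: 7 SFF PCs + NFS machine, one vote each

if __name__ == "__main__":
    print("threshold:", quorum_threshold(TOTAL_NODES))
    for powered_off in range(6):
        online = TOTAL_NODES - powered_off
        print(powered_off, "off ->", online, "online, quorate:",
              is_quorate(online, TOTAL_NODES))
    print("lowest quorate count with recalculation:",
          lowest_quorate_count(TOTAL_NODES))
```

With a fixed expected_votes of 8 this reproduces what I'm seeing: 5 online nodes are quorate, 4 are not. In the simplified recalculation model the cluster could shrink further one node at a time, which is why I'm curious whether that approach is viable with PVE HA.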