Quorum and recovery issues

sesquipedality

New Member
May 19, 2020
Hi,

I run a four-node Proxmox cluster as follows:

Code:
Node 1
    VM: router
        UPS client
    VM: jump host
    UPS server
Node 2
    VM: fileserver1
    VM: fileserver2
    UPS client
Node 3
    Staging/development node - some test VMs generally not running
    UPS client
Node 4
    Redundant hardware - occasionally used for testing purposes - usually offline due to significant power draw
    UPS client

This isn't a traditional high-availability setup. I don't need my VMs to be able to run anywhere or fail over automatically, but I can, say, move my router to Node 2 if I need to take Node 1 down for maintenance. (Automatic failover wouldn't even be helpful with the router, as I can't run two sets of cables to the uplink modems.) Proxmox's backup, migration and replication features are also useful to me.

All of these machines are powered by the UPS, which runs at about 40% load, and provides an adequate window for shutdown in the event of power loss.

The problem is what happens after that. Node 1 is of course the last one standing, and any or all of the other machines and VMs may go down before it. Node 1 then cannot restart the router until quorum is established. The router is also the DHCP server, so the network is pretty much non-functional until it comes back.

My questions therefore are:
  1. Is there any way to override the need to have quorum before starting a VM? It is extremely unlikely this VM will be running elsewhere on the network, as I move it around manually using Proxmox's migration facilities as required.
  2. In theory, could I give Node 1 more than half the votes in the quorum? Even if I could, other nodes would then be unable to start VMs without Node 1 being up, which is not something I necessarily want. In particular, I could be left in a situation where Node 1 is down for maintenance and I need to restart the router on Node 2 (where the router would normally be while I am maintaining Node 1).
  3. Is it possible to configure the cluster so that any one server can form quorum on its own? Presumably this would make split-brain situations much more likely, but as I understand it these would be resolved by randomly picking one of the nodes as authoritative. What sort of problems might I encounter in practice running a configuration like this?
  4. Is there any other solution that might help me bring the network back online without the need for quorum? Is there some better configuration for me than a cluster if I am genuinely not fussed about automatic HA, and really just want to be able to move VMs from server to server from time to time?
 
Actually, it occurs to me that I only need the two "production" nodes (Node 1 and Node 2) to be able to operate independently, so a potential config would be:

Code:
quorum: 3 votes
Node 1: weight 3
Node 2: weight 3
Node 3: weight 1
Node 4: weight 1

Could I set it up like this? Would terrible things happen if I did?
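
For what it's worth, here is a quick sketch of the arithmetic behind this proposal, assuming corosync applies the usual majority rule (quorum is more than half of the expected votes). The function names quorum_for and is_quorate are made up for illustration; they are not Proxmox or corosync commands.

```shell
#!/bin/bash
# Illustrative only: majority rule, quorum = floor(expected_votes / 2) + 1.
# quorum_for and is_quorate are hypothetical helper names.
quorum_for() {
    local expected=$1
    echo $(( expected / 2 + 1 ))
}

is_quorate() {
    # $1 = votes held by the reachable nodes, $2 = expected votes in the cluster
    local present=$1 expected=$2
    [ "$present" -ge "$(quorum_for "$expected")" ]
}

# Weights 3 + 3 + 1 + 1 give 8 expected votes.
quorum_for 8

# Node 1 on its own holds 3 votes.
if is_quorate 3 8; then echo "quorate"; else echo "not quorate"; fi
```

Under that assumption, the weights above give 8 expected votes and a quorum of 5 rather than 3, so a single weight-3 node would still not be quorate on its own.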
 
OK - a few answers I have discovered.

1) Assigning different weights is a no-go for forcing quorum with only one machine, because there is no way to set a fixed quorum threshold. Quorum is calculated dynamically by corosync from the expected votes. While you can temporarily give an isolated node quorum by lowering the "expected" value using "pvecm expected 1", the expected value will not drop below the number of votes currently available in the cluster.

2) I can implement a partial solution by changing the quorum section in /etc/pve/corosync.conf on the running cluster to the following:

Code:
quorum {
    provider: corosync_votequorum
    last_man_standing: 1
    auto_tie_breaker: 1
    auto_tie_breaker_node: <nodenum>
}

<nodenum> here is the node ID of the "UPS master" node. This ensures that the UPS master always wins a quorum tie against another single node if the cluster drops to two nodes.

(Instructions for safely changing corosync.conf are at https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration)

This will allow even a single node to retain quorum in the event that all other nodes have gone down. Thus, if a power failure takes down everything except the master, the master can still bring up the router VM without any other nodes being present. There are other scenarios that this will not fix (e.g. a cold start of the network still requires quorum).
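
The linked instructions say to work on a copy of corosync.conf and to increase config_version in the totem section before moving the copy into place. A rough sketch of that workflow follows; bump_config_version is a hypothetical helper of my own, not a Proxmox command, and the steps in the comments are my reading of the admin guide, so check them against the guide before relying on them.

```shell
#!/bin/bash
# Sketch: increment config_version in a working copy of corosync.conf.
# bump_config_version is a hypothetical helper, not part of Proxmox.
bump_config_version() {
    local file=$1
    awk '
        /^[[:space:]]*config_version:/ {
            sub(/[0-9]+/, $2 + 1)   # replace the number with old value + 1
        }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Intended workflow on a cluster node (at your own risk):
#   cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
#   bump_config_version /etc/pve/corosync.conf.new
#   ...edit the quorum section in corosync.conf.new...
#   cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
#   mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
```

Keeping the .bak copy gives you something to fall back to if the new configuration misbehaves.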

I think what this tells me is that what I really want is to be able to force start a VM even if quorum isn't present. There doesn't appear to be a way to do that. I can understand why I'd need to tell Proxmox "yes, this is really a thing I want to do and I understand there are consequences", but what I don't understand is why Proxmox seemingly prevents me from doing it under all circumstances.
 
There is a possible alternative solution, which I think should start a stopped VM on the local node regardless of quorum, as follows:

Syntax: force-vm-start <vm_id>
Code:
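#!/bin/bash
# Start a stopped VM on this node even if the cluster has lost quorum.
if [ -z "$1" ]
then
   echo "Usage: $0 <vm_id>" >&2
   exit 2
fi
CUR_STATUS=$(qm status "$1")
if [ "$CUR_STATUS" = "status: stopped" ]
then
   # Lower the expected votes to the votes currently present, so that
   # this node becomes quorate, then start the VM.
   CUR_VOTES=$(pvecm status | awk -F': *' '/^Total votes/ {print $2}')
   pvecm expected "$CUR_VOTES"
   qm start "$1"
else
   echo "$0: status of VM id '$1' is not stopped. Exiting." >&2
   exit 1
fi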

Even if it works, this might result in unpredictable cluster behaviour on your system. Use at your own risk, obviously.

I obviously think it would be better if Proxmox allowed me to force start a VM without quorum, but in the absence of that facility, this should do the trick.
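
One refinement that might be worth having is to leave the expected votes alone when the node is already quorate. A minimal sketch follows, assuming that "pvecm status" prints a line of the form "Quorate: Yes" or "Quorate: No"; the function name is hypothetical, and I have not verified this against every Proxmox version.

```shell
#!/bin/bash
# Sketch: decide from `pvecm status` text whether the node is quorate.
# is_quorate_from_status is a hypothetical helper name.
# Assumption: the status output contains a line such as "Quorate:   Yes".
is_quorate_from_status() {
    # Reads the status text on stdin; succeeds only if Quorate is "Yes".
    awk -F': *' '
        /^Quorate/ { found = ($2 ~ /^Yes/) }
        END { exit (found ? 0 : 1) }
    '
}

# Only runs where pvecm exists, i.e. on a Proxmox node.
if command -v pvecm >/dev/null 2>&1; then
    if pvecm status | is_quorate_from_status; then
        echo "Node is quorate; no need to change expected votes."
    else
        echo "Node is not quorate."
    fi
fi
```

The force-start script could run a check like this first and only call "pvecm expected" when the node reports that it is not quorate.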
 
