Cluster, quorum, and backups

BloodyIron

Hi Folks,

Long time Proxmox user and supporter (I help on IRC where I can).

One of the clusters I work with has been 2 nodes for years now. Recently 2 more nodes were added to the cluster, intended as temporary "lab" space. As such, these nodes are powered down out of hours to save power (they're old hardware, not very efficient).

Now, the problem I have is with backups. For those who don't already see the issue: when we have 4 nodes in the cluster and 2 turn off, the cluster loses quorum and goes into an "emergency" state of sorts. Quorum needs more than 50% of the votes, and with only 2 of 4 nodes online that majority is not met. The default quorum configuration in Proxmox treats this as the point where consistency can no longer reasonably be guaranteed in the cluster (seems like a reasonable default IMO).
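The quorum arithmetic behind this can be sketched as follows. This is a minimal illustration, assuming one vote per node (the default) and the usual corosync majority rule of floor(total / 2) + 1 votes; the function names are hypothetical and not part of any Proxmox tool.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for quorum: a strict majority of all expected votes."""
    return total_votes // 2 + 1


def is_quorate(online_votes: int, total_votes: int) -> bool:
    """True if the online votes reach the majority threshold."""
    return online_votes >= quorum_threshold(total_votes)


# 4 nodes with 1 vote each; the 2 lab nodes are powered off.
print(quorum_threshold(4))   # votes required out of 4
print(is_quorate(2, 4))      # only the 2 always-on nodes are up
print(is_quorate(3, 4))      # one lab node left running
```

With 4 votes in total the threshold is 3, so exactly half the cluster is never enough on its own.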

So what actual issue does this cause me? It breaks my nightly backups of the VMs. This is using the built-in backup mechanism, to an NFS share, nothing like Veeam or whatever. Backups have been veerrrry reliable up until I found this hiccup.

The logs show that certain file locks cannot be obtained when trying to back up (on ALL the VMs listed for backup):

"
INFO: Starting Backup of VM 212 (qemu)
INFO: status = running
INFO: unable to open file '/etc/pve/nodes/REDACTED/qemu-server/212.conf.tmp.10682' - Permission denied
INFO: update VM 212: -lock backup
ERROR: Backup of VM 212 failed - command 'qm set 212 --lock backup' failed: exit code 2
INFO: Starting Backup of VM 400 (qemu)
INFO: status = running
INFO: unable to open file '/etc/pve/nodes/REDACTED/qemu-server/400.conf.tmp.10692' - Permission denied
INFO: update VM 400: -lock backup
ERROR: Backup of VM 400 failed - command 'qm set 400 --lock backup' failed: exit code 2
"

I'm quite sure this is because the clustered filesystem (/etc/pve) is in a read-only state due to the loss of quorum mentioned earlier.

Anyways. I see two "solutions", but I was hoping that a third would be found as I like neither "solution":

  1. Leave the "lab" nodes on all the time. The backups start at 1am and go for 4hrs-ish. This is known to work as it happened successfully last night.
  2. Remove those two nodes from this cluster, and do something else with them.

Now, I would prefer to be able to turn nodes 3 and 4 off and on at will while they remain members of this cluster, since I never put any actually important VMs on them unless I'm testing something (a forked VM, for example). This is my preferred way, as I would love to keep one management interface for all of it (love this stuff, seriously guys!).

I have tried, with no success, telling pvecm to expect 2 votes; however, that value doesn't seem to _change_ OR _stick_. Furthermore, the only docs I can find about this are for Proxmox cluster v2; I can't find anything for v3 or v4.

Also, it's worth noting that the nodes are not in an HA cluster, just a regular one. I don't need HA clustering, and the first 2 nodes aren't able to be fenced at this time anyways.

Well, I'd love to hear your thoughts! So, please tell me them. Thanks peeps.
 
you could give one of your two "always on" nodes two votes (so with the lab nodes off you would still have 3 out of 5 votes and be quorate). note that this is very specific to this asymmetric setup, and generally not a good idea ;) if you ever add a third "always on" node, you should revert that change.
 
another option might be to add a fifth very small, low power, corosync/quorum only node (some people experimented with raspberry pis)
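To compare the two suggestions, here is a rough sketch that tallies votes per node and checks the majority rule. The node names (node1 to node4, qnode) and the helper function are hypothetical; only the vote counts follow the suggestions above.

```python
def is_quorate(votes: dict[str, int], online: set[str]) -> bool:
    """Majority check: online votes must reach floor(total / 2) + 1."""
    total = sum(votes.values())
    present = sum(v for node, v in votes.items() if node in online)
    return present >= total // 2 + 1


always_on = {"node1", "node2"}  # lab nodes node3 and node4 are powered off

default = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}
weighted = {**default, "node2": 2}        # one always-on node gets 2 votes
with_quorum_node = {**default, "qnode": 1}  # small fifth quorum-only node

print(is_quorate(default, always_on))                      # 2 of 4 votes
print(is_quorate(weighted, always_on))                     # 3 of 5 votes
print(is_quorate(with_quorum_node, always_on | {"qnode"}))  # 3 of 5 votes

# The asymmetry: with the lab nodes off, losing the 2-vote node
# leaves node1 with 1 of 5 votes.
print(is_quorate(weighted, {"node1"}))
```

Both options reach 3 of 5 votes with the lab nodes off. The weighted variant makes the cluster depend more heavily on the node holding the extra vote, which is presumably why it is described as specific to this setup.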
 
Well, if I wanted to change the number of votes per node, how would I do that? I am unsure if the documentation that's currently available is still relevant to v4 (I'm on v4).
 
edit "/etc/pve/corosync.conf" and change "quorum_votes" in the desired node entry to "2":

example snippet:
Code:
  node {
    name: node2
    nodeid: 2
    quorum_votes: 2
    ring0_addr: 10.0.1.12
  }

it should automatically reload the configuration, and pvecm status should show the updated configuration as well.
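As a sanity check before editing, the vote totals can be tallied from the nodelist. The sketch below parses a sample string rather than the real file; the node1 entry and its address are made up for illustration, and the regex-based parser is a simplification that assumes node blocks contain no nested braces. As far as I know, the Proxmox cluster documentation also advises incrementing config_version in the totem section whenever this file is edited, so that is worth checking for your version.

```python
import re

# Sample nodelist; node2 matches the snippet above, node1 is hypothetical.
SAMPLE_CONF = """
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.1.11
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 2
    ring0_addr: 10.0.1.12
  }
}
"""


def parse_votes(conf_text: str) -> dict[str, int]:
    """Return {node name: quorum_votes} for each node block in the text."""
    votes = {}
    # \bnode\s*\{ matches "node {" but not "nodelist {".
    for block in re.findall(r"\bnode\s*\{(.*?)\}", conf_text, flags=re.DOTALL):
        name = re.search(r"^\s*name:\s*(\S+)", block, flags=re.MULTILINE)
        qv = re.search(r"quorum_votes:\s*(\d+)", block)
        if name:
            # Assume 1 vote when quorum_votes is not set for a node.
            votes[name.group(1)] = int(qv.group(1)) if qv else 1
    return votes


votes = parse_votes(SAMPLE_CONF)
total = sum(votes.values())
print(votes)
print("total votes:", total, "needed for quorum:", total // 2 + 1)
```

On a cluster node, the same function could be pointed at the contents of /etc/pve/corosync.conf instead of the sample string.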
 
Is that going to be resilient to proxmox updates?

I'm still torn as to whether I should do this or something else, but you certainly have given me the info I wanted! :)
 
yes, we normally don't touch the corosync configuration on upgrades.
 
Just want to let you know that this adjustment so far looks to have done the trick. I suspected it would have, but I only recently was able to make the change (due to project constraints).

To be explicit, I set nodes #1 and #2 to have 2 votes each (so 4 of the 6 total votes stay online) and turned #3 and #4 off overnight. Backups executed without issue (so it says).

Totally appreciate the awesome help! :D
 
