Designing a Proxmox HA cluster with 4 nodes on 2 remote sites - Quorum and impact

rbollink

Jul 5, 2023
Hello, we're planning to migrate our VMware infrastructure to a Proxmox cluster in the next few weeks, and we'd like to take advantage of the zero licensing cost to build an HA cluster with shared, redundant storage across two sites.

We have two datacenters on two different sites, with less than 5 ms of latency between them.
Site A will consist of 2 nodes and will host all production VMs. Site B will also have 2 nodes and will run idle as long as site A is operational. Site B's role will be to take over the entire workload in the event of site A's failure.

The VMs of the 4 PVE nodes will be stored on a Synology NFS server hosted on site A. It will be made redundant by an identical machine on site B using SHA (Synology High Availability).
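
For reference, on the PVE side I'd expect the shared storage to end up as an NFS entry in /etc/pve/storage.cfg roughly like this (the storage name, server address, and export path below are placeholders; the server address would be the virtual IP of the SHA pair):

    nfs: synology-shared
        server 192.0.2.10
        export /volume1/proxmox
        content images,rootdir
        options vers=3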

Nevertheless, we need to respect quorum, and it would be wiser to have an odd number of votes. To achieve this, we see two options (rough config sketches for both follow below):
  1. Increase the vote weight of one of the servers on one of the two sites (e.g. site A = 2 nodes = 3 votes and site B = 2 nodes = 2 votes, so quorum = 3; or vice versa for site B)
  2. Add a QDevice on a third remote site (site A = 2 nodes = 2 votes / site B = 2 nodes = 2 votes / site C = 1 QDevice = 1 vote, so quorum = 3)
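
For option 1, a minimal sketch of what the nodelist in /etc/pve/corosync.conf could look like (node names and addresses are placeholders); here one node on site A carries two votes, so site A totals 3 of the 5 cluster votes:

    nodelist {
      node {
        name: pve-a1
        nodeid: 1
        quorum_votes: 2
        ring0_addr: 10.0.1.11
      }
      node {
        name: pve-a2
        nodeid: 2
        quorum_votes: 1
        ring0_addr: 10.0.1.12
      }
      node {
        name: pve-b1
        nodeid: 3
        quorum_votes: 1
        ring0_addr: 10.0.2.11
      }
      node {
        name: pve-b2
        nodeid: 4
        quorum_votes: 1
        ring0_addr: 10.0.2.12
      }
    }

(When editing corosync.conf by hand, the config_version in its totem section has to be increased as well.)
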
The issue:
  1. The problem with the first solution is that if we lose the site holding the majority, say site B (e.g. a WAN network cut), then the PVEs of site A, no longer quorate, go read-only, while site B, despite being cut off from the WAN (interrupting our customer services), restarts all of site A's VMs even though it is isolated from the world. The whole production then goes down.
  2. With the second solution, if we lose site A or site B but the QDevice on site C can still see the surviving site, production can keep running on either A or B. But if, in addition to losing one of the two sites, we also lose the QDevice, then we lose production as well. This solution seems more resilient, so we're looking for ways to cover that remaining risk (a QDevice setup sketch follows below).
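
For option 2, assuming a small Debian machine on site C as the third party (its address below is a placeholder), the QDevice setup would look roughly like this:

    # on the site C machine
    apt install corosync-qnetd

    # on every PVE node of the cluster
    apt install corosync-qdevice

    # then, from one PVE node only
    pvecm qdevice setup 203.0.113.5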

As I learned more about how Corosync works, I discovered some options in the votequorum service, such as last_man_standing, but I also saw that this is not supported by Proxmox. Could some of you explain the negative consequences of such a configuration?
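
For context, the option I read about would sit in the quorum section of corosync.conf and look something like the sketch below; I'm only showing it to illustrate the question, since Proxmox doesn't test or support it:

    quorum {
      provider: corosync_votequorum
      last_man_standing: 1
      # time in ms the cluster must stay quorate before expected_votes is recalculated downwards
      last_man_standing_window: 10000
    }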

Also, I've seen that votequorum requires an expected_votes value to work, and this can be provided in two ways: the number of expected votes is either calculated automatically when the nodelist { } section is present in corosync.conf, or expected_votes is specified explicitly in the quorum { } section.
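
So, with a nodelist like the one sketched above, the expected votes would be derived automatically; the explicit alternative would presumably look something like:

    quorum {
      provider: corosync_votequorum
      expected_votes: 5
    }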


In my case, let's take the following example:
Site A contains 2 nodes and 2 votes, site B contains 2 nodes and 3 votes, and there is no QDevice on site C.

If I configure the cluster with pvecm expected 2, will I be able to continue running my prod on site A if site B goes down without risking a split brain? I think the answer is no, but confirmation would be appreciated.
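
(To be explicit, what I mean by that is running something along these lines on the nodes:)

    # tell votequorum to expect only 2 votes (a runtime change, not persisted in corosync.conf)
    pvecm expected 2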

Thanks in advance.
 
If I configure the cluster with pvecm expected 2, will I be able to continue running my prod on site A if site B goes down without risking a split brain?
If only the network breaks, each site has two votes and expects two votes and will run independently; and if the machine on site B starts, you will get a split-brain situation.
 
