How many nodes are needed for HA with cluster over two datacenters?

hec

Renowned Member
Hello,

we would like to build a new cluster distributed over two datacenters. We use a NetApp MetroCluster and connect the storage via NFS, so storage is not a problem.

Normally a cluster should have at least 3 nodes. How many nodes are needed for a cluster distributed over two datacenters?

best regards
Gregor
 
Any ideas?

I found nothing in the documentation.

Let's say we have the following situation:

DC1: 2 Proxmox nodes, NetApp Metrocluster Site A
DC2: 2 Proxmox nodes, NetApp Metrocluster Site B

So we have a cluster with 4 nodes. Everything should be fine. But then DC1 goes down: no connectivity to the hosts. The NetApp MetroCluster will do a switchover, so we have all storage resources on one site. The Proxmox cluster will have 2 nodes down and 2 nodes up.

Can the cluster decide what to do?
 
building a cluster over 2 datacenters is generally not recommended (corosync needs a latency < 2 ms to work reliably; if your links can hold that, you should be fine)

to have quorum you need more than half of the votes (each node provides one vote). with 4 nodes you need at least 3 votes, so if one datacenter with 2 nodes goes down, the remaining 2 nodes do not have quorum (HA-enabled nodes would fence themselves, you cannot start any VMs, etc.)
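for reference, you can inspect the vote situation from any node. a minimal sketch (note that `pvecm expected` is a manual emergency override and risks split-brain if the other site is actually still running, so only use it after you have verified the site is really down):

Code:
# check current votes and quorum state on any node
pvecm status
# emergency only: tell the surviving nodes to expect fewer votes so
# they regain quorum (dangerous if the other site is merely unreachable)
pvecm expected 2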

a solution could be a corosync QDevice (an external service that provides an extra vote for quorum); sadly this is not properly documented at this time
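for reference, a minimal setup sketch with the stock packages and tooling (the tiebreaker host address is a placeholder; the external host must be reachable from all nodes):

Code:
# on the external tiebreaker host (e.g. a small Debian VM):
apt install corosync-qnetd
# on every Proxmox cluster node:
apt install corosync-qdevice
# then, from one cluster node, register the QDevice:
pvecm qdevice setup <QNETD-HOST-IP>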
 
I tackled this with two separate clusters. Failover isn't automated, obviously, but I can manually fail over to our other datacenter in a matter of 20-30 minutes.
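One way such a manual failover can work with shared NFS storage (a hypothetical sketch under stated assumptions, not necessarily the exact procedure above; node names and the VMID are made up): since the disks are already visible to both clusters, failing over is mostly a matter of re-registering the VM configs on the target cluster.

Code:
# the disks already live on shared NFS, so only the VM definition moves;
# copy the config into the target node's pmxcfs directory
# (VMIDs must not collide between the two clusters)
scp /etc/pve/nodes/dc1-node1/qemu-server/101.conf \
    root@dc2-node1:/etc/pve/nodes/dc2-node1/qemu-server/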
 
Latency is no problem; we have ultra-low-latency switches.

Here you can see the latency to the local and the remote datacenter. We currently have 20 Gbit between the DCs and will add 2 more links to get 40 Gbit. This should be enough.

Code:
PING raptor3.dmz.cubit.at (192.168.61.217) 56(84) bytes of data.
64 bytes from raptor3.dmz.cubit.at (192.168.61.217): icmp_seq=1 ttl=255 time=0.095 ms
64 bytes from raptor3.dmz.cubit.at (192.168.61.217): icmp_seq=2 ttl=255 time=0.084 ms
--- raptor3.dmz.cubit.at ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1006ms
rtt min/avg/max/mdev = 0.084/0.089/0.095/0.010 ms

PING raptor4.dmz.cubit.at (192.168.61.218) 56(84) bytes of data.
64 bytes from raptor4.dmz.cubit.at (192.168.61.218): icmp_seq=1 ttl=255 time=0.214 ms
64 bytes from raptor4.dmz.cubit.at (192.168.61.218): icmp_seq=2 ttl=255 time=0.198 ms
--- raptor4.dmz.cubit.at ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.198/0.206/0.214/0.008 ms

OK, so I need a small Debian VM that acts as the corosync QDevice, and everything should be fine. I think the best option is to put this VM on our VMware cluster.

The two-cluster approach is not possible for us; I need to migrate VMs between the two datacenters.
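For illustration, the QDevice registration ends up adding roughly the following to the quorum section of corosync.conf (a sketch; the host address is a placeholder). The ffsplit algorithm suits an even 2+2 cluster, since on a 50/50 split it grants the tiebreaker vote to exactly one half:

Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      host: 192.0.2.10      # placeholder: the QNetd VM's address
      algorithm: ffsplit    # on a 50/50 split, exactly one half wins
      tls: on
    }
  }
}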
 
A cluster must have a minimum of 3 members to assure quorum. If the link between your two datacenters is severed or interrupted, which site becomes the master? There is also the matter of storage synchronization: what is your replication mechanism, and will it be able to maintain quorum on link interruption?
 
As I said, we have a NetApp MetroCluster, so all writes are committed synchronously to both sides.

So storage is not a problem, and the switchover takes less than 60 s. What about a tiebreaker over LTE or something similar? That way we could tell whether one site is down because of a power problem or whether just the connection between the DCs is broken.

I'm open to all solutions; storage fencing would also be fine. But there should be a way to stretch a cluster over two or more datacenters. Maybe it would all work with 3 DCs; then there would still be a majority if one DC goes offline.

I think Proxmox should work on a solution; I don't think I'm the only one who needs this.
 
Hi there,

Did you make any progress on this, please?

Cheers,
 
