Is anybody doing HA between two geographical locations?

brucexx

I wonder if it is possible to do HA with Proxmox this way, using Storage Replication across nodes in two different data centers. Has anyone tried that? Any other ideas?

We can get a fast and reliable link between data centers from the same data center provider.

thank you
 
Hi,

no, this does not work; you would create an HF (High Failure) solution.
HA is built on corosync, which needs low network latency.
Corosync is made for local networks.
Even a fast link will not be enough,
because latency increases with the distance between the hosts.
 
Hi, I think I have an idea about this (it is only an idea, so be aware). The problems are disk access performance and network capacity.

Let's say you have data center A and data center B. In A and B you could have many VMs/containers, and your goal could be, say, to replicate only some of them from A to B. The Proxmox cluster in A can be separate from the one in B. You can set up LizardFS (a clustered file system with replication and distribution options) and put all your critical VMs/containers on the LizardFS storage (as a separate directory/folder under the datacenter storage).
In LizardFS you define a goal, e.g. for any chunk I want N replicas (like mirror, raid5, raid6), but in any case at least one replica must be in data center B. In the end you will have the same VMs/containers in both data centers (A and B). That is the main idea! For HA itself, you will need other tools ...
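A rough sketch of what such a goal definition could look like (the file paths, label names and directory are my assumptions; check the LizardFS documentation for the exact syntax):

Code:
# /etc/mfs/mfsgoals.cfg on the master (assumed path): goal id 5, named dc_mirror,
# keeps one copy on a chunkserver labeled dcA and one labeled dcB
# (chunkserver labels are set in mfschunkserver.cfg)
5 dc_mirror : dcA dcB

# apply the goal recursively to the directory holding the critical VM images
lizardfs setgoal -r dc_mirror /mnt/lizardfs/critical-vms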
 
What about other solutions? Can, for example, VMware do this reliably across two geographical locations? Does anybody know? ...or does it need some crazy requirements to work?

Thank you
 
Hi,
I think it is not possible to have 100% safety. When you write a block to a disk in data center A, that block needs to be replicated to B. If A breaks before the block arrives at B, then B does not know that a block was in flight from A, and B will need to wait 2-10 seconds before it can start the data-center failover.
I think if you want data-center replication you have only 2 options: asynchronous replication (like zfs send/receive) and/or synchronous replication (like LizardFS). Each of them has its own pros and cons.
And in the end, be aware of the split-brain scenario ;) This is the most dangerous problem.
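To illustrate the asynchronous option: a minimal zfs send/receive sketch, assuming a dataset rpool/data/vm-100-disk-0 on A and a host dc-b with a pool tank on B (all names are placeholders):

Code:
# initial full copy of the VM disk dataset from A to B
zfs snapshot rpool/data/vm-100-disk-0@rep1
zfs send rpool/data/vm-100-disk-0@rep1 | ssh root@dc-b zfs receive -F tank/replica/vm-100-disk-0

# later runs only send the blocks changed since the previous snapshot
zfs snapshot rpool/data/vm-100-disk-0@rep2
zfs send -i rep1 rpool/data/vm-100-disk-0@rep2 | ssh root@dc-b zfs receive tank/replica/vm-100-disk-0

Anything written between the last successful send and a failure of A is lost, which is exactly the trade-off described above.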
 
We have such a setup.

This is no problem at all. You need a storage system which can do synchronous commits. We are working with NetApp MetroCluster.
If you write a block, it is written to the NVRAM on both sides. The clients get the OK back as soon as both NVRAMs have the data.
You can switch over to the other data center in less than 1 minute. Then you only need to restart the VMs, or HA restarts them for you.

You can do such a setup with up to 200 km between sites. I think Proxmox will need less distance for corosync.

We don't have such a big distance between our data centers; it should be about 10 to 20 km.

Corosync is working without any problem. The storage is doing sync commits so everything is HA and the data is written on both sides.

I don't understand why it should be impossible to have a cluster stretched between two data centers.

10G or 100G between 2 DCs should be enough. The storage has 8 links with 8G.

As far as I know you need less than 2 ms for corosync. Can somebody from the Proxmox team confirm this?

2 ms is no problem with 10G.

For sure you need to think about split brain. This can be handled with a monitoring node in a different location. An LTE connection is also a good thing for checking split brain. You can also do this with storage fencing, like VMware does.
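In Proxmox terms such a monitoring node can be a corosync QDevice on a small third host; a rough sketch, assuming the witness machine is reachable at 10.10.10.5:

Code:
# on the external witness host (not a cluster member):
apt install corosync-qnetd

# on every cluster node:
apt install corosync-qdevice

# then, from one cluster node, register the witness with the cluster:
pvecm qdevice setup 10.10.10.5

The QDevice provides the extra vote, so whichever site can still reach it keeps quorum during a site or link failure.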
 
The clients get the OK back as soon as both NVRAMs have the data.
... and what if you have only one commit, and your inter-data-center link breaks before the 2nd commit can be done? As I said, this is the problem, because in such a case you can lose some data.
So any synchronous replication tool across 2 data centers is not fail-safe. In some cases small data losses may be acceptable, but certainly not in every case. As a dummy example, say you have an ERP DB and a client who makes an important deal ... he buys 1000 of product X. Before you can finish his DB transaction, your primary data center goes offline. After less than 0.5 min (as you say) the backup data center becomes the primary, the DB sees one unfinished transaction -> rollback, and .... I guess you have a big problem.

So, as I have learned from my own mistakes, you cannot have a fail-safe setup with only 2 hosts/data centers. Only 3 hosts/data centers can be OK with any synchronization tool.
 
You can do such a setup with up to 200 km between sites. I think Proxmox will need less distance for corosync.
We don't have such a big distance between our data centers; it should be about 10 to 20 km.
Corosync is working without any problem. The storage is doing sync commits so everything is HA and the data is written on both sides.

We have a much greater distance (1500 miles) and a latency of at least 40 ms. We could get much lower latency if we used data centers from the same company (currently we have two data centers from different companies), but even then we could not get below 10 ms, if we could even get 10 ms to start with.

Our systems don't generate that much data, so there is not much to sync between locations in terms of storage, and only a few of our VMs need that setup. Looking at the bandwidth needed, I think a 1 Gb link would be more than enough for storage sync.

For our needs it would be acceptable to use PVE storage replication if it could be done across two PVE clusters - that is what I was hoping could somehow be possible.

Thank you
 
For our needs it would be acceptable to use PVE storage replication if it could be done across two PVE clusters - that is what I was hoping could somehow be possible.

The most workable scenario would be to have a replicated storage solution such as @hec is describing, and deal with fault/failover manually on an all-or-nothing basis; essentially you would have STONITH at the network interface level, e.g. tell the other router/load balancer to shut off traffic to the fenced-off cluster.

And yes, you can accomplish this using PVE storage replication; Proxmox already contains logic to send ZFS snapshots (https://pve.proxmox.com/wiki/PVE-zsync) and the same sort of function could be done with Ceph as well.
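A rough sketch of what a recurring pve-zsync job could look like, following the wiki page linked above (the VM ID 100, host dc-b and pool tank/replica are placeholders):

Code:
# create a sync job for VM 100 towards the ZFS pool tank/replica on dc-b,
# keeping the last 7 replication snapshots
pve-zsync create --source 100 --dest dc-b:tank/replica --name dr-job --maxsnap 7 --verbose

# list configured jobs and trigger one run manually
pve-zsync list
pve-zsync sync --source 100 --dest dc-b:tank/replica --verbose

Because the target is just a plain ZFS dataset, the receiving side can be a completely separate PVE cluster, which is what was asked for above.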
 
Hi there,

Has anyone managed to achieve this kind of setup with Proxmox 5.4 and Ceph, please?

Cheers ;)
 
Hi,

What if HA is not necessary?

Would it be possible to just have a PVE cluster that is geographically distributed, without HA? So you would still be able to manage all nodes from one place, but would manually move VMs back and forth between nodes?

What about CEPH?

Thanks!
 
Would it be possible to just have a PVE cluster that is geographically distributed, without HA? So you would still be able to manage all nodes from one place, but would manually move VMs back and forth between nodes?
Asynchronous replication will work.

What about CEPH?
Writes will be very slow due to the much longer packet travel time. Same problems as with HA.
 
To elaborate a bit on my setup and idea:

I have a local three-node cluster in my home lab with a dedicated 10G corosync network and a dedicated 10G Ceph network. This all works fine. The next step will be to implement HA locally, but that should not be an issue.

I also have two remote single nodes (in two separate locations) that I currently only use to run PBS for offsite backups, but that I might as well use to run the odd VM as a backup in case my local cluster goes down. I would like to add them to my cluster to manage all nodes from one GUI. I would not need them to be part of the HA setup, and they would also not need to be part of Ceph.

My concern, though, is whether I can integrate them into my cluster, seeing that they can't be part of the local dedicated high-speed corosync network. They would need to connect via much slower, non-dedicated networks (while my local cluster remains on the local dedicated 10G corosync network). Is that possible, or do all nodes of a cluster have to use the same network to connect to the cluster?

Thanks!
 
Asynchronous replication will work.


Writes will be very slow due to the much longer packet travel time. Same problems as with HA.
I think I read somewhere that the ping between locations should be less than 5ms or so for the normal cluster network. My pings to the remote destinations are more like 40ms. So I am hesitant to try this...
 
I think a lot of times the desire for this is motivated by a desire for a single pane of glass in management. I wonder if that's on the roadmap - managing multiple "datacenter" objects connected by one management system.

Yes:

I think I read somewhere that the ping between locations should be less than 5ms or so for the normal cluster network. My pings to the remote destinations are more like 40ms. So I am hesitant to try this...

Correct:
Network Requirements
The Proxmox VE cluster stack requires a reliable network with latencies under 5 milliseconds (LAN performance) between all nodes to operate stably. While on setups with a small node count a network with higher latencies may work, this is not guaranteed and gets rather unlikely with more than three nodes and latencies above around 10 ms.

The network should not be used heavily by other members, as while corosync does not use much bandwidth it is sensitive to latency jitters; ideally corosync runs on its own physically separated network. Especially do not use a shared network for corosync and storage (except as a potential low-priority fallback in a redundant configuration).
https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network
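A quick way to check whether a given link is anywhere near those numbers is a plain ping; the rtt summary line reports min/avg/max/mdev in ms, where avg should stay well below 5 ms and mdev (the jitter) should be small (node-b.example is a placeholder hostname):

Code:
ping -c 100 -q node-b.example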
 
We tried this by renting servers in different datacenters separated by about 500km and had no issues even with the relatively high ping.

I don't have the exact latency because we moved back to closer datacenters.

However, with ~30 ZFS VMs, replication every 5 minutes put too big a load on our storage, so we had to lower that to every 30 minutes. This means that if a node goes down we have to choose between taking the downtime or losing up to 30 minutes of data, so we decided not to use the built-in HA controller and just switch manually if needed.
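For reference, inside a cluster the built-in replication jobs can be inspected and re-scheduled from the shell with pvesr; a rough sketch, where the job ID 100-0 is a placeholder:

Code:
# list all replication jobs and their current schedules
pvesr list

# change job 100-0 to run every 30 minutes instead of every 5
pvesr update 100-0 --schedule "*/30"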

When using bare-metal hosting providers you can't generally set up anycast, so I have a few more details on how we handled DNS here https://blog.guillaumematheron.fr/2022/250/proxmox-cluster-on-distant-bare-metal-servers/
 
