Cluster over high latency WAN?

seteq

Renowned Member
Dec 15, 2015
Is there a possibility to have the Proxmox nodes of 3 separate standalone datacenters added to just one web GUI view?

The datacenters only have a low-bandwidth, high-latency IPsec tunnel between them. There is no need to share their resources or do any kind of failover.

Can I use clustering to achieve that, or is there another (better) way?

Thank you
 
I know, but it would be nice if there were a way to manage non-clustered hosts in a single web GUI... Is that possible?
 

For personal curiosity:
I have done this using Proxmox 3.x, OpenVPN and 3 OVH low-cost boxes in Strasbourg, Gravelines and Beauharnois (Canada). I used an OpenVPN server running on a third-party VM in Frankfurt. (Ping times from one Proxmox node to another were around 50-190 ms, with a maximum of 2 Mbit/s throughput.) Simple web GUI stuff worked, albeit back then I had to use unicast. Everything else is painful. There was a lot of painful tinkering and testing involved (like making sure you "break" quorum votes, so that your long-distance instance is not taken into account).
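
For reference, the quorum tweaking amounted to something like the following (a sketch; the node name and vote counts are illustrative, and on current versions this lives in /etc/pve/corosync.conf rather than the 3.x-era cluster config):

Code:
# give the long-distance node zero quorum votes in the nodelist
# (increment config_version when editing the file)
#   node {
#     name: remote-node
#     quorum_votes: 0
#   }

# or temporarily lower the expected vote count by hand
pvecm expected 2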

In my experience with 3.x, you will spend a lot of time restarting Proxmox services that have become out of whack.


Regardless, with 3.x the simplest approach in the end was to just run separate Proxmox clusters for all 3 datacenters, use a centrally located NFS storage connected via OpenVPN running on each regional cluster, and then migrate everything by hand if needed.
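
Attaching a central NFS share to each regional cluster would look roughly like this (a sketch; the storage ID, server address and export path are invented):

Code:
# on each regional cluster, add the central NFS storage reachable through the VPN
pvesm add nfs central-nfs --server 10.8.0.1 --export /srv/proxmox --content images,iso

# confirm the storage is reachable
pvesm status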


Needless to say: it's a PITA.

Never done this with 3.4 or 4.x, though.
 
For clarification: I do not want or need any form of clustering between the datacenters. I'm just asking for easier management.
I hoped that there might be a hidden or undocumented feature that simply lets you add another node or cluster to the web GUI for management only.
It would be nice to have all clusters/nodes in one management GUI (like VMware vCenter does).

Are there any plans to implement that?
 
Yes and no.

Yes:
If you want to have all nodes show up in the web interface, but do not need HA, you can just add them to a single cluster as described here:
https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
Some pointers:
That should work over a tunnel of your choice (I have not tried it since 4.x, but I do not see why it would not work).
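
The basic CLI flow, for reference (a sketch; the cluster name and tunnel IP are placeholders):

Code:
# on the first node: create the cluster
pvecm create wan-cluster

# on each additional node: join via the first node's tunnel-side address
pvecm add 10.10.10.1

# check membership and quorum from any node
pvecm status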


No: AFAIK it does not work without sticking those nodes into a cluster. You need to make Corosync work over your tunnel, or you are out of luck. If you do not need HA (which it looks like from your post), then you also do not need shared storage.
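
Since Corosync traditionally relies on multicast, a common way to verify that your tunnel carries it is omping, run simultaneously on all nodes (hostnames are placeholders):

Code:
# near-zero loss here means multicast works across the link
omping -c 10000 -i 0.001 -F -q node1 node2 node3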

Question:
How far apart are your nodes? What's the latency between them?
 
Thank you for your reply!
Maybe I did not explain it in enough detail...
I provide Proxmox-based IaaS services to my customers. It's like a leased virtual infrastructure.
Every datacenter has an IPsec tunnel to my head office for management and monitoring of the Proxmox hosts/clusters, storage and other infrastructure.

At the moment I operate 3 completely isolated clusters - each with 4 hosts and redundant storage and networking.
There is absolutely no need to migrate ANYTHING between those three clusters.
I just thought it might be nice and useful to have all clusters and nodes in one management gui.

Network latency across the three locations is about 25-35 ms (measured across the IPsec tunnel).
 

Then you should be fine to add your nodes to a cluster (32 nodes max, AFAIK). Run the Proxmox backend services (i.e. Corosync) on their own network via the tunnel, and then just disable migration, cross-node backups and the like for your customers. You do not need any shared storage then, and you should be able to manage whatever you run via the Proxmox Cluster File System (pmxcfs).

Check https://pve.proxmox.com/wiki/Separate_Cluster_Network
To be sure, I'd prototype it; this is just academic from my point of view, but I have set up Proxmox clusters using virtual and dedicated machines in the lab at work (2 local datacenters, 3-6 ms apart).
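
A corosync.conf excerpt for a dedicated cluster network might look like this (a sketch in corosync 2.x syntax as shipped with PVE 4.x; the subnet and addresses are invented):

Code:
# /etc/pve/corosync.conf (excerpt) -- increment config_version after editing
totem {
  version: 2
  cluster_name: wan-cluster
  interface {
    ringnumber: 0
    bindnetaddr: 10.10.10.0   # dedicated corosync subnet, carried over the tunnel
  }
}
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1    # tunnel-side address of this node
  }
}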
 
Interesting topic under discussion. Is it feasible to set up a cluster across a WAN, just for central management (single web GUI) and offline VM migration across nodes? No need for HA.

In particular, does a Proxmox cluster support offline migration of a linked clone across nodes, i.e. moving the linked clone plus its base image from one node to another, with it still being a linked clone on the new node?
 
At work we do run a cluster with nodes in three datacenters.
The network has very low latency though, since it is our own fibre and network gear all the way. The datacenters are also less than 10 km apart.


Back in 2013, I ran a three-node cluster for a project using OVH servers: 1 in Canada, 2 in France, with a redundant VPN provider in Frankfurt.
Ping times were 120 ms via the tunnel. It worked but was sluggish.


Hope it helps.
 
Yes, this helps, thanks. So if high latency rules that out, would you create multiple clusters instead - one in each DC - for central management?

Any idea if a cluster can support offline migration of linked clones?
 
I've been running a cluster over an L2TP/IPsec VPN with the management/corosync interfaces in a single subnet via layer 2 for several years. It worked great -- obviously with the exception of storms or other network failures between the two sites -- until Proxmox v6. The cluster breaks, requiring one or more nodes to have corosync restarted. This happens randomly. Sometimes the cluster runs for a day, sometimes five minutes.

I've researched, but I really haven't found anything giving rational instructions on ways to decrease the sensitivity of the timing in corosync.

Yes, this needs to be a cluster. No, I rarely do live migration between the sites, but I have. Site "A" is on a 250Mb/250Mb connection, site "B" is on a 1Gb/50Mb connection with a different provider. Ping time is normally 35-40ms. Cluster is currently three hosts, total of 4 sockets/36 cores, 256GB/48TB running between 35-50 vms depending on what I'm working on.

I've reset corosync twice while writing this message. Ping times have been steady the entire period. The only thing that has changed since this started is the upgrade from 5 to 6.
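
For anyone hitting the same symptom, the first checks are nothing exotic, just stock tooling:

Code:
# see whether the node currently has quorum
pvecm status

# look for membership changes and token timeouts around the failures
journalctl -u corosync -u pve-cluster --since "-1h"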
 

Same thing has been happening to me since the upgrade to PVE 6. Sometimes (though not as often as you describe) I find my cluster "broken" due to corosync issues. What I do on "disconnected" nodes is simply:

Code:
killall -9 corosync              # force-kill the hung corosync process
systemctl restart pve-cluster    # restart pmxcfs (the Proxmox cluster filesystem)

Then it works flawlessly. It seems that sometimes the network is slow or overloaded for a (very) short period of time, and that breaks corosync when it runs over a WAN connection. What makes me wonder is why I have to restart it before it gets synced again. And yes, are there any parameters we can change to make it less sensitive? Have you got any experience with that since your last post? :)
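
One knob that is commonly suggested for high-latency links (untested here, and the value is illustrative) is the totem token timeout:

Code:
# /etc/pve/corosync.conf -- totem section (sketch)
# increment config_version so the change propagates to all nodes
totem {
  ...
  token: 10000   # token timeout in ms; the default is much lower
}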
 
