[SOLVED] Improving Cluster Management Over WAN with High Latency (No HA/common Storage)

apsmr

New Member
Jun 25, 2024
Hi,

I have three PVE servers running at two separate locations, connected via VPN (two in Germany, one in the UK).

They are set up as one cluster - mainly because of the common VM ID number space and occasional non-time-critical machine migrations between the sites.
We do not want to use HA in this cluster, and there is no shared storage for disk images or containers. The cluster is only used to ease VM/container management.

Our issue: We have noticed that the slow WAN connection with its sometimes higher latency causes outages in the cluster file system /etc/pve or different cluster mechanisms. The web interface becomes unusable, and accessing files below /etc/pve results in a timeout. Usually, this resolves on its own, and everything runs smoothly again.

My question: Are there any Corosync/cluster parameters that we could tune to make the cluster more resilient against such latency-induced hangs in our special use case?

Thanks,

Markus
 
As detailed in the docs, max latency is 10ms [1], although IME staying at around 5ms or less is recommended. Clustering is not supported if latency is above those values.
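
To make the two figures above concrete, here is a minimal Python sketch that classifies round-trip times between nodes against them. The 10 ms and 5 ms constants are simply the numbers quoted in this reply; the function name, the verdict labels, and the choice to judge by the worst sample rather than the average are my own assumptions, not anything PVE or Corosync provides.

```python
from statistics import mean

DOC_MAX_MS = 10.0  # limit quoted in this reply from the cluster docs [1]
COMFORT_MS = 5.0   # rule of thumb from this reply, not an official value


def classify_latency(rtt_samples_ms):
    """Classify round-trip samples (in ms) against the two thresholds.

    Hypothetical helper: it judges by the worst sample, on the assumption
    that occasional spikes are what cause the hangs described above.
    """
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    worst = max(rtt_samples_ms)
    if worst <= COMFORT_MS:
        verdict = "ok"
    elif worst <= DOC_MAX_MS:
        verdict = "marginal"
    else:
        verdict = "too high"
    return {"avg_ms": mean(rtt_samples_ms), "max_ms": worst, "verdict": verdict}


if __name__ == "__main__":
    # Made-up samples, e.g. typed in from ping output between two nodes.
    print(classify_latency([0.4, 0.6, 0.5]))
    print(classify_latency([28.1, 31.7, 45.2]))
```

Feeding it the RTT values you see between the German and UK nodes should show quickly which bucket the WAN link falls into.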

In your use case, I would simply run independent hosts, or at least an independent cluster at each location. Then set Datacenter -> Options -> Next free VMID range to a different range on each PVE cluster/host to avoid overlapping VMIDs. You could use the new Proxmox Datacenter Manager (PDM) [2] to manage all hosts and do live migrations among them.
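
As a rough illustration of the VMID split, the following Python sketch hands out non-overlapping ranges per site and checks a plan for overlaps. The site names, the block size of 1000, and the starting ID of 100 are assumptions for the example only; the resulting numbers would still be entered by hand under Datacenter -> Options on each cluster/host, and the sketch treats both bounds as inclusive, which you should check against how the option handles its upper bound.

```python
def split_vmid_ranges(sites, first_id=100, ids_per_site=1000):
    """Assign each site a consecutive, non-overlapping (lower, upper) range.

    Hypothetical helper: bounds are treated as inclusive here.
    """
    ranges = {}
    lower = first_id
    for site in sites:
        upper = lower + ids_per_site - 1
        ranges[site] = (lower, upper)
        lower = upper + 1
    return ranges


def has_overlap(ranges):
    """Return True if any two (lower, upper) ranges share a VMID."""
    spans = sorted(ranges.values())
    return any(prev[1] >= cur[0] for prev, cur in zip(spans, spans[1:]))


if __name__ == "__main__":
    plan = split_vmid_ranges(["germany", "uk"])  # example site names
    for site, (lower, upper) in plan.items():
        print(f"{site}: {lower}-{upper}")
    print("overlap:", has_overlap(plan))
```

Keeping the ranges disjoint is what lets guests move between the separate clusters later without VMID clashes.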

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network
[2] https://forum.proxmox.com/threads/proxmox-datacenter-manager-first-alpha-release.159323/