Proxmox Cluster Latency

Feb 11, 2022
Hello there! First of all, thank you for taking the time to read this!

I have read that Corosync needs latency below 7 ms. I get that, but I came across a thread on this forum:
Yes:
If you want to have all nodes show up via the web interface, but do not need HA, you can just add them to a single cluster as described here:
https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
Some pointers:
That should work over a tunnel of your choice (I have not tried that since 4.x, but I do not see why it would not work).


No: AFAIK it does not work without putting those nodes into a cluster, so you need to make Corosync work over your tunnel, or you are out of luck. If you do not need HA (which it looks like from your post), then you also do not need shared storage.
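(For reference, the flow on that wiki page boils down to a couple of pvecm commands; the cluster name and address below are just placeholders, so treat this as a rough sketch rather than a tested recipe:)

    # On the first node: create the cluster (the name is only an example)
    pvecm create mycluster

    # On each additional node: join it to the existing cluster
    # (replace 203.0.113.10 with the address of a node that is already a member)
    pvecm add 203.0.113.10

    # Afterwards, check membership and quorum
    pvecm status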
I have the same need: manage all nodes from the GUI, with no HA or shared storage.

I currently have 3 nodes with about 40-50 ms between them; everything seems to work just fine on all of them, and they communicate over IPv6.
I'm wondering whether everything will be okay, or whether I should just remove them from the cluster, keep them separate, and migrate everything by hand.
There is nothing mission-critical there that requires High Availability; the biggest thing I might do is an offline migration of a VM once in a while.
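For what it's worth, this is roughly how I keep an eye on things (the hostname is a placeholder and the exact output differs between versions):

    # Round-trip latency between nodes over IPv6 (hostname is an example)
    ping -6 -c 10 node2.example.net

    # Corosync's own view of its links
    corosync-cfgtool -s

    # Quorum and membership summary
    pvecm status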

My worry came from reading this thread:
As a CSP who runs Proxmox clouds for a living, I can tell you ... this is a BAD idea.

The problem is that the cluster network is VERY sensitive to latency. This is why it's recommended, and really a necessity for production clusters, that your cluster traffic be separated out onto its own redundant bond by itself. We run 8 NICs per node to get the redundancy and latency that we need.

Once your latency gets over that threshold (7 ms-10 ms), the ENTIRE cluster will start having issues, not just the "remote" node in this case. It can even get so bad that the cluster will start rebooting nodes, and if you're thinking about HA stretched across DCs ... just don't. The only customer we have doing this successfully does it between our Seattle Data Center and Amazon on a direct-connect line with <3 ms latency, and buddy, they PAY for that connection.

As Always,
Crain
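(For context, the knob that warning revolves around is Corosync's totem token timeout, which on Proxmox lives in /etc/pve/corosync.conf. The excerpt below is only meant to show where it sits, not to suggest values that would be safe for 40-50 ms links:)

    # /etc/pve/corosync.conf (excerpt, illustrative only)
    totem {
      cluster_name: mycluster
      config_version: 4    # must be increased on every edit so the change syncs
      token: 3000          # token timeout in ms; link latency eats into this budget
      version: 2
    }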
Thank you!
 
The problem you might run into when you create a cluster with higher-latency links is the following: the folder /etc/pve gets synced across all nodes. If there is too much delay between the nodes, the links get marked as down, which can result in a loss of quorum. When that happens, /etc/pve is mounted read-only, so you will not be able to start VMs or change configs.
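You can see that state directly on a node, roughly like this (the output lines are just what it typically looks like, not copied from a real system):

    # Check whether the cluster currently has quorum
    pvecm status | grep -i quorate
    # -> "Quorate: No" once quorum has been lost

    # Without quorum, /etc/pve is read-only, so any write fails:
    touch /etc/pve/test
    # -> touch: cannot touch '/etc/pve/test': Permission denied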
 
Thanks for answering!
Is fixing that scenario possible?
 
I think what you really would like to have is something that manages independent nodes over one interface ... but I haven't seen anything planned like that.