high latency clusters

esi_y

The docs [1] state that:

The Proxmox VE cluster stack requires a reliable network with latencies under 5 milliseconds (LAN performance) between all nodes to operate stably. While on setups with a small node count a network with higher latencies may work, this is not guaranteed and gets rather unlikely with more than three nodes and latencies above around 10 ms.

What exactly in the stack requires the low latencies? Anything other than HA?

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network
 
Corosync, i.e. the cluster networking, not HA directly.
The reason I asked is that corosync itself can happily run with much higher token timeouts than the defaults, and the defaults themselves have changed [1] over time. RHEL even allows a maximum of 300 seconds [2] under their support. Other values related to ping and pong can be changed too.

I can imagine that when HA is hardcoded to detect a failure within a minute and recover within two in total, this would be a problem, but what else in the PVE stack does indeed require the low latency?

EDIT: A valid example with RHEL9, for instance [3] under "3.6. MODIFYING THE COROSYNC.CONF FILE WITH THE PCS COMMAND":
The following example command updates the knet_pmtud_interval transport value and the token and join totem values.
Code:
# pcs cluster config update transport knet_pmtud_interval=35 totem token=10000 join=100

And the join otherwise defaults to 50ms.
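For comparison, on PVE (which does not ship pcs) the same kind of change would go into the totem section of corosync.conf, edited via /etc/pve/corosync.conf so it replicates to all nodes. A minimal sketch with purely illustrative values (and note config_version must be bumped on every edit):
Code:
totem {
  cluster_name: mycluster   # hypothetical cluster name
  config_version: 16        # must be incremented on every change
  version: 2
  token: 10000              # token timeout in ms; example value, not a recommendation
  join: 100                 # join timeout in ms (default 50)
}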

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1870449
[2] https://access.redhat.com/articles/3068821
[3] https://access.redhat.com/documenta...managing_high_availability_clusters-en-us.pdf
 
The more I have been testing (with the increased values), the more I think it's an arbitrary limit, owing to the HA component of the stack also shipping with hardcoded values.

Anyone running their cluster across a WAN long-term?

Note I am not endorsing using it with HA. (Before I get a dose of responses to read the docs: I did; I am just looking for reasons which are undocumented, including in my further searches.)
 
it's not just HA, but basically anything interacting with pmxcfs (/etc/pve , but also some direct interfaces offered by it) that would start breaking if the token processing would take longer.
 
it's not just HA, but basically anything interacting with pmxcfs (/etc/pve , but also some direct interfaces offered by it) that would start breaking if the token processing would take longer.

Thanks a lot for this one as well, Fabian.

But could you be a little more specific, please? I understand that e.g. spinning up new VMs on separate nodes over a "laggy" corosync could end up with duplicate IDs for those, with respect to pmxcfs. But is there anything else that cannot be foreseen to break? Is this about the IPC calls?
 
no, you wouldn't end up with duplicate IDs, just a lot of lag/blocking when writing to /etc/pve (best case), or so much back pressure that corosync can't keep up anymore and stuff starts timing out (including things like lock requests for cluster-wide config files, broadcasts, but also synchronous API requests which have a timeout of 30s for completion of request handling!)
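A rough way to see that blocking first-hand is to time a small write through pmxcfs on a node; a sketch (the scratch file name here is hypothetical):
Code:
# on a healthy, low-latency cluster this returns almost instantly;
# with a laggy token it blocks until the write is committed cluster-wide
time sh -c 'echo probe > /etc/pve/latency-test && rm /etc/pve/latency-test'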
 
no, you wouldn't end up with duplicate IDs, just a lot of lag/blocking when writing to /etc/pve (best case), or so much back pressure that corosync can't keep up anymore and stuff starts timing out

Fair enough; I was mostly waving that one off, since without HA, and with one's own mechanism to avoid duplicate IDs, it would be of no concern.

(including things like lock requests for cluster-wide config files, broadcasts, but also synchronous API requests which have a timeout of 30s for completion of request handling!)

I know I am testing your patience here, but I really just want to know where the requirement is coming from:

a) if I increased the API timeout, or stayed well under 30 seconds (realistically, corosync would be fine with a 10s token even for an across-the-world cluster); AND

b) never have any duplicate ID issue; AND

c) do not use HA

... am I still at risk?

The lock requests for cluster-wide configs must also have a timeout hardcoded somewhere, correct?

EDIT: Or do I risk a deadlock somewhere with such high latencies?

(NB The reason I am asking is to have an idea what will be breaking, I will not be endorsing it here or asking for support for such setup.)
 
the assumption that writes to /etc/pve finish within a reasonable amount of time is hard-coded everywhere. increasing the token timeout/latency breaks that assumption and will cause issues across the board. if each write suddenly blocks for 5s, unless your cluster is totally inactive, nothing will work anymore, since those blocking writes will quickly accumulate and effectively stall/timeout all changes.
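One way to check whether a cluster is already in that state is to follow the cluster-stack logs while it is busy; a sketch (the exact messages vary by version):
Code:
# repeated pmxcfs retry entries (e.g. "cpg_send_message retried ...")
# and corosync membership changes indicate that writes are backing up
journalctl -f -u pve-cluster -u corosync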
 
This is interesting not so much because of the OP's original question but because of how it relates to the practical maximum cluster size.

pmxcfs works great as-is and the max cluster size, for 99% of people's use, is high enough.

Far into the future, what are everyone's ideas for how this could be pushed higher without sacrificing synchronous semantics? Or is there absolutely no way, and do we just need to explore managing multiple clusters and building an async inter-cluster interface?

It would be cool if there were a container above "Datacenter" in the PVE resource tree that could contain many clusters across many sites.
 
Yes, I'm aware of the 4-year-old thread with 50 +1s on it.

In that amount of time, surely there have been multiple ideas kicked around internally, and not shared with us.
 
Yes, I'm aware of the 4-year-old thread with 50 +1s on it.

In that amount of time, surely there have been multiple ideas kicked around internally, and not shared with us.
No worries, I just wanted to put it there for reference for anyone else. Then, go ahead and take over the thread. I got what I wanted, so you can expand it. ;)
 
