What is the actual reason for X nodes count limit?

tempacc375924

Nov 18, 2023
EDIT: Changed the number 32 in the title to X.

1) What is the actual reason for a limit on the number of nodes (beyond vague references to "the network", as below [1]), when one could put the management network on a separate switch capable of forwarding tens of millions of pps? It cannot be about saturating the network. Does it have to do with latency / jitter?

2) Why did corosync switch away from multicast?

I have already searched and found threads such as this one (none specifically explanatory):
[1] https://forum.proxmox.com/threads/max-cluster-nodes-with-pve6.55277/
 
Where did you get the number ~32 from? The manual (as of 2021) states: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvecm

From the threads the search turned up (admittedly pre-2021), such as the quoted one, where staff went on to say they "haven't tested it yet with more than 36" (2019). Even if it were 50, it is tens, not 100.

It always refers to the network being the limitation. I just wish to be able to, e.g., calculate it (not just "number X worked for someone, other variables unknown"). It clearly has to be a limitation related to how corosync keeps the cluster DB in sync. I would actually view the fact that corosync moved from multicast to unicast as worrying in this sense (as was expressed in the quoted thread too).

I am not looking so much for an exact number or a YMMV answer; I want to understand what corosync might run into when you have, e.g., 100+ instances chatting around. It clearly has nothing to do with how beefy the server hardware is. I am also trying to dig out the rationale for the multicast-to-unicast switch (not the PVE team's decision in itself).
 
I went ahead and changed the title to an arbitrary number. I think the original person asking may have gotten their idea from vSphere (at the time); they then went on to suggest 64-96 in 2020 as their arbitrary limit. But this is the PVE forum, and I understand there's no hard limit in PVE that stops one at 50 or 100. Still, the good takeaways from the quoted thread:

- (about unicast traffic) "e.g., for 6 packets sent/received cluster-wide instead of 4 for 3 nodes, 10/6 for 10 nodes, n/((n/2)+1) for n nodes"
- (about pps and bandwidth) "a few thousand pps and a few mb/s of traffic (all 36 nodes writing non-stop on pmxcfs). without load, a few hundred pps and a few hundred kb/s"

That sounds like nothing even for 1 GbE ...
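A quick back-of-the-envelope supports that impression. The pps and packet-size figures below are assumptions loosely based on the staff quote above ("a few thousand pps and a few mb/s" for 36 nodes), not measured values:

```python
# Rough estimate of corosync traffic vs. 1 GbE capacity.
# PPS and AVG_PKT_BYTES are assumed figures, not corosync measurements.
PPS = 5_000               # assumed cluster-wide packets per second under load
AVG_PKT_BYTES = 1_000     # assumed average packet size
LINK_BPS = 1_000_000_000  # 1 GbE

traffic_bps = PPS * AVG_PKT_BYTES * 8
utilization = traffic_bps / LINK_BPS
print(f"~{traffic_bps / 1e6:.0f} Mbit/s, about {utilization:.0%} of a 1 GbE link")
```

Even with generous figures the bandwidth utilization stays in the single-digit percent range, which is why the bottleneck has to be something other than raw throughput.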

The other thing at the time was switch from multicast to unicast and (quoted from the same):

"After that I upgraded Proxmox 5.4 to 6.1, my cluster is not able to support more than 14 nodes..."
 
the algorithm used to establish consensus/quorum doesn't really scale to huge cluster sizes (or at least not within the time frames we require for clustering ;)).
 
the algorithm used to establish consensus/quorum doesn't really scale to huge cluster sizes (or at least not within the time frames we require for clustering ;)).

Thanks for chipping in Fabian! Does "the algorithm" mean solely the use of corosync on kronosnet with passive link mode on UDP transport?
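For reference, the setup described above would look roughly like this in corosync.conf. This is a hypothetical fragment with node and address details omitted; directive names follow corosync.conf(5), but check your own config:

```
totem {
    version: 2
    cluster_name: example
    transport: knet          # kronosnet transport (UDP underneath)
    link_mode: passive       # only the preferred link carries traffic,
                             # others are standby
    interface {
        linknumber: 0
        knet_link_priority: 1
    }
    interface {
        linknumber: 1
        knet_link_priority: 0
    }
}
```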
 
the algorithm meaning "paxos" as implemented by corosync. it is not related to the underlying transport or link mode.
 
the algorithm meaning "paxos" as implemented by corosync. it is not related to the underlying transport or link mode.

I thought corosync was all running totem (paxos being something entirely different; if it was a typo, I'll take it as you meant totem anyhow), but I will go do some more research. Thanks for the reply.
 
yes! sorry for adding to the confusion. corosync uses totem as replication mechanism/protocol, similar scalability issues would affect paxos-based implementations as well though :) this is the reason why distributed systems that want to handle bigger cluster sizes usually employ either a sort of core <-> satellite architecture (e.g., where only the core nodes are allowed to take decisions, and broadcast the results to the satellite nodes that can only act on them) or reduce the consistency requirements. we are working on a different approach - a common management plane and the necessary integration to allow cross-cluster operations - the remote migration preview feature is the first bigger (released) block of that effort.
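To illustrate the scaling argument: totem passes a token around a logical ring, and a node may only originate messages while it holds the token, so one full rotation visits every node in turn. A minimal sketch of why ordering latency grows with cluster size (the latency and processing figures below are made-up assumptions, not corosync internals):

```python
# Back-of-the-envelope for a token-ring protocol like totem:
# the token must visit all n nodes, so rotation time grows
# linearly with cluster size. Figures are illustrative only.
HOP_LATENCY_MS = 0.2   # assumed one-way network latency per hop
PROCESS_MS = 0.1       # assumed per-node token handling time

def rotation_ms(nodes: int) -> float:
    """Time for one full token rotation around the ring."""
    return nodes * (HOP_LATENCY_MS + PROCESS_MS)

for n in (3, 36, 100, 500):
    print(f"{n:4d} nodes: ~{rotation_ms(n):6.1f} ms per token rotation")
```

Even with optimistic per-hop numbers, the rotation time (and with it, failure-detection timeouts and message-ordering latency) keeps growing with node count, which matches the "not within the time frames we require for clustering" point above.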
 
