max cluster nodes with pve6?

udo

Hi,
With PVE 6, the new corosync version is used.
Does this change the maximum number of nodes in one cluster?

If I remember correctly, the limit so far has been 32 nodes, but fewer are recommended (how many?).



Udo
 
in theory it supports more nodes, but we haven't tested it yet with more than 36. the new underlying transport library does away with multicast though, so no need for special switch configurations anymore. the performance via pmxcfs seems to be about the same - there are a few more packets going over the wire when stress-testing it, and of course with unicast instead of multicast the amount of traffic sent out from each node is proportionally higher, depending on the cluster size (e.g., 6 packets sent/received cluster-wide instead of 4 for 3 nodes, 10/6 for 10 nodes, n/((n/2)+1) for n nodes).
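To get a feel for how that ratio grows with cluster size, here is a rough sketch that just evaluates the formula quoted above. It assumes that n/2 is meant as integer division (that reading reproduces both examples: 6 vs. 4 = 3/2 for 3 nodes and 10/6 for 10 nodes); the function name `unicast_ratio` is made up for this illustration and is not part of any Proxmox or corosync tooling.

```shell
#!/bin/sh
# Unicast-vs-multicast packet ratio for an n-node cluster, following the
# formula from the post: n / ((n/2) + 1).
# Assumption: n/2 is integer division, which is what $(( )) does in sh.
unicast_ratio() {
    n=$1
    denom=$(( n / 2 + 1 ))
    # Print the ratio as an unreduced fraction, like the "10/6" in the post.
    echo "$n/$denom"
}

for size in 3 10 16 36; do
    echo "$size nodes: $(unicast_ratio "$size")"
done
```

Under that reading the ratio approaches 2 for large clusters, so the overhead from unicast does not grow without bound.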

we fixed an issue with traffic spikes when cold-booting a cluster (such as installing upgrades and restarting corosync on all nodes, or loss of power and subsequent simultaneous start of all nodes). the general picture remains the same though - the bigger the cluster, the faster and lower-latency the link should be to keep up (and the more interesting some tuning knobs might become ;))
 
36 nodes on 1 gigabit or on 10?

we ran our tests with 10 and 100 Gbit networking, dedicated to corosync. we are looking forward to more feedback from setups in the wild now that 6.0 is out!
 
Hello,

And regarding the bandwidth for a 36-node cluster, what kind of traffic (Mbps) should we expect?
 
in our stress tests we saw a few thousand pps and a few mb/s of traffic (all 36 nodes writing non-stop on pmxcfs). without load, a few hundred pps and a few hundred kb/s. production load will likely be somewhere in between ;)
 
After I upgraded from Proxmox 5.4 to 6.1, my cluster is not able to support more than 14 nodes...
Hi,
I have a 16-node cluster running corosync 3 (7 nodes still on PVE 5.4) without trouble.
Corosync is running on 1 Gbit, but with a second path (ring 1) on 10 Gbit (1 Gbit on 2 nodes).

Any errors in the logs?
Can you use a second path?
All nodes in one DC with small latencies?

Udo
 
Hi @udo ,
I can use only a single path for now.
Attached are the latest logs from before everything became unresponsive, with an "unknown state (grey question mark)" on each node.
The problems start when TOTEM reports "Token has not been received in xxx ms".
 

Attachments

  • LOGS.txt
    63 KB · Views: 11
I would be interested to know whether something like a gradual cluster would be possible, i.e.:

- having 2 clusters at different geographical locations with a failover from cluster #1 to cluster #2, in essence a cluster of clusters...

(also referred to by some software vendors as a Business Continuity Cluster (BCC))
 
Hi @udo ,
I can use only a single path for now.
Attached are the latest logs from before everything became unresponsive, with an "unknown state (grey question mark)" on each node.
The problems start when TOTEM reports "Token has not been received in xxx ms".
Hi,
what does the output of the following commands look like?
Code:
pvecm nodes
pvecm status

Have you tried restarting corosync on all nodes (one after another)?
Code:
systemctl restart corosync
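
If logging in to each node by hand is tedious, the one-after-another restart could be scripted roughly like this. This is only a sketch: the node names are placeholders, it assumes root SSH access from the machine you run it on to every node, and the pause length is an arbitrary guess. By default it only prints the commands (RUN=echo); set RUN to an empty string to actually execute them.

```shell
#!/bin/sh
# Rolling restart of corosync, one node at a time.
# RUN=echo (the default here) is a dry run that only prints each command;
# run with RUN= (empty) to really execute them.
RUN="${RUN-echo}"
# Seconds to wait between nodes so membership can settle (arbitrary value).
PAUSE="${PAUSE:-10}"

restart_corosync_rolling() {
    for node in "$@"; do
        $RUN ssh "root@$node" systemctl restart corosync
        # Only wait when the restart is really executed, not in a dry run.
        if [ -z "$RUN" ]; then
            sleep "$PAUSE"
        fi
    done
}

# Placeholder node names; replace them with your own hostnames.
restart_corosync_rolling pve1 pve2 pve3
```

Checking `pvecm status` on a node between restarts is a reasonable way to confirm the cluster is quorate again before moving on.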

Udo
 
