Proxmox with 48 nodes

You need two corosync links. For 12 nodes on gigabit I would use dedicated links for both, just in case, even if a dedicated link for Link0 alone would be enough. The most I've run in production with gigabit corosync is 8 hosts, with no problems at all.
My corosync network runs on a dedicated 1 GbE network card. Can it support a 12-node PVE cluster?
The number of cluster members is an inexact limit. The actual limit has to do with how much data the cluster members have to keep synchronized: if each of your cluster members had 400 VMs with continuous API traffic, your cluster would probably die from tripped timeouts. If you have 5 VMs mostly sitting there untouched, you'll be just fine.
 
For bigger clusters than that, fine-tuning might be necessary. We are currently working on guidance for working with bigger clusters. For the time being, I would recommend splitting this into smaller clusters.
A status update on this:

Two corosync parameters that are especially relevant for larger clusters are the "token timeout" and the "consensus timeout". When a node goes offline, corosync (or rather the totem protocol it implements) will need to go through at least one full "token timeout" and one full "consensus timeout" to form a membership without that node.

The token timeout is, by default, calculated based on the number of nodes multiplied by the "token coefficient" (plus a base value), see [0] for the exact formula. The consensus timeout is, by default, defined as 1.2 * token timeout [1].

To avoid unwanted node fencing if HA is active on that node, these two timeouts together should not become much larger than 40-45 seconds. Both timeouts depend on the number of nodes. For up to ~25 nodes, the default timeouts are within that range and no adjustment is necessary. On larger (approximately more than 25 nodes) clusters it is necessary to lower these timeouts.
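As a rough sketch of this arithmetic, assuming the formula from the corosync.conf(5) manpage and the corosync 3 defaults of token = 3000 ms and token_coefficient = 650 ms (the helper names here are my own):

```python
# Timeout arithmetic per corosync.conf(5), with corosync 3 defaults:
# token = 3000 ms, token_coefficient = 650 ms, consensus = 1.2 * token timeout.

def token_timeout_ms(nodes, token=3000, coefficient=650):
    """Effective token timeout for clusters with more than two nodes."""
    return token + max(nodes - 2, 0) * coefficient

def failover_window_ms(nodes, token=3000, coefficient=650):
    """Token timeout plus consensus timeout (consensus = 1.2 * token timeout)."""
    t = token_timeout_ms(nodes, token, coefficient)
    return t + t * 6 // 5  # 1.2 * t in exact integer arithmetic

for n in (3, 12, 25, 30, 48):
    print(f"{n:2d} nodes: token {token_timeout_ms(n)} ms, "
          f"token + consensus {failover_window_ms(n) / 1000:.1f} s")
```

With these defaults the combined window is still just under 40 s at 25 nodes (~39.5 s), but hits ~46.6 s at 30 nodes and ~72 s at 48 nodes, which is why the defaults only fit up to roughly 25 nodes.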

We are currently performing tests to come up with general recommendations on how to tweak the corosync configuration to achieve this, but this will not be ready before the holidays. At the moment, decreasing the token coefficient (650ms by default) seems like the most promising option. We are still performing tests to find a good default. Currently, our tests suggest it should be safe to lower the token coefficient to 125ms (if the network is stable with low latency), but 325ms should already be enough for clusters with 30-40 nodes as discussed here.
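Plugging the suggested values into the same corosync.conf(5) formula (base token 3000 ms, consensus = 1.2 × token timeout; a sketch, not official guidance) illustrates why 325 ms suffices around 30-40 nodes while 125 ms leaves headroom even at 48:

```python
# Combined token + consensus window for lowered token_coefficient values
# (formula per corosync.conf(5); base token 3000 ms, consensus = 1.2 * token).

def total_timeout_ms(nodes, coefficient, token=3000):
    t = token + (nodes - 2) * coefficient  # effective token timeout
    return t + t * 6 // 5                  # plus consensus (1.2 * token timeout)

for coef in (650, 325, 125):
    for n in (30, 40, 48):
        print(f"coefficient {coef:3d} ms, {n} nodes: "
              f"{total_timeout_ms(n, coef) / 1000:.1f} s")
```

At the 650 ms default, a 40-node cluster is already at ~61 s, well past the 40-45 s budget; 325 ms brings it back to ~34 s, and 125 ms keeps even 48 nodes under 20 s.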

Until our tests have concluded: if you would like to help and have the resources to test, you could check whether decreasing the token coefficient solves the issue of unwanted node fencing when a node goes offline, and whether cluster stability is otherwise maintained.

If you want to test that: First make sure your cluster network fulfills the requirements [2], most importantly stable low latency (<5ms is required, but I'd recommend <1ms). Then edit /etc/pve/corosync.conf [3] and add the option token_coefficient inside the totem section, for example token_coefficient: 125, which lowers the token coefficient to 125ms. Of course you can also set other values (as noted above, token_coefficient: 325 should be enough for clusters in the range of 30-40 nodes) or go back to the default. Don't forget to increase config_version [3]. You need to restart corosync on each node for the change to fully take effect. EDIT 2026-01-08: If you're using HA, you may want to disarm HA [5] before the config change and re-enable it afterwards.
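As a sketch, the edited totem section in /etc/pve/corosync.conf might then look like the following (the cluster name and version number are placeholders, and "..." stands for your existing totem options, which stay unchanged):

```
totem {
  cluster_name: mycluster
  # increment config_version from its previous value
  config_version: 43
  ...
  # new: lower the token coefficient from the 650 ms default
  token_coefficient: 125
  version: 2
}
```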

If you test a custom token coefficient, we would greatly appreciate it if you reported back with your observations.

In general, and as noted by others, going for several smaller clusters instead of one larger cluster can have other operational advantages, especially if you also use the Proxmox Datacenter Manager which recently saw the stable 1.0 release [4].

[0] https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#token_coefficient
[1] https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#consensus
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_cluster_network_requirements
[3] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_edit_corosync_conf
[4] https://pdm.proxmox.com/
[5] https://forum.proxmox.com/threads/c...-soon-as-i-join-a-new-node.116804/post-510633
Hi, did anybody test a custom (lower) token coefficient and wants to share their observations?
 
Hi,

We have seen intermittent reloads of cluster nodes on a 3-node cluster with token_coefficient set to 125. Another cluster without this setting, with identical hardware and setup, has been running without issues for several months.

We found this change in our Corosync config today and reverted it to the default Corosync setting (650ms).

I believe it's too early to say whether this was the cause of our issues, but the journal does show corosync having problems on the failing node.

Code:
May 12 18:21:12 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 0 is down
May 12 18:21:15 pve02 corosync[3419]:   [KNET  ] link: host: 1 link: 0 is down
May 12 18:21:17 pve02 corosync[3419]:   [KNET  ] link: host: 1 link: 1 is down
May 12 18:21:17 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
May 12 18:21:18 pve02 corosync[3419]:   [TOTEM ] Token has not been received in 2392 ms
May 12 18:21:20 pve02 corosync[3419]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3125ms), waiting 3750ms for consensus.
May 12 18:21:22 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 1 is down
May 12 18:21:24 pve02 corosync[3419]:   [TOTEM ] Process pause detected for 1744 ms, flushing membership messages.
May 12 18:21:26 pve02 corosync[3419]:   [TOTEM ] Token has not been received in 9567 ms
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 1 has no active links
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 1 has no active links
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 3 has no active links
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 3 has no active links
May 12 18:21:35 pve02 corosync[3419]:   [QUORUM] Sync members[1]: 2
May 12 18:21:35 pve02 corosync[3419]:   [QUORUM] Sync left[2]: 1 3
May 12 18:21:35 pve02 corosync[3419]:   [TOTEM ] A new membership (2.103) was formed. Members left: 1 3
May 12 18:21:35 pve02 corosync[3419]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
May 12 18:21:35 pve02 corosync[3419]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 12 18:21:35 pve02 corosync[3419]:   [QUORUM] Members[1]: 2
May 12 18:21:35 pve02 corosync[3419]:   [MAIN  ] Completed service synchronization, ready to provide service.
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] rx: host: 1 link: 1 is up
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 1 (pri: 1)
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] pmtud: Global data MTU changed to: 1397
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] rx: host: 3 link: 0 is up
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] rx: host: 3 link: 1 is up
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
May 12 18:21:35 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:21:36 pve02 corosync[3419]:   [KNET  ] rx: host: 1 link: 0 is up
May 12 18:21:36 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
May 12 18:21:36 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:21:36 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:21:36 pve02 corosync[3419]:   [QUORUM] Sync members[3]: 1 2 3
May 12 18:21:36 pve02 corosync[3419]:   [QUORUM] Sync joined[2]: 1 3
May 12 18:21:36 pve02 corosync[3419]:   [TOTEM ] A new membership (1.10b) was formed. Members joined: 1 3
May 12 18:21:36 pve02 corosync[3419]:   [QUORUM] This node is within the primary component and will provide service.
May 12 18:21:36 pve02 corosync[3419]:   [QUORUM] Members[3]: 1 2 3
May 12 18:21:36 pve02 corosync[3419]:   [MAIN  ] Completed service synchronization, ready to provide service.
May 12 18:22:06 pve02 corosync[3419]:   [KNET  ] pmtud: Global data MTU changed to: 1397
May 12 18:22:25 pve02 corosync[3419]:   [TOTEM ] Token has not been received in 2354 ms
May 12 18:23:09 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 1 is down
May 12 18:23:09 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:09 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:10 pve02 corosync[3419]:   [KNET  ] rx: host: 3 link: 1 is up
May 12 18:23:10 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
May 12 18:23:10 pve02 corosync[3419]:   [KNET  ] pmtud: Global data MTU changed to: 1397
May 12 18:23:10 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:10 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:48 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 0 is down
May 12 18:23:49 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 1 is down
May 12 18:23:50 pve02 corosync[3419]:   [KNET  ] link: host: 1 link: 0 is down
May 12 18:23:51 pve02 corosync[3419]:   [KNET  ] link: host: 1 link: 1 is down
May 12 18:23:51 pve02 corosync[3419]:   [TOTEM ] Token has not been received in 2443 ms
May 12 18:23:51 pve02 corosync[3419]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3125ms), waiting 3750ms for consensus.
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] rx: host: 1 link: 1 is up
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
May 12 18:23:55 pve02 corosync[3419]:   [QUORUM] Sync members[1]: 2
May 12 18:23:55 pve02 corosync[3419]:   [QUORUM] Sync left[2]: 1 3
May 12 18:23:55 pve02 corosync[3419]:   [TOTEM ] A new membership (2.10f) was formed. Members left: 1 3
May 12 18:23:55 pve02 corosync[3419]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
May 12 18:23:55 pve02 corosync[3419]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 12 18:23:55 pve02 corosync[3419]:   [QUORUM] Members[1]: 2
May 12 18:23:55 pve02 corosync[3419]:   [MAIN  ] Completed service synchronization, ready to provide service.
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] rx: host: 3 link: 0 is up
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] rx: host: 1 link: 0 is up
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] rx: host: 3 link: 1 is up
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
May 12 18:23:55 pve02 corosync[3419]:   [QUORUM] Sync members[1]: 2
May 12 18:23:55 pve02 corosync[3419]:   [TOTEM ] A new membership (2.113) was formed. Members
May 12 18:23:55 pve02 corosync[3419]:   [QUORUM] Members[1]: 2
May 12 18:23:55 pve02 corosync[3419]:   [MAIN  ] Completed service synchronization, ready to provide service.
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:23:55 pve02 corosync[3419]:   [KNET  ] pmtud: Global data MTU changed to: 1397
May 12 18:23:57 pve02 corosync[3419]:   [TOTEM ] Token has not been received in 2359 ms
May 12 18:23:58 pve02 corosync[3419]:   [QUORUM] Sync members[1]: 2
May 12 18:23:58 pve02 corosync[3419]:   [TOTEM ] A new membership (2.11b) was formed. Members
May 12 18:23:58 pve02 corosync[3419]:   [QUORUM] Sync members[3]: 1 2 3
May 12 18:23:58 pve02 corosync[3419]:   [QUORUM] Sync joined[2]: 1 3
May 12 18:23:58 pve02 corosync[3419]:   [TOTEM ] A new membership (1.11f) was formed. Members joined: 1 3
May 12 18:23:58 pve02 corosync[3419]:   [QUORUM] This node is within the primary component and will provide service.
May 12 18:23:58 pve02 corosync[3419]:   [QUORUM] Members[3]: 1 2 3
May 12 18:23:58 pve02 corosync[3419]:   [MAIN  ] Completed service synchronization, ready to provide service.
May 12 18:24:45 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 1 is down
May 12 18:24:46 pve02 corosync[3419]:   [KNET  ] link: host: 1 link: 0 is down
May 12 18:24:46 pve02 corosync[3419]:   [KNET  ] link: host: 1 link: 1 is down
May 12 18:24:47 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:24:47 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:24:47 pve02 corosync[3419]:   [KNET  ] host: host: 1 has no active links
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] host: host: 1 has no active links
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] rx: host: 3 link: 1 is up
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] rx: host: 1 link: 1 is up
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
May 12 18:24:48 pve02 corosync[3419]:   [TOTEM ] Retransmit List: 141
May 12 18:24:48 pve02 corosync[3419]:   [KNET  ] pmtud: Global data MTU changed to: 1397
May 12 18:26:40 pve02 corosync[3419]:   [TOTEM ] Token has not been received in 2449 ms
May 12 18:26:54 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 0 is down
May 12 18:27:00 pve02 corosync[3419]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3125ms), waiting 3750ms for consensus.
May 12 18:27:04 pve02 corosync[3419]:   [KNET  ] link: host: 3 link: 1 is down
 
Could you give more details about your setup? Network, hardware, ...? For three nodes the coefficient makes almost no difference in practice.
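To illustrate, using the corosync.conf(5) formula with the corosync 3 base token of 3000 ms (consistent with the "token timed out (3125ms)" lines in the log above):

```python
# For a 3-node cluster the coefficient enters the effective token timeout
# only once: token + (nodes - 2) * coefficient, with nodes - 2 == 1.
BASE_TOKEN_MS = 3000  # corosync 3 default

def three_node_token_ms(coefficient):
    return BASE_TOKEN_MS + (3 - 2) * coefficient

print(three_node_token_ms(650))  # default coefficient -> 3650
print(three_node_token_ms(125))  # lowered coefficient -> 3125, as seen in the log
```

So lowering the coefficient from 650 to 125 only shaves about half a second off a 3-node token timeout.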
 
3 nodes (HPE DL360 Gen10).
On-board 1 GbE link to the OOB/management switch for "link 0".
Two dual-port 25/10 GbE NICs, with one port each in an LACP (802.3ad) bond to Nexus switches (vPC setup) over 10 GbE DAC cables, 802.1Q tagged. "link 1" runs on an SDN VLAN interface.
 
Could you post the full logs (at least the corosync unit and kernel logs) as well as the network and corosync configs of the affected cluster?
 