Cluster network clarification

dj423

I've been reading up on clustering here: https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network_requirements

I am migrating some hardware that ran a Xen pool of hosts against a shared ZFS array, and I'm bringing up a Proxmox cluster to move everything over to gradually.

I noticed the requirement for a separate network; the docs even say the cluster network "should be on a physically separate network". I normally use tagged interfaces on the host nodes/VMs going to a pair of trunk ports (an LACP bond of two NICs). Does this mean a separate physical NIC, or can I use VLANs to separate data/storage/cluster traffic like I normally do for other virtualization pools (roughly the layout sketched below)? I am running Intel X520 dual SFP+ cards to a 10G switch, so I would think throughput should be fine on dual 10G links to the NAS. I just wanted to confirm that, rather than assume and run into bottlenecks I didn't foresee.
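For reference, what I'd normally set up per node looks roughly like this in /etc/network/interfaces (interface names, VLAN IDs, and addresses are just examples, not my real config):

    auto bond0
    iface bond0 inet manual
        bond-slaves enp1s0f0 enp1s0f1
        bond-mode 802.3ad                 # LACP pair to the switch trunk ports
        bond-xmit-hash-policy layer3+4

    auto bond0.20
    iface bond0.20 inet static            # tagged storage VLAN to the NAS
        address 10.0.20.11/24

    auto vmbr0
    iface vmbr0 inet static               # VLAN-aware bridge for VMs/management
        address 10.0.10.11/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094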
Thanks!
 
The need for dedicated network(s) for cluster traffic isn't related to bandwidth but to latency/jitter requirements and network availability. Corosync needs low latency between nodes (max ~5 ms), and that latency must be stable, as Corosync is sensitive to jitter. On shared networks, traffic such as your storage traffic may saturate the links, increasing latency and/or jitter and making Corosync believe that some or all nodes are unreachable. When using HA, that may produce unwanted node reboots due to fencing [1].
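If you want to see what Corosync itself thinks of the links, it exposes per-link latency statistics (a quick check, assuming a running Corosync 3 cluster; node and link names in the output will vary):

    # Average/min/max latency per peer node and knet link:
    corosync-cmapctl -m stats | grep latency

    # Local view of link status:
    corosync-cfgtool -s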

Also, Corosync can detect link failures far faster than any bonding method can. That's why the recommended setup uses two independent NICs for Link0 and Link1 instead of a bond.
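For example, you can give Corosync the two redundant links directly when creating or joining the cluster, instead of bonding them (the addresses below are placeholders for two separate subnets on two separate NICs):

    # On the first node:
    pvecm create my-cluster --link0 10.10.0.1 --link1 10.10.1.1

    # On a joining node, with its own addresses on the same two networks:
    pvecm add 10.10.0.1 --link0 10.10.0.2 --link1 10.10.1.2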

If possible, use a dedicated NIC at least for Corosync Link0. If not, make sure Corosync traffic is delivered with high priority to reduce the chance of other traffic interfering with it.
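One way to do that, as a sketch assuming default ports and switches configured to honor DSCP markings: mark the Corosync packets (Corosync 3/knet uses UDP ports 5405-5412 by default) so switch QoS can prioritize them:

    # Mark outgoing Corosync traffic with the CS6 (network control) DSCP class:
    iptables -t mangle -A OUTPUT -p udp --dport 5405:5412 -j DSCP --set-dscp-class cs6

The marking only helps if every switch in the path is actually configured to prioritize that class; otherwise it's a no-op.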

Finally, all of this is of little use if all your NICs and bonds go to a single switch: the switch becomes a SPOF, and any problem with it will take the whole cluster down.

[1] https://pve.proxmox.com/wiki/High_Availability#ha_manager_fencing
 