New cluster network question

dj423

Member
Oct 10, 2023
I'm in the process of setting up a new cluster, and it seems I have a few options for configuring the networks. The servers have a total of 4 NIC interfaces: 2 of them are 10Gb SFP+ bonded in LACP mode, serving as the link for vmbr1, where I then tag the VLAN on the VM interface when it connects to vmbr1. The 2 onboard Intel NICs are set up as untagged ports on the switch, on separate VLANs: 99 for management traffic and 45 for the cluster traffic.

Layout as it currently stands:
Intel onboard:
enp4s0: 1Gb, labeled 1Gb (management interface on management VLAN 99)
enp3s0: 1Gb, labeled 2.5Gb (link for corosync traffic on VLAN 45)

Intel X710:
bond0:
enp1s0f0: 10Gb
enp1s0f1: 10Gb

bond0 is essentially a trunk, bonded to the switch (a MikroTik CRS317) whose corresponding ports are configured as trunks as well.

My question is - for the sake of reliability - would it serve the stack better to bond the 2 onboard NICs and just add a VLAN-tagged interface on VLAN 45 for my corosync traffic, or to leave it as is with the single NIC? I could also set up a fallback link for corosync on a tagged interface in the management VLAN (99 in this case) for redundancy.

The cluster is not yet formed - so I have yet to do any testing - and since I only want to do this once, I thought I would look for input on what works best for my use case. I know the cluster network needs its own interface in a separate VLAN dedicated to that traffic. Is there any benefit to bonding the corosync links for added stability, or are multiple links (management VLAN and corosync VLAN) good enough?
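
For reference, this is roughly how I understand multiple corosync links would be declared at cluster creation time - just a sketch, not tested yet; the cluster name is a placeholder, link0/link1 use the addresses from my planned layout, and the second node's addresses (.12 / .26) are made up:

Code:
# sketch only - create the cluster with two corosync links:
# link0 on the dedicated corosync VLAN, link1 on the management network
pvecm create lab-cluster --link0 192.168.45.11 --link1 192.168.0.25

# on each additional node, join and pass that node's own addresses for both links
pvecm add 192.168.45.11 --link0 192.168.45.12 --link1 192.168.0.26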

Open to ideas.
 
Here is the network config that worked best for my use case. Sending the management traffic over the bond gives the best performance, and the corosync traffic over enp3s0 has been running pretty solid.

For reference, the hosts are Lenovo ThinkStation P3 Ultras with Intel X710 2-port add-in NICs.

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp1s0f0np0
iface enp1s0f0np0 inet manual

auto enp1s0f1np1
iface enp1s0f1np1 inet manual

iface enp3s0 inet manual
#2.5Gb

iface enp4s0 inet manual
#1Gb

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0f0np0 enp1s0f1np1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#10G-Trunk

auto vmbr1.45
iface vmbr1.45 inet manual

auto vmbr1.99
iface vmbr1.99 inet static
        address 192.168.0.25/24
        gateway 192.168.0.1
#Management

auto vlan45
iface vlan45 inet static
        address 192.168.45.11/24
        vlan-raw-device enp3s0
#corosync

source /etc/network/interfaces.d/*
 
I suggest you consider setting up redundant links for your cluster traffic, so that if a NIC port fails on you, your cluster can keep working.

enp4s0 can be that second link in your case.
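
If you do add it, the second link mostly comes down to giving each node an address on its own subnet and editing /etc/pve/corosync.conf. A rough sketch of what the nodelist ends up looking like (node names and the 10.10.10.x addresses are only placeholders for whatever subnet you put on enp4s0):

Code:
# sketch only - two-link nodelist; node names and the 10.10.10.x
# addresses are placeholders for whatever subnet you assign to enp4s0
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.45.11
    ring1_addr: 10.10.10.11
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.45.12
    ring1_addr: 10.10.10.12
  }
}
# keep the rest of the totem section as-is, just make sure both
# links are declared and config_version is incremented, e.g.:
# totem {
#   interface { linknumber: 0 }
#   interface { linknumber: 1 }
# }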
 
I might as well - perhaps a bit overkill - but that would in essence give it three separate NICs to traverse, since the cluster already has links on both the bond (link0) and the management interface (link1), on separate NICs. (See the attached links.PNG screenshot.)
 
I think I will stick with the one dedicated interface for corosync. Per another post I found: "We strongly recommend using an unbonded, separate interface for corosync, since it is known to have issues with bonded interfaces." Since there could be issues when bonding the corosync links, I will keep it set up as it is, with a dedicated NIC for the cluster traffic and the bonded trunk port for all my VM traffic.
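
For anyone who lands on this later: once the cluster is up, the link state can be checked from any node with the usual tools.

Code:
# quorum / membership overview from the PVE side
pvecm status

# per-link status as corosync/knet sees it from this node
corosync-cfgtool -s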