[SOLVED] Traffic separation and nics - how to plan the network?

Jan 14, 2025
Hi, we're building a 4-node PVE cluster with NVMe Ceph storage.

Available NICs: We have the following NICs available:
  • Nic1: 2 x 10G + 2 x 1G
  • Nic2: 2 x 10G
  • Nic3: 2 x 100G
Traffic/Networks: Now we need (I think) the following traffic separations:
  • PVE Management
  • PVE Cluster & Corosync
  • Ceph (public) traffic > 2 x 10G Bond1 (MLAG)
  • Public VM & Migration traffic > 2 x 10G Bond2 (MLAG)
  • Ceph (internal) cluster traffic > 2 x 100G Bond3 (MLAG)
Question now: What is the best use for the remaining 2 x 1G ports?

Bond/Bridge: All 10/100G ports should use a Linux bond so we can use MLAG. The public VM & migration traffic should also go through a Linux bridge. What about the rest? No bridge?
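For reference, a minimal sketch of what one such bond + bridge pair could look like in /etc/network/interfaces (ifupdown2, as shipped with PVE); the interface names, hash policy and VLAN-awareness below are placeholders/assumptions, not a confirmed config:

Code:
auto enp65s0f0
iface enp65s0f0 inet manual

auto enp65s0f1
iface enp65s0f1 inet manual

# LACP bond; MLAG lives on the switch pair, the host just runs 802.3ad
auto bond2
iface bond2 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100

# bridge for public VM traffic on top of the bond
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond2
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes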

Thanks!
 
My 2c:

Do not use the 100G links for a separate Ceph cluster network; use them for the Ceph public network instead. There is no need for a separate cluster network here.

Use 2x 10G for Proxmox management and VM migration.

Use the other 2x 10G for VM guest traffic.

Use the remaining 1G ports for additional corosync rings.
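A minimal sketch of what such an additional ring could look like in /etc/pve/corosync.conf (node name and addresses are placeholders, and config_version in the totem section has to be bumped when editing):

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    # primary corosync network
    ring0_addr: 10.10.10.1
    # additional ring on the spare 1G ports
    ring1_addr: 10.10.20.1
  }
  # ... same pattern for the other nodes
}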
 
Not intending to derail the thread, but to help illustrate the underlying philosophy driving the advice:

1) Why four nodes?
I would expect this to be a problem longer term.
I'd think you'd want 5.
(so as to have a min-required-alive count of 3; that allows one node to be actively undergoing maintenance while you can still endure a node panic without killing the cluster). My reasoning here comes from personal experience:
- I had a 4-node setup with a minimum of 3 required.
- I had taken one node down to change out CPUs.
- I accidentally knocked the power cords out of one of the remaining active nodes, which killed the cluster.
Since then, I've maintained the mindset that my clusters must be able to sustain 2 down nodes. YMMV, but that's my 'why' here.

2) Your NIC selection is a little awkward:
  1. nic1.10g.1
    nic1.10g.2
    nic1.1g.1
    nic1.1g.2

  2. nic2.10g.1
    nic2.10g.2

  3. nic3.100g.1
    nic3.100g.2
Would something like this be better?
  1. nic1.10g.1
    nic1.10g.2
    nic1.1g.1
    nic1.1g.2

  2. nic2.10g.1
    nic2.10g.2
    nic2.1g.1
    nic2.1g.2

  3. nic3.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic3.100g.A
    - nic3.100g.B
  4. nic4.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic4.100g.A
    - nic4.100g.B
Or, assuming nic1 is some special attached card:
  1. nic1.10g.1
    nic1.10g.2
    nic1.1g.1
    nic1.1g.2
  2. nic2.1g.1
    nic2.1g.2
  3. nic3.10g.1
    nic3.10g.2
  4. nic4.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic4.100g.A
    - nic4.100g.B
  5. nic5.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic5.100g.A
    - nic5.100g.B

Then arranging bonds like:
Bond0       Bond1       Bond10      Bond101     Bond102
1.1g.1      2.1g.2      1.10g.1     3.100g.A    4.100g.A
2.1g.1      1.1g.2      2.10g.1     4.100g.B    3.100g.B
                        1.10g.2
                        2.10g.2
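As a sketch, bond0 from the table above would pair one 1G port from each card in /etc/network/interfaces; the interface names and address are placeholders, and active-backup mode is an assumption (LACP works too if both ports land on an MLAG-capable switch pair):

Code:
# cross-card 1G bond dedicated to corosync
auto bond0
iface bond0 inet static
    address 10.10.10.1/24
    bond-slaves eno3 enp2s0f0
    bond-mode active-backup
    bond-miimon 100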

Dedicate bond0 to corosync.
Split b1 between a corosync backup link and "internal VM heartbeat" bridges, ideally connected to two different switches on diverse power, with no other connectivity.

Corosync SHOULD use b0 unless something goes wobbly, in which case it may fail over to b1 (see the sketch below).
b1 is otherwise relatively unused from PVE's perspective, so it can be leveraged for essential low-latency uses that, generally speaking, have fairly low throughput demand.
If an infra-scoped VM cohort requires a heartbeat network with another VM, use a bridge atop the second 1G corosync net in conjunction with their normally provisioned network as a fallback heartbeat.

I wouldn't give these interfaces to non-infra-grade VMs, but rather reserve them for things like firewall VM pairs that need cross-connected heartbeat interfaces that are "isolated".
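A sketch of how the "b0 preferred, b1 fallback" behaviour can be expressed in the totem section of /etc/pve/corosync.conf; the priority values are arbitrary, and with knet's default passive link mode the higher-priority link carries the traffic while the other only takes over if it fails:

Code:
totem {
  # ...
  interface {
    linknumber: 0
    # b0: preferred corosync link
    knet_link_priority: 20
  }
  interface {
    linknumber: 1
    # b1: only used if link 0 goes down
    knet_link_priority: 10
  }
}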

I'd use b10 for management/migration.

In my experience thus far, 4x 10G NICs for management/migration are generally okay, although migration of VMs with very active, large memory footprints is problematic.
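If migration should be pinned to that bond's subnet, a migration network can be set in /etc/pve/datacenter.cfg; the CIDR below is a placeholder:

Code:
# /etc/pve/datacenter.cfg
# use a dedicated migration network
migration: secure,network=10.10.30.0/24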

I'd use b101 for Ceph.
I'd use b102 + a bridge for public VM traffic (vmpub).
I THINK?!
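On the Ceph side, a minimal /etc/pve/ceph.conf excerpt matching the earlier suggestion of a single (public-only) network on the 100G bond; the subnet is a placeholder:

Code:
[global]
    public_network = 10.10.50.0/24
    # no cluster_network set: replication traffic then also
    # runs over public_network, i.e. the 100G bond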


My current topology is 6x Dell R730xd's.
Each node has 4x 10G bifurcated to 8x 10G, as well as a secondary card with 2x 1G NICs for corosync, with fallback to the management VLAN.

so:
Physical Link    OS NIC         P/S/F
1.10G.0          eno1           enp1s0f0
1.10G.1          eno2           enp1s0f1
1.10G.2          eno3           enp1s0f2
1.10G.3          eno4           enp1s0f3
1.10G.0          eno5           enp1s0f4
1.10G.1          eno6           enp1s0f5
1.10G.2          eno7           enp1s0f6
1.10G.3          eno8           enp1s0f7
2.1g.0           enp129s0f0     enp129s0f0
2.1g.1           enp129s0f1     enp129s0f0


Code:
enp129s0f0:  mtu 9000 qdisc mq master bond2
enp129s0f1:  mtu 9000 qdisc mq master bond2
 `----> bond2: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888
     `---> bond2.43@bond2: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888  

eno1 ( enp1s0f0 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
eno2 ( enp1s0f1 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
eno3 ( enp1s0f2 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
eno4 ( enp1s0f3 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
 `----> bond0: mtu 9100 qdisc noqueue state UP mode DEFAULT group default qlen 13888
    |---> bond0.10@bond0:   mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888
    |---> bond0.198@bond0: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888
     `---> bond0.199@bond0: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888

eno5 ( enp1s0f4 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
eno6 ( enp1s0f5 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
eno7 ( enp1s0f6 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
eno8 ( enp1s0f7 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
 `----> bond1: mtu 9100 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 13888
     `---> vmbr0: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888

bond2.43 -> corosync
bond0.10 -> management / corosync backup
bond0.198 -> backup/NFS storage
bond0.199 -> Ceph

bond1 -> VM public traffic


I'm not completely happy with this topology... but it mostly works...


hope my opinions and thoughts are helpful .... disregard if they're not.
 