[SOLVED] Traffic separation and nics - how to plan the network?

Jan 14, 2025
Hi, we're building a 4-node PVE cluster with NVMe Ceph storage.

Available NICs: We have the following NICs available:
  • Nic1: 2 x 10G + 2 x 1G
  • Nic2: 2 x 10G
  • Nic3: 2 x 100G
Traffic/Networks: Now we need (I think) the following traffic separations:
  • PVE Management
  • PVE Cluster & Corosync
  • Ceph (public) traffic > 2 x 10G Bond1 (MLAG)
  • Public VM & Migration traffic > 2 x 10G Bond2 (MLAG)
  • Ceph (internal) cluster traffic > 2 x 100G Bond3 (MLAG)
Question now: What is the best use for the remaining 2 x 1G ports?

Bond/Bridge: All 10/100G ports should use a Linux bond so we can use MLAG. The public VM & migration traffic should also go through a Linux bridge. What about the rest? No bridge?
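For reference, a minimal sketch of what one such bond + bridge pair could look like in /etc/network/interfaces (ifupdown2, as shipped with PVE); the interface names, hash policy and VLAN-awareness below are placeholders/assumptions, not a confirmed config:

Code:
auto enp65s0f0
iface enp65s0f0 inet manual

auto enp65s0f1
iface enp65s0f1 inet manual

# LACP bond; MLAG lives on the switch pair, the host just runs 802.3ad
auto bond2
iface bond2 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100

# bridge for public VM traffic on top of the bond
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond2
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes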

Thanks!
 
My 2c:

Do not use the 100G links for a separate Ceph cluster network; use them for the Ceph public network instead. There is no need for a separate cluster network here.

Use 2x 10G for Proxmox management and VM migration.

Use the other 2x 10G for VM guest traffic.

Use the remaining 1G ports for additional corosync rings.
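A minimal sketch of what such an additional ring could look like in /etc/pve/corosync.conf (node name and addresses are placeholders, and config_version in the totem section has to be bumped when editing):

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    # primary corosync network
    ring0_addr: 10.10.10.1
    # additional ring on the spare 1G ports
    ring1_addr: 10.10.20.1
  }
  # ... same pattern for the other nodes
}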
 
Not intending to derail the thread, but to help illustrate the underlying philosophy driving the advice:

1) Why four nodes?
I would expect this to be a problem longer term.
I'd think you'd want 5.
(so as to have a min-required-alive count of 3; that allows one node to be actively undergoing maintenance while you can still endure a node panic without killing the cluster). My reasoning here comes from personal experience:
- I had a 4-node setup with a minimum of 3 required.
- I had taken one node down to change out CPUs.
- I accidentally knocked the power cords out of one of the remaining active nodes, which killed the cluster.
Since then, I've maintained the mindset that my clusters must be able to sustain 2 down nodes. YMMV, but that's my 'why' here.

2) Your NIC selection is a little awkward:
  1. nic1.10g.1
    nic1.10g.2
    nic1.1g.1
    nic1.1g.2

  2. nic2.10g.1
    nic2.10g.2

  3. nic3.100g.1
    nic3.100g.2
Would something like this be better?
  1. nic1.10g.1
    nic1.10g.2
    nic1.1g.1
    nic1.1g.2

  2. nic2.10g.1
    nic2.10g.2
    nic2.1g.1
    nic2.1g.2

  3. nic3.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic3.100g.A
    - nic3.100g.B
  4. nic4.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic4.100g.A
    - nic4.100g.B
Or, assuming nic1 is some special attached card:
  1. nic1.10g.1
    nic1.10g.2
    nic1.1g.1
    nic1.1g.2
  2. nic2.1g.1
    nic2.1g.2
  3. nic3.10g.1
    nic3.10g.2
  4. nic4.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic4.100g.A
    - nic4.100g.B
  5. nic5.100g.1
    bifurcated to 2 vifs presented to the os as:
    - nic5.100g.A
    - nic5.100g.B

Then arranging bonds like:
Bond0       Bond1       Bond10      Bond101     Bond102
1.1g.1      2.1g.2      1.10g.1     3.100g.A    4.100g.A
2.1g.1      1.1g.2      2.10g.1     4.100g.B    3.100g.B
                        1.10g.2
                        2.10g.2
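As a sketch, bond0 from the table above would pair one 1G port from each card in /etc/network/interfaces; the interface names and address are placeholders, and active-backup mode is an assumption (LACP works too if both ports land on an MLAG-capable switch pair):

Code:
# cross-card 1G bond dedicated to corosync
auto bond0
iface bond0 inet static
    address 10.10.10.1/24
    bond-slaves eno3 enp2s0f0
    bond-mode active-backup
    bond-miimon 100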

Dedicate bond0 to corosync.
Split b1 between a corosync backup link and "internal VM heartbeat" bridges, ideally connected to two different switches on diverse power, with no other connectivity.

Corosync SHOULD use b0 unless something goes wobbly, in which case it may fail over to b1 (see the sketch below).
b1 is otherwise relatively unused from PVE's perspective, so it can be leveraged for essential low-latency uses that, generally speaking, have fairly low throughput demand.
If an infra-scoped VM cohort requires a heartbeat network with another VM, use a bridge atop the second 1G corosync net in conjunction with their normally provisioned network as a fallback heartbeat.

I wouldn't give these interfaces to non-infra-grade VMs, but rather reserve them for things like firewall VM pairs that need cross-connected heartbeat interfaces that are "isolated".
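A sketch of how the "b0 preferred, b1 fallback" behaviour can be expressed in the totem section of /etc/pve/corosync.conf; the priority values are arbitrary, and with knet's default passive link mode the higher-priority link carries the traffic while the other only takes over if it fails:

Code:
totem {
  # ...
  interface {
    linknumber: 0
    # b0: preferred corosync link
    knet_link_priority: 20
  }
  interface {
    linknumber: 1
    # b1: only used if link 0 goes down
    knet_link_priority: 10
  }
}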

I'd use b10 for management/migration.

In my experience thus far, 4x 10G NICs for management/migration are generally okay, although migration of VMs with very active, large memory footprints is problematic.
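If migration should be pinned to that bond's subnet, a migration network can be set in /etc/pve/datacenter.cfg; the CIDR below is a placeholder:

Code:
# /etc/pve/datacenter.cfg
# use a dedicated migration network
migration: secure,network=10.10.30.0/24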

I'd use b101 for Ceph.
I'd use b102 + a bridge for public VM traffic (vmpub).
I THINK?!
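On the Ceph side, a minimal /etc/pve/ceph.conf excerpt matching the earlier suggestion of a single (public-only) network on the 100G bond; the subnet is a placeholder:

Code:
[global]
    public_network = 10.10.50.0/24
    # no cluster_network set: replication traffic then also
    # runs over public_network, i.e. the 100G bond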


My current topology is 6x Dell R730xd's.
Each node has 4x 10G bifurcated to 8x 10G, as well as a secondary card with 2x 1G NICs for corosync, with fallback to the management VLAN.

so:
Physical Link    OS NIC         P/S/F
1.10G.0          eno1           enp1s0f0
1.10G.1          eno2           enp1s0f1
1.10G.2          eno3           enp1s0f2
1.10G.3          eno4           enp1s0f3
1.10G.0          eno5           enp1s0f4
1.10G.1          eno6           enp1s0f5
1.10G.2          eno7           enp1s0f6
1.10G.3          eno8           enp1s0f7
2.1g.0           enp129s0f0     enp129s0f0
2.1g.1           enp129s0f1     enp129s0f0


Code:
enp129s0f0:  mtu 9000 qdisc mq master bond2
enp129s0f1:  mtu 9000 qdisc mq master bond2
 `----> bond2: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888
     `---> bond2.43@bond2: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888  

eno1 ( enp1s0f0 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
eno2 ( enp1s0f1 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
eno3 ( enp1s0f2 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
eno4 ( enp1s0f3 ): mtu 9198 qdisc mq master bond0 state UP mode DEFAULT group default qlen 13888
 `----> bond0: mtu 9100 qdisc noqueue state UP mode DEFAULT group default qlen 13888
    |---> bond0.10@bond0:   mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888
    |---> bond0.198@bond0: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888
     `---> bond0.199@bond0: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888

eno5 ( enp1s0f4 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
eno6 ( enp1s0f5 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
eno7 ( enp1s0f6 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
eno8 ( enp1s0f7 ): mtu 9198 qdisc mq master bond1 state UP mode DEFAULT group default qlen 13888
 `----> bond1: mtu 9100 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 13888
     `---> vmbr0: mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 13888

bond2.43 -> corosync
bond0.10 -> management / corosync backup
bond0.198 -> backup/NFS storage
bond0.199 -> Ceph

bond1 -> VM public traffic


I'm not completely happy with this topology... but it mostly works...


hope my opinions and thoughts are helpful .... disregard if they're not.
 