3 NICS 2 bonded with 802.3ad plus failover to second switch?

jacs

Hi,

I've got two NICs on each of my Proxmox servers bonded together with 802.3ad, attached to my vmbr and connected to a high-speed switch. I have a spare third NIC on each server which I would like to connect to a lower-speed switch as a failover, just in case the high-speed switch fails or is rebooted. Is this possible? I don't have the ability to run 802.3ad across switches.

Thanks

Chris
 
What kind of failover are you looking for? PVE node management? Corosync? Guest traffic?

There are several different scenarios for failover.

More information about your current network configuration and the failure you are trying to mitigate would be helpful.
 
Thanks for replying. What I am worried about is having HA switched on with Ceph and Ceph getting corrupted during a network outage. This was a major issue when I used VMware: if the vSAN got corrupted due to a total network outage, it was a real pain to sort out.
 
Thanks for the additional information.

What I am worried about is having HA switched on with Ceph and Ceph getting corrupted during a network outage

Ceph is remarkably resilient. I have not seen it get corrupted. I am not saying it cannot; I have not seen it.

If you can, get redundant switches for your high-speed links. There are two approaches you can take.

#1 Best

Two switches that support MLAG. You still use 802.3ad on the bond, and the switch pair presents itself as a single logical switch. You get the full bandwidth of both links; if one switch fails, you will still have 50% of your bandwidth.

Depending on your switch vendor, the feature goes by different names; Cisco alone has several. Cisco Nexus calls it Virtual Port Channel (vPC), and Juniper calls it MC-LAG (multi-chassis LAG).
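For reference, the bond plus bridge in /etc/network/interfaces would look roughly like this. This is a minimal sketch: the NIC names enp1s0f0/enp1s0f1 and the addresses are placeholders, so adjust them to your hardware.

Code:
# LACP bond; each member port is cabled to a different switch in the MLAG pair
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

# bridge for host management and guest traffic, riding on the bond
auto vmbr0
iface vmbr0 inet static
    address 192.0.2.11/24
    gateway 192.0.2.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0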

#2 Half As Good

You do not need the switches to support anything special for this approach. You set the bond to active-backup mode and connect one link to each switch. You lose half your bandwidth, but if a switch fails, the connection will fail over.
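The same sketch in active-backup mode, again with placeholder NIC names; here enp1s0f0 goes to one switch and enp1s0f1 to the other:

Code:
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-miimon 100
    # only the primary carries traffic; the other takes over if the link or its switch goes down
    bond-mode active-backup
    bond-primary enp1s0f0

The vmbr0 stanza on top of the bond stays the same as in the sketch above.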

Plan for Extra NIC

Since either of the approaches above will allow you to achieve the resiliency you seek, you can still use the extra NIC. Make it your primary or backup Corosync link on a dedicated switch, assuming the NIC is at least 1 Gbps.
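A dedicated Corosync interface is just a plain static address on its own subnet, with no bond, bridge, or gateway on top. A sketch, assuming the spare NIC is called eno3 and 10.10.10.0/24 is an otherwise unused subnet:

Code:
# dedicated Corosync link: its own subnet, no gateway, no other traffic
auto eno3
iface eno3 inet static
    address 10.10.10.11/24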
 
Hi, I would like to do something similar. I have 3 nodes with two 10G ports each. I currently use one port for management and the other for Ceph and the VMs, all on a single Mikrotik switch. I would like to add a second switch in case the first one dies or needs a reboot, but I'm confused about how best to configure the network. I was thinking about bonding both network cards and then separating Ceph and the VMs with VLANs, so as to have a total bandwidth of 20G. Can you recommend an optimal solution? Thanks. I'll post a diagram of how I would like to do it.
 

Attachments

  • PROXMOX FAINLOV.png (22 KB)
I currently use one port for management and the other for Ceph and the VMs,

You have left out important information. Are your drives NVMe, or are they slower? If you have NVMe, Ceph could choke your other traffic.

In most cases, the host and guests (VMs and LXCs) will not saturate a 10G link, but Ceph can.

If this is a production environment (i.e. not a lab or home lab), 2 x 10G is not ideal. You should add at least a 1G NIC to dedicate to Corosync.

I would like to add a second switch in case the first one dies or needs a reboot, but I'm confused about how best to configure the network

I was thinking about bonding both network cards and then separating Ceph and the VMs with VLANs, so as to have a total bandwidth of 20G

If you want redundancy, you would put both ports into a bond using 802.3ad, with one port going to each switch; Mikrotik calls this cross-switch setup MLAG. This gives you a 20G link, and if a switch dies, you will still have 10G of capacity. You can run VLANs over this link.
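As a rough sketch of the VLAN split on top of that bond (VLAN IDs and addresses are only examples), you would make vmbr0 VLAN-aware and give Ceph its own VLAN interface:

Code:
# VLAN-aware bridge on top of the 802.3ad bond; guests get their VLAN tag per vNIC
auto vmbr0
iface vmbr0 inet static
    address 192.0.2.21/24
    gateway 192.0.2.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# example: Ceph traffic on its own VLAN (ID 40 is a placeholder)
auto vmbr0.40
iface vmbr0.40 inet static
    address 10.40.40.21/24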

If your Mikrotik switches do not support MLAG, you might consider the new SDN Fabrics in PVE 9. I have not used this in production, but in the lab it seems solid, and it is built on known technology.

Important: No matter which of these you choose, you should try to get 1G on each host for Corosync.

Note: Generally, asking a new question to a stale post is considered bad form. It would have been better to create a new post and link to the old one.
:cool:
 
Note: Generally, asking a new question to a stale post is considered bad form. It would have been better to create a new post and link to the old one.

Sorry, I didn't know about this rule.

In any case, the servers also have two integrated 1G Ethernet ports, which I could dedicate to Corosync.
And then I wonder: what exactly is Corosync for?
 
I noticed in my current configuration that when I do a migration, the traffic goes through the port I dedicated to the cluster, while I use the other port for Ceph and the VMs. How should I dedicate the 1G port only to Corosync? But then I wonder: if I use the 1G port for Corosync, won't it saturate when I do migrations?
 
my current configuration that when I do a migration, the traffic goes through the port I dedicated to the cluster

There are different types of "cluster" traffic. By default, it all goes over the host network. Corosync is not migration traffic; the Corosync traffic should be moved to its own network. You can leave the migration traffic on the host network to take advantage of the 10G link.

Set up your Corosync interfaces with their own IP addresses, and when you create the cluster you can specify these interfaces for Corosync to use. Do not put any other traffic on those interfaces.
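For example (a sketch with placeholder names and addresses), if each node has a dedicated Corosync address in 10.10.10.0/24:

Code:
# on the first node, create the cluster and pin Corosync link 0 to the dedicated address
pvecm create my-cluster --link0 10.10.10.11

# on each additional node, join via an existing node and give the local Corosync address
pvecm add 192.0.2.11 --link0 10.10.10.12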
 
When I create a cluster it lets me choose the cluster name, but I don't see any reference to which network to dedicate to Corosync. How should I proceed to dedicate the 1G port only to Corosync?