Failover and high availability network bonding

Imad Daou

Renowned Member
Nov 29, 2014
California, United States
imaddaou.com
Dear ProxMox,

We have two Cisco 4948E switches, and all Proxmox hosts are connected to them. The hosts are configured as a cluster but use local storage only: no Ceph or Gluster, in other words no hyper-converged setup. The switches therefore carry only regular user traffic, plus the occasional migration of machines between hosts.

Of the bonding modes listed below, which one would a Proxmox engineer recommend?


  • Balance-rr
    This mode provides load balancing and fault tolerance (failover) via a round-robin policy: it transmits packets in sequential order from the first available slave through the last.
  • Active-Backup
    This mode provides fault tolerance via an active-backup policy. Once the bond is up, only one of the Ethernet slaves is active; another slave becomes active if, and only if, the active slave fails. In this mode the bond's MAC address is externally visible on only one network adapter, to avoid confusing the switch.
  • Balance-xor
    This mode provides load balancing and fault tolerance. It transmits based on the selected transmit hash policy. Alternate transmit policies may be selected via the xmit_hash_policy option.
  • Broadcast
    This mode provides fault tolerance only. It transmits everything on all slave ethernet interfaces.
  • 802.3ad - LACP
    This mode provides load balancing and fault tolerance. It creates aggregation groups that share the same speed and duplex settings and uses all slave Ethernet interfaces in the active aggregator, according to the 802.3ad specification. To implement this mode, the base drivers must support ethtool for retrieving the speed and duplex mode of each slave, and the switch must support dynamic link aggregation. Normally this requires a network engineer to configure the switch side.
  • Balance-TLB
    This mode provides transmit load balancing, which is what TLB stands for. If tlb_dynamic_lb = 1, outgoing traffic is distributed according to the current load on each slave. If tlb_dynamic_lb = 0, load-based balancing is disabled and traffic is distributed using the hash distribution only. For this mode, the base drivers must support ethtool for retrieving the speed of each slave.
  • Balance-ALB
    This mode provides adaptive load balancing, which is what ALB stands for. It is similar to balance-tlb, except that both sent and received traffic are balanced. Receive load balancing is achieved through ARP negotiation: the bonding driver intercepts the ARP replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond. For this mode, the base drivers must support ethtool for retrieving the speed of each slave.
Source: https://www.howtoforge.com/tutorial/how-to-configure-high-availability-and-network-bonding-on-linux/

Our Cisco switches support 802.3ad LACP:
https://www.cisco.com/c/en/us/produ...8e-ethernet-switch/data_sheet_c78-598933.html

Should we configure LACP? If that is what you recommend, would you please help me with the configuration on the host side and, if possible, on the Cisco side?

For example, on the Cisco side, can I have an EtherChannel (port group) that spans both switches, or must the EtherChannel be set up on a single switch?

Our HP servers have 4 ports each: 2 ports serve the external (public) network and the other 2 serve the internal network.

What would be your recommended configuration for either Open vSwitch or a GNU/Linux bridge?

Currently the hosts are configured with active-backup. However, it does not provide active-active operation: if we lose SW1, the main switch, we lose connectivity to services for a few seconds or minutes while the bond becomes active on the second switch, SW2.

# The following sample is used on our firewall hosts, which load-balance between 2 internet providers

auto lo
iface lo inet loopback

auto enp3s0f0
iface enp3s0f0 inet manual

auto enp3s0f1
iface enp3s0f1 inet manual

auto enp4s0f0
iface enp4s0f0 inet manual

auto enp4s0f1
iface enp4s0f1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp3s0f0 enp3s0f1
    bond-miimon 100
    bond-mode active-backup
#internal

auto vmbr0
iface vmbr0 inet static
    address 10.15.10.10/22
    gateway 10.15.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#internal

auto vmbr1
iface vmbr1 inet manual
    bridge-ports enp4s0f0
    bridge-stp off
    bridge-fd 0
#External ISP1

auto vmbr2
iface vmbr2 inet manual
    bridge-ports enp4s0f1
    bridge-stp off
    bridge-fd 0
#External ISP2

Note: none of the hosts are configured with public IP addresses. The firewall VMs are, and they use both the external and internal networks.

# The following sample is used across the compute hosts, so machines can be in the DMZ or on the internal network

auto lo
iface lo inet loopback

auto enp3s0f0
iface enp3s0f0 inet manual
#internal network

auto enp3s0f1
iface enp3s0f1 inet manual
#internal network

auto enp4s0f0
iface enp4s0f0 inet manual
#external network

auto enp4s0f1
iface enp4s0f1 inet manual
#external network

auto bond0
iface bond0 inet manual
    bond-slaves enp3s0f0 enp3s0f1
    bond-miimon 100
    bond-mode active-backup
    bond-primary enp3s0f0
#internal

auto bond1
iface bond1 inet manual
    bond-slaves enp4s0f0 enp4s0f1
    bond-miimon 100
    bond-mode active-backup
    bond-primary enp4s0f0
#external

auto vmbr0
iface vmbr0 inet static
    address 10.15.35.10/22
    gateway 10.15.32.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#internal

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
#external

Your time is highly appreciated. I look forward to your help configuring active-active bonding at your earliest convenience.

Thank you!
 
If your switches support 802.3ad, then that is what you should use. Also make sure you change the hash policy on the bond to layer2+3, which should give you the best aggregate throughput across all the bonded interfaces. Note that a single flow is still limited to the speed of one link.
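
A minimal sketch of that suggestion for the internal bond on the compute hosts, keeping the Linux bridge setup from the question. It assumes the two switch ports facing enp3s0f0 and enp3s0f1 are already grouped into one LACP port channel on the switch side. The interface names and addresses are copied from the question; only the bond-mode and bond-xmit-hash-policy lines differ from the active-backup version, and bond-primary is dropped because it only applies to active-backup.

```
# Sketch only: LACP (802.3ad) instead of active-backup.
# Assumes the matching switch ports are configured as one LACP group.
# bond-xmit-hash-policy layer2+3 hashes on MAC and IP addresses.
auto bond0
iface bond0 inet manual
    bond-slaves enp3s0f0 enp3s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 10.15.35.10/22
    gateway 10.15.32.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

Once the bond is up, /proc/net/bonding/bond0 reports the bonding mode and the 802.3ad aggregator state of each slave, which is a quick way to check whether the switch actually negotiated LACP on both links.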
 
For example, on the Cisco side, can I have an EtherChannel (port group) that spans both switches, or must the EtherChannel be set up on a single switch?

If your 2 switches are stacked, in an MLAG, or something similar, then yes, it is possible to create one EtherChannel across both switches. (It is the best and most stable method.)

If not, you can still create an active-backup bond for failover.
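
For the case where the switches can present a single port channel, the Open vSwitch option raised in the question could be sketched roughly as follows. This assumes the openvswitch-switch package is installed on the host and that both switch ports belong to one LACP group. The interface and bridge names are taken from the question; the host's own IP configuration is deliberately left out, since how it is attached (for example on a separate internal port) depends on the setup.

```
# Sketch only: Open vSwitch bond with LACP feeding the vmbr0 bridge.
# Assumes openvswitch-switch is installed; host IP configuration omitted.
auto bond0
iface bond0 inet manual
    ovs_type OVSBond
    ovs_bridge vmbr0
    ovs_bonds enp3s0f0 enp3s0f1
    ovs_options bond_mode=balance-tcp lacp=active

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0
```

If the two switches cannot form one port channel, the active-backup bonds already shown in the question remain the fallback.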
 
