Bonded vmbr but only 1 port works at a time

Gavcol

Feb 12, 2022
Hi,

I hope you can help me work out why only one port is active at a time on a bonded, bridged virtual port.

I have PVE on a bare-metal unit with six Intel I211 ports.
In a bonded pair, only one port seems to be active at a time.
The interfaces file configures LACP mode, which I thought was supposed to provide load balancing and fault tolerance, so both ports should operate together as one virtual port.

I've configured the network as follows:
Port 1 - Management with Linux bridge vmbr0
Port 2 - WAN with Linux bridge vmbr1
Ports 3 & 4 - Bonded with LACP (802.3ad) as vmbr34
Ports 5 & 6 - Unused for now, but I have plans for them once I get the bonded ports 3 & 4 working.

All are active with Autostart enabled, and the bridges are all VLAN-aware.

What's strange, though, is that the bond only works for the first device connected to either port of the pair, regardless of which port is connected first.
For example, with ports 3 & 4 bonded as vmbr34:
- I can connect an access point to port 3; then connecting anything (like a managed switch) to port 4 doesn't work (unidentified network and no IP assigned by DHCP)
or
- I can then take the managed switch from port 4 and plug it into port 3 and it will work; then connecting the AP to port 4 doesn't work.
or
- Similarly, starting with the AP in port 4 on its own will work, but connecting the managed switch to port 3 afterwards doesn't work.

This seems to act more like active-backup than LACP!


interfaces file
(not the full interfaces file - it's a subset for the relevant ports):


auto lo
iface lo inet loopback

auto enp3s0
iface enp3s0 inet manual
#Port3

auto enp4s0
iface enp4s0 inet manual
#Port4

auto bond34
iface bond34 inet manual
bond-slaves enp3s0 enp4s0
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
#Bond Ports 3-4

auto vmbr34
iface vmbr34 inet manual
bridge-ports bond34
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4090
#Bridge Port34 Bond

Also, I have a pfSense VM with VLANs and DHCP configured in pfSense, which is working fine and serves DHCP IPs to whichever device is the first to connect to the bonded pair. I just can't get the 2nd port active at the same time.

It might be better to use balance-rr or balance-alb, but I'm not sure whether they also work with bond-xmit-hash-policy layer2+3.
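On the hash-policy question: the Linux kernel bonding documentation says `xmit_hash_policy` is used for slave selection in balance-xor, 802.3ad and balance-tlb modes, which means balance-rr ignores `bond-xmit-hash-policy`. The minimal Python lookup below just encodes that documented list; the names `DOCUMENTED_HASH_MODES` and `hash_policy_applies` are hypothetical, and balance-alb (whose transmit side builds on balance-tlb) is deliberately left for you to check against the documentation for your kernel version.

```python
# Bond modes that the kernel bonding documentation lists as using
# xmit_hash_policy to choose the transmit slave (hypothetical helper names).
DOCUMENTED_HASH_MODES = frozenset({"balance-xor", "802.3ad", "balance-tlb"})

# The seven Linux bonding modes.
KNOWN_MODES = frozenset({
    "balance-rr", "active-backup", "balance-xor", "broadcast",
    "802.3ad", "balance-tlb", "balance-alb",
})


def hash_policy_applies(mode: str) -> bool:
    """True if bond-xmit-hash-policy is documented as affecting this mode.

    balance-alb is not in the documented list, so this returns False for it;
    verify against the documentation for your kernel before relying on that.
    """
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown bond mode: {mode!r}")
    return mode in DOCUMENTED_HASH_MODES


if __name__ == "__main__":
    for mode in ("802.3ad", "balance-rr"):
        print(f"{mode}: hash policy applies = {hash_policy_applies(mode)}")
```

In other words, leaving `bond-xmit-hash-policy layer2+3` in the stanza after switching to balance-rr should be harmless but has no effect.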

I've been tearing my hair out with this and would appreciate any help.
Thanks in advance
Gav
 
Is that while sending or while receiving data or both? Have you configured the switch ports to support LACP?
Hi @gurubert
Not sure what you mean by sending/receiving. I haven't even got to that stage yet, as I cannot get an IP from DHCP: the 2nd connected port is inactive until the first connected port is disconnected.
I haven't configured the switch ports for LACP because I didn't think it was needed: the bridge is a virtual port, and I didn't think pfSense or the switch would be aware of, or affected by, the LACP behind it.
 
Are the 2 ports on your managed switch that are physically connected to enp3s0 and enp4s0 also set up as an LACP bond within the managed switch software? LACP requires both sides of the connection to be configured for this to work.
 
Hi @gurubert & @vesalius (and hopefully anyone else who can help! ;) )

Sorry for the long layoff. Some work and personal stuff got in the way from last August, but now I'm back to trying to resolve this.

So, in my current config I have bonded NICs 3 & 4 with LACP:
  • IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP): Creates aggregation groups that share the same speed and duplex settings. Utilizes all slave network interfaces in the active aggregator group according to the 802.3ad specification.

I then used a Linux bridge so they are available as a single port in a pfSense VM (see the Proxmox config images below).
For the original issue above, only a single port is active at a time (i.e. the first device connected to either port gets the goods, and the second device has to wait for the other port to fail or disconnect), so the bonded ports seem to act more like active-backup:
  • Active-backup (active-backup): Only one NIC slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The single logical bonded interface’s MAC address is externally visible on only one NIC (port) to avoid distortion in the network switch. This mode provides fault tolerance.
The issue may also relate to LACP, so my questions are:
- In this case, does pfSense even need to be aware of LACP?
- That is, doesn't pfSense see only a single port, without needing LACP configured on the pfSense interface?
- If LACP does need to be configured in pfSense for the bonded Linux bridge ports, could this be part of the reason only one port is ever active at a time?

If that is the problem, I'd rather change the bond mode in Proxmox than mess around with the config in pfSense.
I'm not necessarily looking for increased speed, but rather for both ports to be active at the same time, acting as a single port and sharing traffic for increased throughput/load balancing, with a level of redundancy if one port fails.

Looking at https://pve.proxmox.com/wiki/Network_Configuration &
https://forum.proxmox.com/threads/failover-and-high-availability-network-bonding.80848
I think round robin or adaptive load balancing might be better suited.
  • Round-robin (balance-rr): Transmit network packets in sequential order from the first available network interface (NIC) slave through the last. This mode provides load balancing and fault tolerance.
  • Adaptive load balancing (balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special network switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the NIC slaves in the single logical bonded interface such that different network-peers use different MAC addresses for their network packet traffic.
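To make the ARP-rewriting idea in the balance-alb description concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not the bonding driver: each new peer is handed the MAC of the next slave in round-robin order (the kernel documentation describes receive load being distributed sequentially among the highest-speed slaves), and a peer keeps its slave afterwards. The class name `RlbSketch`, the MACs and the IPs are hypothetical; link speeds, rebalancing on slave changes and ARP timing are ignored.

```python
from itertools import cycle


class RlbSketch:
    """Toy model of balance-alb receive load balancing (hypothetical class).

    When the host answers an ARP request, the source MAC of the reply is
    rewritten so that each peer learns one slave's MAC. New peers get the
    next slave in round-robin order, spreading inbound traffic per peer.
    """

    def __init__(self, slave_macs):
        self._next_slave = cycle(slave_macs)
        self.assignments = {}  # peer IP -> slave MAC advertised to that peer

    def arp_reply_source_mac(self, peer_ip):
        # A peer keeps the slave it was first given, so all of its
        # traffic keeps arriving on the same port.
        if peer_ip not in self.assignments:
            self.assignments[peer_ip] = next(self._next_slave)
        return self.assignments[peer_ip]


if __name__ == "__main__":
    # Hypothetical MACs standing in for enp3s0 and enp4s0.
    rlb = RlbSketch(["aa:aa:aa:aa:aa:03", "aa:aa:aa:aa:aa:04"])
    for peer in ["10.0.0.10", "10.0.0.11", "10.0.0.12", "10.0.0.10"]:
        print(peer, "->", rlb.arp_reply_source_mac(peer))
```

The model assumes every peer can be reached through either slave, i.e. both ports lead to the same network, which is the situation these load-balancing modes are designed for.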

So, the next question is: can I just change the bond mode on the Linux bond to one of the other modes, or will this kill the Linux bridge and ultimately the current pfSense (VM) interface?

Thanks in advance,
Gav
[Screenshots: Proxmox network configuration]
 
LACP will not work for you in this setup, with one Proxmox host LACP port plugged into an AP and the other plugged into a managed switch. LACP requires both ports from the Proxmox host to be plugged into the managed switch AND the managed switch to be capable of, and set up with, those receiving ports in an LACP bond. LACP is more about redundancy between 2 devices, not about connectivity between 3 devices.

1. For the topology you have chosen, a broadcast bond would be the best option for you IMO.
Code:
auto bond<No>
iface bond<No> inet manual
    bond-slaves <Nic1> <Nic2>
    bond-miimon 100
    # broadcast mode transmits every frame on all slave interfaces
    bond-mode broadcast
2. Basically all the other bond modes, including LACP, would require you to change your network topology:
  • Plug the AP into the managed switch
  • Plug both enp3s0 and enp4s0 into the managed switch
  • If the managed switch supports LACP, set up an LACP bond on the receiving ports in the managed switch software.
  • If the managed switch does not support LACP then use one of the other bond configurations as described in the docs.
3. What does the pfSense VM use from the Proxmox host for network access: vmbr34 or bond34 directly? From the picture provided I see vmbr34, but that bridge does not list bond34 under the Ports/Slaves column, as would be expected from your initial /etc/network/interfaces in post 1.
4. Yes, you can change the Bond Mode of bond34, and it will not kill any bridge (vmbr<No>) that uses bond34 as Ports/Slaves.
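Point 4 follows from how /etc/network/interfaces is laid out: the bond mode lives only in the bond's own stanza, and the bridge merely names the bond in `bridge-ports`. The minimal Python sketch below rewrites `bond-mode` in one stanza and confirms the bridge still points at the bond. The helper names are hypothetical, and the parser only understands simple `auto`/`iface` stanzas like the ones in post 1. On Proxmox VE with ifupdown2, edits are normally applied with `ifreload -a` or the GUI's Apply Configuration button.

```python
# Top-level keywords that start a new stanza in an interfaces file
# (simplified; other keywords are not handled by this sketch).
STANZA_KEYWORDS = ("auto", "iface", "allow-hotplug", "source")


def _iter_stanzas(text: str):
    """Yield (iface_name_or_None, line) for each line of an interfaces file.

    Option lines are attributed to the most recent 'iface <name>' line;
    any other top-level keyword ends that stanza.
    """
    current = None
    for line in text.splitlines():
        words = line.split()
        if words and words[0] in STANZA_KEYWORDS:
            current = words[1] if words[0] == "iface" and len(words) > 1 else None
        yield current, line


def set_bond_mode(text: str, bond: str, new_mode: str) -> str:
    """Return a copy of text with the bond-mode option of `iface <bond>` replaced."""
    out = []
    for owner, line in _iter_stanzas(text):
        words = line.split()
        if owner == bond and words and words[0] == "bond-mode":
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}bond-mode {new_mode}"
        out.append(line)
    return "\n".join(out) + "\n"


def bridge_ports(text: str, bridge: str) -> list:
    """Return the bridge-ports list of `iface <bridge>` (empty if absent)."""
    for owner, line in _iter_stanzas(text):
        words = line.split()
        if owner == bridge and words and words[0] == "bridge-ports":
            return words[1:]
    return []


# The bond and bridge stanzas from post 1.
CONFIG = """\
auto bond34
iface bond34 inet manual
bond-slaves enp3s0 enp4s0
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
#Bond Ports 3-4

auto vmbr34
iface vmbr34 inet manual
bridge-ports bond34
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4090
#Bridge Port34 Bond
"""

if __name__ == "__main__":
    new_config = set_bond_mode(CONFIG, "bond34", "balance-rr")
    print(new_config)
    print("vmbr34 ports:", bridge_ports(new_config, "vmbr34"))
```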
 
Thanks @vesalius

I have a Ubiquiti USW switch, but before I can even connect that, I'm still having trouble attaching anything to both ports 3 & 4 at the same time, even without a switch and just using two PCs (though I can run the switch on either port 3 or 4 on its own without any issues).

1) I'll try the broadcast bond, though I don't know anything about it yet. I'll have to do some research.
2) Not sure about the AP? I have Proxmox with a pfSense VM on a 6-port bare-metal box. To start with, I just wanted the Proxmox bond and bridge to let pfSense see the 2 ports as 1 port, but have both ports work in tandem with any connected device to expand the throughput to parallel 1Gb links. There are LAGG options in pfSense, but I'm too far down the config now: I'd have to unwind the bonded ports, interfaces, VLANs, DHCP servers etc. to make the port 3/4 interfaces available in pfSense just to configure the LAGG, and then reconfigure interfaces, VLANs and DHCP all over again.
Ideally, and eventually, once I get the 2 ports active at the same time, I can reintroduce the USW switch and aggregate 2 of its ports to connect to ports 3 and 4. As I type it out, it sounds like I might be doing this backwards and may have to just go back and change the interfaces :(

3) vmbr34 uses bond34 (I fixed this in the previous image). I got a bit overzealous when masking the other info on the screen grab and accidentally blanked the vmbr34 ports/slaves :(

4) In the updated image you can see I changed the bond mode to balance-alb. This is worse than LACP: previously with LACP, a 2nd connection to the unused port would just remain inactive, but with balance-alb the whole network goes down when I add the 2nd cable.
I have a few more tests to run with this. Instead of just changing the mode, maybe I have to delete the bond altogether, then recreate it and re-add it to vmbr34.

I'll also try your broadcast bond when I know more about it.

Thanks again
Gav
 
Again, basically all of those bond setups are designed to increase redundancy between 2 devices, not to allow simultaneous connections between 3 devices (i.e. Proxmox directly connected to 2 different PCs).

My use of "AP" comes directly from your first post, where you introduced it:

- I can then take the managed switch from port 4 and plug it into port 3 and it will work, then connecting the AP to port 4 doesn't work.
or
- Similarly, starting with the AP in port 4 on its own will work, but connecting the managed switch into port 3 afterwards doesn't work.
I am still unclear what your desired or current network topology is. Forget the pfSense VM for now. What exactly is your desired connection from the 2 Proxmox host ports? Both into the USW switch? Have the 2 USW ports been configured as a bond in the USW software?
 
I have a Ubiquiti USW switch, but before I can even connect that, I'm still having trouble attaching anything to both ports 3 & 4 at the same time, even without a switch and just using two PCs (though I can run the switch on either port 3 or 4 on its own without any issues).
That's intended. When one port is connected to another device, the other port expects to be connected to another port of that very same device. Everything else must fail.
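A toy Python model of why 802.3ad behaves like this. It is a simplification under stated assumptions, not the kernel's LACP state machine: ports whose partner is the same LACP system join one aggregator; a port whose partner sends no LACPDUs (a PC, an AP, or a switch port without a LAG) ends up in an aggregator of its own; and only one aggregator carries traffic. With the kernel's default `ad_select=stable`, the active aggregator is only reselected once all of its ports are down, which matches the "first device wins until it is unplugged" symptom from the first post. On the real host, `/proc/net/bonding/bond34` shows each slave's Aggregator ID and the active aggregator. The class, method and partner names below are hypothetical, and "largest aggregate bandwidth" is reduced to "most ports".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Port:
    name: str
    up: bool = False
    partner: Optional[str] = None  # partner's LACP system ID; None = no LACPDUs


class Bond8023adSketch:
    """Toy 802.3ad aggregator selection with ad_select=stable semantics."""

    def __init__(self, names: List[str]):
        self.ports: Dict[str, Port] = {n: Port(n) for n in names}
        self.active: Optional[FrozenSet[str]] = None

    def aggregators(self) -> List[FrozenSet[str]]:
        groups: Dict[object, set] = {}
        for p in self.ports.values():
            if not p.up:
                continue
            # Same LACP partner -> same aggregator; no LACP partner -> alone.
            key = p.partner if p.partner is not None else ("individual", p.name)
            groups.setdefault(key, set()).add(p.name)
        return [frozenset(g) for g in groups.values()]

    def _reselect(self) -> None:
        aggs = self.aggregators()
        # "stable": keep the current aggregator while any of its ports are up.
        if self.active is not None:
            for agg in aggs:
                if agg & self.active:
                    self.active = agg
                    return
        # Otherwise pick the largest aggregator (equal link speeds assumed).
        self.active = max(aggs, key=len, default=None)

    def plug(self, name: str, partner: Optional[str] = None) -> None:
        self.ports[name].up = True
        self.ports[name].partner = partner
        self._reselect()

    def unplug(self, name: str) -> None:
        self.ports[name].up = False
        self.ports[name].partner = None
        self._reselect()


if __name__ == "__main__":
    bond = Bond8023adSketch(["enp3s0", "enp4s0"])
    bond.plug("enp3s0")         # AP: no LACP
    bond.plug("enp4s0")         # switch without a LAG: no LACP
    print(sorted(bond.active))  # only enp3s0 carries traffic
    bond.unplug("enp3s0")
    print(sorted(bond.active))  # enp4s0 takes over
```

Plugging both ports into one LACP-configured switch (the same `partner` value for both) puts them in a single aggregator, and both carry traffic.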
2) Not sure about the AP? I have Proxmox with a pfSense VM on a 6-port bare-metal box. To start with, I just wanted the Proxmox bond and bridge to let pfSense see the 2 ports as 1 port, but have both ports work in tandem with any connected device to expand the throughput to parallel 1Gb links.
I think you want just a bridge, not a bond at all.
Ideally, and eventually, once I get the 2 ports active at the same time, I can reintroduce the USW switch and aggregate 2 of its ports to connect to ports 3 and 4. As I type it out, it sounds like I might be doing this backwards and may have to just go back and change the interfaces :(
Does your switch support LACP at all? If not, there's no way you could use it with any other kind of bond, because the network will get confused and start ARP-flapping between the two ports ...
4) In the updated image you can see I changed the bond mode to balance-alb. This is worse than LACP: previously with LACP, a 2nd connection to the unused port would just remain inactive, but with balance-alb the whole network goes down when I add the 2nd cable.
... yeah. That.
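A toy Python model of that flapping, assuming balance-rr towards a switch with no LAG configured. In balance-rr all slaves send with the bond's MAC address, so a MAC-learning switch sees the same source MAC arrive on alternating ports and keeps moving its table entry (usually reported as MAC flapping). If the two switch ports form a LAG, the switch learns the MAC on one logical port instead. The port names and the MAC are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def learn(frames: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, str], int]:
    """Feed (source MAC, ingress port) pairs through a MAC-learning table.

    Returns the final table and how many times an entry moved to a
    different port.
    """
    table: Dict[str, str] = {}
    moves = 0
    for mac, port in frames:
        if mac in table and table[mac] != port:
            moves += 1  # same MAC seen on a different port: the entry flaps
        table[mac] = port
    return table, moves


BOND_MAC = "aa:aa:aa:aa:aa:34"             # hypothetical bond MAC
RR_INGRESS = ["sw-port1", "sw-port2"] * 3  # balance-rr alternates slaves

if __name__ == "__main__":
    _, moves = learn((BOND_MAC, p) for p in RR_INGRESS)
    print("without LAG:", moves, "moves")  # 5

    lag = {"sw-port1": "lag1", "sw-port2": "lag1"}
    _, moves = learn((BOND_MAC, lag[p]) for p in RR_INGRESS)
    print("with LAG:", moves, "moves")     # 0
```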

I have a bond of 3 interfaces running between two machines (PVE and PBS). It works with balance-rr. None of the other balancing modes increases throughput between just two IPs.
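A small Python illustration of that last point, assuming a 3-port bond between two hosts. Hash-based modes (balance-xor, 802.3ad) pick the slave from packet header fields, so every packet between the same two hosts lands on the same slave, while balance-rr rotates per packet. The hash below is a simplified one loosely modelled on the layer2+3 formula in the kernel bonding documentation, not the exact kernel code, and the MACs/IPs are invented. The kernel documentation also notes that balance-rr can deliver packets out of order, which TCP may react to as if packets were lost.

```python
import ipaddress
from collections import Counter


def l23_slave(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str,
              n_slaves: int, ethertype: int = 0x0800) -> int:
    """Simplified layer2+3-style hash: same headers -> same slave index."""
    h = int(src_mac.split(":")[-1], 16) ^ int(dst_mac.split(":")[-1], 16) ^ ethertype
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves


def rr_slave(packet_index: int, n_slaves: int) -> int:
    """balance-rr: packets go out on the slaves in turn."""
    return packet_index % n_slaves


N_SLAVES = 3
PACKETS = 6
# Hypothetical PVE -> PBS addresses: one flow between two IPs.
FLOW = ("aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02", "192.168.1.10", "192.168.1.20")

if __name__ == "__main__":
    hashed = Counter(l23_slave(*FLOW, N_SLAVES) for _ in range(PACKETS))
    round_robin = Counter(rr_slave(i, N_SLAVES) for i in range(PACKETS))
    print("layer2+3:", dict(hashed))         # all packets on one slave
    print("balance-rr:", dict(round_robin))  # {0: 2, 1: 2, 2: 2}
```

With many hosts talking through the bond, different address pairs hash to different slaves, which is where the hash-based modes spread load.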
 
What exactly is your desired connection from the 2 Proxmox host ports? Both into the USW switch? Have the 2 USW ports been configured as a bond in the USW software?

Correct, both into the USW switch.
I haven't configured the aggregate ports on the switch yet. I started to configure it, but there was a notification to configure the other device first (even though it's actually an uplink device, not a downlink).

[Screenshot: UniFi link aggregation notification]
 

The other port expects to be connected to another port of that very same device. Everything else must fail.
Ah, I didn't realise that.

I think you want just a bridge, not a bond at all.
OK, but I'm not sure how that would work...

Does your switch support LACP at all?
Yes, it does:
https://help.ui.com/hc/en-us/articles/360007279753-UniFi-Network-Link-Aggregation-LAG-FAQs
All UniFi Switches, except for USW-Flex and USW-Flex-Mini support LAG.

What are the limitations of LAG?

  • Ports must be sequential in number.
  • Static LAG configurations are not supported, only LACP (802.3ad).
  • Multi-chassis Link Aggregation Group (MLAG) is not supported.
I wasn't looking for 2x speed, but originally for 2x throughput/load, with the extra benefit of redundancy.
At this point I may have to rethink the aggregation/bond in Proxmox: split the ports back out into single ports and just keep the 2nd port as a backup to switch over to physically if the first port ever fails.
 
