[SOLVED] Getting LACP to work on Proxmox 8.1

mrzippy

Apr 4, 2024

Hello All,

I'm trying to get LACP working on a bridge interface.

Some background:

The bridge interface consists of one bond: two 10G SFP+ connections to a Cisco 2960-X switch. Those two ports are configured as trunks in an LACP channel group.
Port configuration from the switch:

description ProxMox Trunk r330
switchport trunk allowed vlan 1,10,20,30,66,99
switchport mode trunk
channel-group 1 mode active

The SFP+ links connect to a card in my server, as enp1s0f0 and enp1s0f1.

Below is my current config from /etc/network/interfaces:


auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual
    mtu 9000

auto eno2
iface eno2 inet manual
    mtu 9000

auto enp1s0f0
iface enp1s0f0 inet manual
    mtu 9000
    hwaddress ether d0:bf:9c:02:98:98

auto enp1s0f1
iface enp1s0f1 inet manual
    mtu 9000
    hwaddress ether d0:bf:9c:02:98:9c

auto enp2s0f0
iface enp2s0f0 inet manual
    mtu 9000
    hwaddress ether a0:36:9f:4b:bc:d4

auto enp2s0f1
iface enp2s0f1 inet manual
    mtu 9000
    hwaddress ether a0:36:9f:4b:bc:d6

auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9000
    hwaddress ether 00:00:00:02:98:9c
#10GB Trunks

auto bond1
iface bond1 inet manual
    bond-slaves enp2s0f0 enp2s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    hwaddress ether 00:00:00:4b:dc:b5
#HA 10GB Copper

auto vmbr0
iface vmbr0 inet static
    address 10.99.99.200/24
    gateway 10.99.99.1
    bridge-ports eno1 eno2
    bridge-stp off
    bridge-fd 0
    mtu 9000
#MGMT interface

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 9000
    hwaddress ether 00:00:00:02:98:99
#VM Networks

auto vmbr2
iface vmbr2 inet static
    address 10.99.100.4/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    mtu 9000
#HA Cluster


If I shut down one of the interfaces in bond0, everything works: my VMs can be assigned to different VLANs and all is well. But if I enable the second link in the group, I get a loop error on the console and lose connectivity to some hosts.
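One way to narrow down a symptom like this is to check whether the kernel actually bundled both links into the same 802.3ad aggregator. The bonding driver reports this in /proc/net/bonding/bond0, where each "Slave Interface:" section has its own "Aggregator ID:" line; if the two slaves show different IDs, they were not joined into one LAG. A minimal Python sketch, assuming that text layout; the helper names slave_aggregators and single_aggregator are hypothetical:

```python
from pathlib import Path


def slave_aggregators(status_text: str) -> dict[str, str]:
    """Map each slave to the Aggregator ID listed in its section of the bonding status."""
    result: dict[str, str] = {}
    current = None  # slave whose section we are in; None before the first slave
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Aggregator ID:") and current is not None:
            # The bond-level "Active Aggregator Info" block appears before any
            # slave section, so it is skipped because current is still None.
            result.setdefault(current, line.split(":", 1)[1].strip())
    return result


def single_aggregator(status_text: str) -> bool:
    """True if there is at least one slave and all slaves share one Aggregator ID."""
    return len(set(slave_aggregators(status_text).values())) == 1


if __name__ == "__main__":
    status_file = Path("/proc/net/bonding/bond0")
    if status_file.exists():
        text = status_file.read_text()
        for slave, agg in slave_aggregators(text).items():
            print(f"{slave}: aggregator {agg}")
        if single_aggregator(text):
            print("all slaves are in one aggregator")
        else:
            print("slaves are split across aggregators (or none were found)")
```

Running it once with a single link up and once with both up shows whether the second link joins the existing aggregator or lands in a separate one.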

Any suggestions on where to go next?
 
I should note: I set "dummy" hwaddress ether values on vmbr1 and bond0, but the hwaddress entries on the physical interfaces match their actual MAC addresses.
 
So... I stumbled across a post on Reddit, and I think I've found the solution: each bond member interface needed a line declaring its bond-master. Using the config posted above:

auto enp1s0f0
iface enp1s0f0 inet manual
    mtu 9000
    bond-master bond0
    hwaddress ether d0:bf:9c:02:98:98

auto enp1s0f1
iface enp1s0f1 inet manual
    mtu 9000
    bond-master bond0
    hwaddress ether d0:bf:9c:02:98:9c
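The change above can also be applied mechanically to a larger file. A minimal Python sketch, assuming simple ifupdown-style stanzas where membership is declared with bond-slaves on the bond stanza; bond_members and add_bond_master are hypothetical helper names, and the 4-space indentation of the inserted line is an arbitrary choice. It returns new text and does not write anything back:

```python
def bond_members(interfaces_text: str) -> dict[str, str]:
    """Map each member interface to its bond, using the bond's 'bond-slaves' line."""
    members: dict[str, str] = {}
    current = None  # name from the most recent 'iface' line
    for raw in interfaces_text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "iface" and len(words) >= 2:
            current = words[1]
        elif words[0] in ("bond-slaves", "bond_slaves") and current:
            for member in words[1:]:
                members[member] = current
    return members


def add_bond_master(interfaces_text: str) -> str:
    """Return a copy of the text with 'bond-master <bond>' added to member stanzas lacking it."""
    members = bond_members(interfaces_text)
    lines = interfaces_text.splitlines()

    # First pass: note which member stanzas already declare a bond-master.
    has_master = set()
    current = None
    for raw in lines:
        words = raw.split()
        if words and words[0] == "iface" and len(words) >= 2:
            current = words[1]
        elif words and words[0] in ("bond-master", "bond_master") and current:
            has_master.add(current)

    # Second pass: insert the missing line directly under each member's 'iface' line.
    out = []
    for raw in lines:
        out.append(raw)
        words = raw.split()
        if words and words[0] == "iface" and len(words) >= 2:
            name = words[1]
            if name in members and name not in has_master:
                out.append(f"    bond-master {members[name]}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    from pathlib import Path

    # Prints the proposed file; nothing is written back.
    path = Path("/etc/network/interfaces")
    if path.exists():
        print(add_bond_master(path.read_text()), end="")
```

Running it twice gives the same result, since stanzas that already declare bond-master are left untouched.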


Then restart the network stack. (An easy way to do this without a full system reboot: edit /etc/network/interfaces, add the relevant lines and save. Then go into the GUI, add or edit a comment on any interface, and click Apply Configuration; the network configuration will reload.)

My LAG now runs correctly, and the vmbr interface shows the aggregated link speed. I don't know whether I needed to keep the custom hardware addresses on the interfaces, but they do no harm for now.
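The aggregated speed can also be checked from sysfs. For an 802.3ad bond, the bonding driver reports roughly the sum of the speeds of the links it is actually using, so a bond that shows only one link's speed suggests the second link is not in the active aggregator. A minimal Python sketch, assuming the standard sysfs files /sys/class/net/<iface>/speed and /sys/class/net/<bond>/bonding/slaves; read_speed, bond_speeds and fully_aggregated are hypothetical names:

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def read_speed(iface_dir: Path) -> int:
    """Read the link speed in Mb/s; the kernel reports -1 (or refuses the read) when it is unknown."""
    try:
        return int((iface_dir / "speed").read_text().strip())
    except (OSError, ValueError):
        return -1


def bond_speeds(bond: str, sysfs: Path = SYSFS_NET) -> tuple[int, dict[str, int]]:
    """Return (speed reported for the bond, {slave: speed}) in Mb/s."""
    slaves = (sysfs / bond / "bonding" / "slaves").read_text().split()
    per_slave = {name: read_speed(sysfs / name) for name in slaves}
    return read_speed(sysfs / bond), per_slave


def fully_aggregated(bond: str, sysfs: Path = SYSFS_NET) -> bool:
    """True if every slave has a known speed and the bond's speed equals their sum."""
    total, per_slave = bond_speeds(bond, sysfs)
    return (
        bool(per_slave)
        and all(speed > 0 for speed in per_slave.values())
        and total == sum(per_slave.values())
    )


if __name__ == "__main__":
    if (SYSFS_NET / "bond0" / "bonding").exists():
        total, per_slave = bond_speeds("bond0")
        print(f"bond0: {total} Mb/s, slaves: {per_slave}")
        if fully_aggregated("bond0"):
            print("bond speed matches the sum of its slaves")
        else:
            print("bond speed is below the sum of its slaves")
```

With two 10G members in one aggregator, the bond would be expected to report 20000 Mb/s.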

This may be worth a bug report or feature request: the bond-master line could be added via the GUI when creating an LACP bond (another field in the popup), or automatically, so that when a bond interface is created and slaves are added, the bond-master line is written to each member interface. Either way, if this saves someone else some frustration, I'm glad.

Cheers.
 
bond-master is not required to get LACP working. The syntax bond-slaves <primary-interface> <secondary-interface> is enough. bond-master is the old way of doing things, and adding it to the GUI would be a step backward. If you remove or comment out the bond-master lines and then reboot, I believe you will see everything working without them.
The hwaddress entries are also not required unless your setup specifically calls for them.
 
That's a nice thought, but it did not work without the bond-master statements. I tried without them and, even after reboots, got a loop message on the console. It may be old, but it's working and stable. Following the wiki / recommended setup did not get the LACP group working.
 
I can confirm this problem too, running Proxmox 8.2.7 with kernel 6.8.12-2-pve. LACP doesn't work at all unless you declare 'bond-master bond0' on each member interface of the bond in /etc/network/interfaces. I tested many different configurations. It is a bug in Proxmox; please fix it in the next release.
 
I understand it is not working for you, but I do not believe it is a bug that needs fixing. LACP bonding works as it should in both Proxmox 8.2.x and 8.3.x. Even nested bonding, where an LACP bond sits inside an active-backup or balance-tlb bond, all with VLANs, works without error.
I apologize if I am overlooking something in this thread. The following is a snippet of a bond config taken from a live environment running several dozen Proxmox nodes:
Code:
auto bond1
iface bond1 inet manual
    bond-slaves enp4s0f0 enp4s0f1
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
    bond-xmit-hash-policy layer3+4
    mtu 8192
    
auto vmbr1
iface vmbr1 inet manual
    bridge_ports bond1
    bridge_stp off
    bridge_fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    
allow-bond1 enp4s0f0
iface enp4s0f0 inet manual
    mtu 8192

allow-bond1 enp4s0f1
iface enp4s0f1 inet manual
    mtu 8192
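The thread ends up with three ways of tying a member to a bond: bond-slaves on the bond stanza, bond-master on each member (the Reddit fix), and allow-<bond> lines on the members (as in the snippet above). When comparing two configs, it can help to see which of these each file actually uses. A minimal Python sketch, assuming simple ifupdown-style stanzas and treating hyphens and underscores in option names alike (the snippet's working bridge_ports lines suggest ifupdown2 accepts both); membership_report is a hypothetical name:

```python
def membership_report(interfaces_text: str) -> dict[str, dict[str, set[str]]]:
    """For each bond, list the members declared via bond-slaves, bond-master and allow-<bond>."""
    report: dict[str, dict[str, set[str]]] = {}

    def entry(bond: str) -> dict[str, set[str]]:
        return report.setdefault(bond, {"bond-slaves": set(), "bond-master": set(), "allow": set()})

    current = None  # name from the most recent 'iface' line
    for raw in interfaces_text.splitlines():
        words = raw.split()
        if not words:
            continue
        key = words[0].replace("_", "-")  # treat bond_slaves and bond-slaves alike
        if key == "iface" and len(words) >= 2:
            current = words[1]
        elif key == "bond-slaves" and current:
            entry(current)["bond-slaves"].update(words[1:])
        elif key == "bond-master" and current and len(words) >= 2:
            entry(words[1])["bond-master"].add(current)
        elif words[0].startswith("allow-"):
            # 'allow-bond1 enp4s0f0' places enp4s0f0 in the ifupdown class 'bond1'.
            entry(words[0][len("allow-"):])["allow"].update(words[1:])

    # Drop classes such as allow-hotplug that do not correspond to a bond.
    return {b: v for b, v in report.items() if v["bond-slaves"] or v["bond-master"]}
```

For the snippet above, bond1 would list the same two members under bond-slaves and allow, and none under bond-master; for the original poster's fixed config, bond0 would list both members under bond-slaves and bond-master.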
 

And I'm telling you that I found this thread because we have exactly the same problem as the user above. I'm using only professional equipment, and this is a production server with Riverbed and Intel cards. When a problem occurs in two completely unrelated setups, it is no longer an individual case; it is a general bug. I kindly ask you to look into it and fix it for the next release.
 
I can confirm the problem: LACP doesn't work on my Proxmox Backup Server 3.3.3.

It runs on a Supermicro X10DRL-LN4; the motherboard has 4 internal NICs.

Output of lspci -k:

[screenshot: 111.jpg]

By the way, you can see that the subsystem is misidentified as X10DRW-i.

I tried to get LACP working on the bond by applying all the recommendations in this thread, but couldn't get it to work in any way.

At the same time, I have proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve) on a Supermicro X12DPU-6 with a discrete 4-port NIC based on the Intel I350-T4. I set up an LACP bond on 2 ports of the latter, and it works without any problems.

Output of lspci -k:

[screenshot: 222.jpg]

Comparing the two screenshots, you can see that the same driver is used in both cases. But in the first case (X10DRL-LN4) the internal NICs are used and the subsystem is misidentified, while in the second case discrete NICs are used and the LACP bond works. Maybe this will help find where the problem is.
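Since the comparison rests on two lspci -k screenshots, it may help to extract just the relevant fields as text so the two machines can be compared side by side. A minimal Python sketch, assuming the usual lspci -k layout (an unindented device line followed by indented "Subsystem:" and "Kernel driver in use:" lines); nic_summary is a hypothetical helper name, and any sample input used with it is illustrative, not taken from the screenshots:

```python
import shutil
import subprocess


def nic_summary(lspci_k: str) -> list[tuple[str, str, str]]:
    """Return (slot, subsystem, driver in use) for each Ethernet controller in `lspci -k` output."""
    devices = []
    current = None
    for line in lspci_k.splitlines():
        if line and not line[0].isspace():
            # Unindented device line, e.g. "<slot> Ethernet controller: <description>"
            slot, _, desc = line.partition(" ")
            current = {
                "slot": slot,
                "is_nic": desc.startswith("Ethernet controller"),
                "subsystem": "",
                "driver": "",
            }
            devices.append(current)
        elif current is not None:
            # Indented detail lines belong to the most recent device.
            key, _, value = line.strip().partition(": ")
            if key == "Subsystem":
                current["subsystem"] = value
            elif key == "Kernel driver in use":
                current["driver"] = value
    return [(d["slot"], d["subsystem"], d["driver"]) for d in devices if d["is_nic"]]


if __name__ == "__main__":
    if shutil.which("lspci"):
        output = subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout
        for slot, subsystem, driver in nic_summary(output):
            print(f"{slot}\t{driver}\t{subsystem}")
```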
 