LACP bonding stopped working after 5.2 upgrade

Hello,

We have an issue on one of our servers after the 5.2 upgrade.
All of our servers have active LACP bonding interfaces.
One of them won't work with LACP after the upgrade (the others do). It is in fact the only one using an old Broadcom dual-NIC adapter:

Code:
af:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
af:00.1 Ethernet controller: Broadcom Limited NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

lspci -nks af:00.0
af:00.0 0200: 14e4:1639 (rev 20)
        Subsystem: 14e4:0907
        Kernel driver in use: bnx2
        Kernel modules: bnx2

Anyway, the bond is shown as up, but there's no traffic (MII status is down on both slaves), and the kernel log reports that no active 802.3ad partners are available.
This is on kernel 4.15.
If I boot into 4.13, LACP works without a hitch.

Both kernels show the same modules loaded and the same kernel driver in use.
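
For what it's worth, this is how I'm checking the per-slave state (the bond is named bond0 here):

Code:
# per-slave MII status and 802.3ad actor/partner details
cat /proc/net/bonding/bond0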

Thanks in advance,
Marko
 
Nope, sorry, same as before.
I've installed 4.15.17-3-pve from your link above, updated GRUB and rebooted, but it still won't work.
 
Is there anything relevant to the NIC in the dmesg output after a reboot? (Search for bnx2.) If possible, please post the relevant lines.
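Something along these lines should pull them out:

Code:
# show only the bnx2-related kernel messages
dmesg | grep -i bnx2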
 
This is the output (by the way, it's exactly the same as the output from the 4.13 kernel):
Code:
Jun  6 09:21:08 xxxx kernel: [    1.731906] bnx2 0000:af:00.0 enp175s0f0: renamed from eth0
Jun  6 09:21:08 xxxx kernel: [    1.765802] bnx2 0000:af:00.1 enp175s0f1: renamed from eth1
Jun  6 09:21:08 xxxx kernel: [    7.955262] bnx2 0000:af:00.0 enp175s0f0: using MSIX
Jun  6 09:21:09 xxxx kernel: [    8.051353] bnx2 0000:af:00.1 enp175s0f1: using MSIX
Jun  6 09:21:12 xxxx kernel: [   11.178328] bnx2 0000:af:00.0 enp175s0f0: NIC Copper Link is Up, 1000 Mbps full duplex
Jun  6 09:21:12 xxxx kernel: [   11.765320] bnx2 0000:af:00.1 enp175s0f1: NIC Copper Link is Up, 1000 Mbps full duplex

And this is from the bond initialization:

Code:
Jun  6 09:34:48 xxxx kernel: [    7.932033] bond0: Enslaving enp175s0f0 as a backup interface with an up link
Jun  6 09:34:48 xxxx kernel: [    8.012103] bond0: Enslaving enp175s0f1 as a backup interface with an up link
Jun  6 09:34:48 xxxx kernel: [    8.017834] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
Jun  6 09:34:49 xxxx kernel: [    8.145741] 8021q: adding VLAN 0 to HW filter on device bond0
Jun  6 09:34:49 xxxx kernel: [    8.187338] device bond0 entered promiscuous mode
Jun  6 09:34:53 xxxx kernel: [   12.710136] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
 
More information:

Code:
# mii-tool eth0
eth0: negotiated 1000baseT-FD flow-control, link ok

# mii-tool eth1
eth1: negotiated 1000baseT-FD flow-control, link ok

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
....
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
...
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 0
        Partner Key: 1
....
Slave Interface: eth0
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
...
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1

In the syslog:
Code:
Jun  7 09:50:23 abfproxmox03 kernel: [  290.065525] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
 
I added miimon to my network interfaces config and it seems to work:

Code:
auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond-miimon 100
        bond-mode 802.3ad
        bond-lacp-rate 1

The strange thing is that I have another LACP bond with no problem at all. The first bond (the one that requires miimon) uses the bnx2 driver; the second uses the e1000e driver.
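
(To double-check which driver each slave is using, ethtool can report it; eth0 here is one of the bnx2 slaves, and the other bond's interfaces should report e1000e accordingly:)

Code:
# driver, version and firmware info for a given interface
ethtool -i eth0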
 
The ifenslave documentation states that either bond-miimon or bond-arp-interval should be given to detect link failures, so if it works with bond-miimon, it'd probably be best to leave it in the config.
 
Does the problem persist for you if you add the bond-miimon parameter to the /etc/network/interfaces stanza? As indicated in the docs, it (or bond-arp-interval) is needed for link-failure detection.
 
Yes, this is the config that worked before the update:

Code:
iface bond0 inet manual
        slaves enp175s0f0 enp175s0f1
        bond miimon 100
        bond_mode 802.3ad
 
The kernel changelogs don't indicate any particular changes w.r.t. bnx drivers.

Do the individual NICs work without the bond?

Could you maybe post the output of ip link show (with the bond active, and with the NICs set up by themselves), and the output of ethtool for both NICs?
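I.e. something along these lines (interface names taken from your earlier output):

Code:
ip link show
ethtool enp175s0f0
ethtool enp175s0f1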
 
Yeah, the drivers are the same across the latest kernel versions.
The NICs work (I can see traffic over the individual NICs, but not over the bond).

ip link show:

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp175s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
3: enp175s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a4:bf:01:38:df:dc brd ff:ff:ff:ff:ff:ff
5: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a4:bf:01:38:df:dd brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
7: vlan300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr300 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
8: vmbr300: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
9: vlan350@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr350 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
10: vlan400@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr400 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
11: vmbr350: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
12: vmbr400: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
13: vlan450@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr450 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
14: vmbr450: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
15: vlan500@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr500 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
16: vmbr500: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
17: vlan600@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr600 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
18: vlan601@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr601 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
19: vmbr600: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
20: vlan602@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr602 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
21: vmbr601: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
22: vlan603@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr603 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
23: vmbr602: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
24: vlan604@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr604 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
25: vmbr603: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
26: vlan605@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr605 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
27: vmbr604: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff
28: vmbr605: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:6e:4e:e8 brd ff:ff:ff:ff:ff:ff

Ethtool:

Code:
Settings for enp175s0f0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

Settings for enp175s0f1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

Bond stats:

Code:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:10:18:6e:4e:e8
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 0
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: enp175s0f0
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:10:18:6e:4e:e8
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 00:10:18:6e:4e:e8
    port key: 0
    port priority: 255
    port number: 1
    port state: 77
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: enp175s0f1
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:10:18:6e:4e:ea
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 00:10:18:6e:4e:e8
    port key: 0
    port priority: 255
    port number: 2
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1
 
  • Sadly, I don't have any bnx NICs around to reproduce the problem...
  • Do the switch logs show any information - do they receive the LACP frames?
  • The kernel documentation at https://www.kernel.org/doc/Documentation/networking/bonding.txt might provide a starting point if you want to debug this further (maybe change from miimon to ARP link monitoring, or try the use_carrier flag - see the sketch below this list).
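
For experimenting with use_carrier at runtime, the bonding sysfs attributes can be used (changing the value below is only meant for testing, not a recommendation):

Code:
# 1 = link state from netif_carrier, 0 = older MII/ethtool ioctl based check
cat /sys/class/net/bond0/bonding/use_carrier
# temporarily switch to the MII/ethtool based check
echo 0 > /sys/class/net/bond0/bonding/use_carrier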
 
I have changed the trunk from LACP to L2 XOR mode, which works. In fact, every mode works except 802.3ad.
I guess there's nothing to be done right now except keep testing as new kernel versions hit the repos?
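
To be clear, by L2 XOR mode I mean a stanza roughly like this (a sketch, not necessarily my exact config; option spellings as usually accepted by ifupdown/ifenslave):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves enp175s0f0 enp175s0f1
        bond-mode balance-xor
        bond-xmit-hash-policy layer2
        bond-miimon 100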
 
Hi, I have the same issue.
Here it's an Intel i350 NIC.
In dmesg I have something like this:
Code:
[16576.599296] igb 0000:04:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[16576.626874] bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[16576.628108] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[16576.629325] bond0: first active interface up!
[16576.630500] vmbr1: port 1(bond0) entered blocking state
[16576.631697] vmbr1: port 1(bond0) entered forwarding state
[16576.633074] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready

...

[16576.831211] igb 0000:04:00.1 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[16576.834908] bond0: link status definitely up for interface eth2, 1000 Mbps full duplex
[16576.991287] igb 0000:05:00.0 eth3: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[16577.058886] bond0: link status definitely up for interface eth3, 1000 Mbps full duplex
[16577.101838] vmbr60: port 1(bond0.60) entered blocking state
...
[16594.064493] device tap501i3 entered promiscuous mode
[16594.082740] vmbr13: port 2(tap501i3) entered blocking state
[16594.083781] vmbr13: port 2(tap501i3) entered disabled state
[16594.084905] vmbr13: port 2(tap501i3) entered blocking state
[16594.085912] vmbr13: port 2(tap501i3) entered forwarding state
Is there any solution?
I have already tried the newest igb driver from Intel and changed the kernel ... no success.
 
