HP DL gen10 no bond communication

tl5k5

Hey all,
I have a DL gen10 server that I installed an HPE dual 10Gb 560FLR-SFP+ Adapter into.
I've found that it will not pass bond0/LACP communication. An Intel X520-T2 with the same interface bond0/LACP config connects without issue.
The switch shows LACP is communicating/connected.

Does anyone see any issues with the config as to why it's not communicating/connecting?
Code:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.##.###.###/24
        gateway 10.###.###.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

Here's what iLO sees:

[Screenshot: 5nNAEa6QhD.png]

Thanks!!!
 
Can you post your dmesg or syslog, are there any errors regarding that card?
 
None that I understand. I'm more of a sysadmin than a Linux admin. See the attached zip file.

UPDATE: You may see vmbr1 in the logs. This was part of my testing. There's no typo in the config.
 

What does cat /proc/net/bonding/bond0 show? Do the adapters (eno1 and eno2) work without the bond?
Can you update the firmware? Is LACP correctly configured on the opposite side? (Layer 2+3 = MAC+IP)
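
For anyone following along, here is a minimal sketch of what to look for in that file. In 802.3ad mode the "Active Aggregator Info" section reports a partner MAC address, and an all-zero value generally means the bond has not accepted any LACPDUs from the switch. The function name check_lacp_partner and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# check_lacp_partner [FILE]
# Hypothetical helper (not a Proxmox tool): reads bonding status, by default
# from /proc/net/bonding/bond0, and reports whether a LACP partner was learned.
check_lacp_partner() {
    file="${1:-/proc/net/bonding/bond0}"
    # Pull the MAC (last field) from every "Partner Mac Address" line.
    macs=$(grep -i 'partner mac address' "$file" | awk '{print $NF}')
    if [ -z "$macs" ]; then
        echo "no 802.3ad partner info found"
        return 2
    fi
    # Any value other than all zeros means a partner has been seen.
    if echo "$macs" | grep -qv '^00:00:00:00:00:00$'; then
        echo "partner seen"
        return 0
    fi
    echo "no partner"
    return 1
}
```

Run it as check_lacp_partner with no argument on the host, or point it at a saved copy of the file.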
 
See attached bond0.
Yes, both ports work on the adapter (10Gb) when not in a bond.
I already updated the adapter firmware to the latest version.
Yes, LACP is configured on the opposite side. Again, an X520-T2 worked in bond/LACP with the same config. The Intel was swapped out for the HP FlexLOM adaptor. The only difference is the card, all other components and configs are the same.

Thanks!

CORRECTION: sorry Intel was an X520-DA2
 

I may have found the issue.
OK HPE...which is it?!?!? You can't have it both ways!

Seems strange this could be different on the same exact chipset!!!

[Screenshot: vivaldi_9WGBJqv8rw.png]

vs.

[Screenshot: vivaldi_r6WSyunTZn.png]
 

Can you install ethtool and check with ethtool --show-priv-flags eno1 and ethtool --show-priv-flags eno2 whether LLDP is enabled? As for the type of the card, you can check it with lspci or with lshw (apt install lshw, then lshw -c network -businfo).

Edit: Maybe also check which driver is used for the card, and whether there is one available from Intel with DKMS support.
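
A rough sketch of those checks in one go, assuming the ports are named eno1 and eno2 as in the config above. Which private flags exist depends on the driver, so an LLDP-related flag may simply not be listed for this card. The function names lldp_flags and show_nic_info are only for illustration.

```shell
#!/bin/sh
# lldp_flags: read `ethtool --show-priv-flags` output on stdin and print only
# the flags whose name mentions LLDP; print "none" if there is no such flag.
lldp_flags() {
    grep -i 'lldp' || echo "none"
}

# show_nic_info IFACE...: print driver details and private flags per port.
show_nic_info() {
    for ifc in "$@"; do
        echo "== $ifc =="
        ethtool -i "$ifc"                  # driver, version, firmware-version
        ethtool --show-priv-flags "$ifc"   # driver-specific flags, if any
    done
}
```

Usage would be something like show_nic_info eno1 eno2, or ethtool --show-priv-flags eno1 | lldp_flags to filter for LLDP only.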
 
You said the ports worked when not in a bond. Did you reconfigure your switch and remove the LACP config for the corresponding ports to test this? Are you moving cables to different ports? The question is whether you are doing an apples-to-apples comparison.
You said that a different card worked in a bond. Was it physically connected to the same ports/bond on the switch, or elsewhere?

Your log file shows:
Code:
May 26 14:03:19 hostname kernel: [   10.114840] ixgbe 0000:24:00.0: registered PHC device on eno1
May 26 14:03:19 hostname kernel: [   10.224409] bond0: (slave eno1): Enslaving as a backup interface with a down link
May 26 14:03:19 hostname kernel: [   10.286721] ixgbe 0000:24:00.0 eno1: detected SFP+: 4
May 26 14:03:19 hostname kernel: [   10.517704] ixgbe 0000:24:00.1: registered PHC device on eno2
May 26 14:03:19 hostname kernel: [   10.537993] ixgbe 0000:24:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May 26 14:03:19 hostname kernel: [   10.624412] bond0: (slave eno2): Enslaving as a backup interface with a down link
May 26 14:03:19 hostname kernel: [   10.640490] vmbr1: port 1(bond0) entered blocking state
May 26 14:03:19 hostname kernel: [   10.640493] vmbr1: port 1(bond0) entered disabled state
May 26 14:03:19 hostname kernel: [   10.686445] ixgbe 0000:24:00.1 eno2: detected SFP+: 3
May 26 14:03:19 hostname kernel: [   10.806289] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
May 26 14:03:19 hostname kernel: [   10.814523] vmbr1: port 1(bond0) entered blocking state
May 26 14:03:19 hostname kernel: [   10.814527] vmbr1: port 1(bond0) entered forwarding state
May 26 14:03:19 hostname kernel: [   10.818105] bond0: (slave eno1): link status definitely up, 10000 Mbps full duplex
May 26 14:03:19 hostname kernel: [   10.818120] bond0: active interface up!
May 26 14:03:19 hostname kernel: [   10.941933] ixgbe 0000:24:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May 26 14:03:19 hostname kernel: [   11.026138] bond0: (slave eno2): link status definitely up, 10000 Mbps full duplex
May 26 14:03:20 hostname kernel: [   11.817890] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready

This is a pretty descriptive message explaining what Linux wanted but didn't get:
May 26 14:03:19 hostname kernel: [ 10.806289] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond

Double check all of your configuration. Install tcpdump and run this on each physical interface (on working and non-working card):
# tcpdump -ni <IFNAME> ether proto 0x8809

Do you see LACP PDUs in each case? You should be seeing them regardless of the Linux-side config.
Check the switch for any sort of "banning" of macs/ports due to flapping/changing.
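
As a sketch, the capture can be wrapped so it stops on its own. This assumes the ports are eno1 and eno2, and that the switch uses the slow LACP rate (one LACPDU every 30 seconds), hence the 70-second limit for two frames. The lacp_capture name and the DRY_RUN variable are made up for this example.

```shell
#!/bin/sh
# lacp_capture IFACE...
# Capture up to two Slow Protocols frames (EtherType 0x8809, which carries
# LACP) per interface, giving up after 70 seconds. Needs root for real runs.
# With DRY_RUN=1 it only prints the commands it would run.
lacp_capture() {
    for ifc in "$@"; do
        # -e prints the Ethernet header, so the source MAC shows whether a
        # frame came from the switch or from the host itself.
        cmd="timeout 70 tcpdump -c 2 -e -ni $ifc ether proto 0x8809"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            echo "== $ifc =="
            $cmd
        fi
    done
}
```

Run lacp_capture eno1 eno2 as root. If only frames with the host's own MAC show up, the switch's LACPDUs are not reaching the kernel.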


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I am cp'ing backups of the bond and non-bond interface files into place and then running systemctl restart networking.
I have two runs of cables going to two different locations on the switch. One set is a LACP LAG and the other is a switchport.
I move the cable(s) into the same port(s) on the NIC.
So yes...apples to apples.
Yes, the different cards used the same cables and same ports on the switch. The Intel is no longer installed.

Config is exactly the same between the Intel and HP cards. They both show up as eno1 and eno2 since they were never in the server at the same time.
I'll look into tcpdump.
The switch ports and LACP show all is good on that end.
This is a pretty simple config. I'm not sure why this is such an issue.

Thanks for the help!
 
See attached.
 


Please post the output of ethtool -m eno1 and ethtool eno1, and the same for eno2.
lshw says this is an 82599ES card. Googling this together with "bond" brings up lots of threads reporting that bonding is not working.
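
If the full output is too long to post, a small filter like this keeps just the link-state lines from ethtool. It is only a sketch, and link_summary is a made-up name.

```shell
#!/bin/sh
# link_summary: read `ethtool IFACE` output on stdin and keep only the
# speed, duplex and link-state lines, with leading whitespace removed.
link_summary() {
    grep -E '^[[:space:]]*(Speed|Duplex|Link detected):' | sed 's/^[[:space:]]*//'
}
```

For example: ethtool eno1 | link_summary, then the same for eno2.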
 
Interesting...makes me wonder more about Post 6 and how HPE docs contradict themselves.

See attached ethtool info
 


Thanks everyone for the help.
I'll start troubleshooting again on Monday.
Everyone have a great weekend!