Bonding problems. LACP is selected and rebooted to take effect. Stuck in Round Robin.

sirsean12

Member
Jul 17, 2017
I selected LACP and rebooted for it to take effect. The web GUI shows LACP, but cat /proc/net/bonding/bond0 shows "round-robin", and I am getting errors both on the switch and in Proxmox. I have no idea how to fix this without a reinstall. The switch is a Cisco Catalyst 3560G, and it is set up for LACP.

Also, I have an identical node in the same cluster that does not have this issue, so it is not hardware-related.
Thanks!


The switch interfaces are configured as follows:

interface GigabitEthernet0/37
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 4 mode active
!
interface GigabitEthernet0/38
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 4 mode active
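
To confirm on the Cisco side whether the channel actually negotiated LACP, the standard IOS show commands can help (exact output varies by platform; channel group 4 is taken from the config above):

```
show etherchannel 4 summary    ! port-channel state; flags show if ports bundled
show lacp neighbor             ! partner system ID and port state, if LACP is running
show etherchannel 4 detail     ! per-port protocol and mode
```

If the ports show as bundled but the Linux side still reports round-robin, the mismatch is on the Proxmox host, not the switch.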


/etc/network/interfaces


auto lo
iface lo inet loopback

iface eno2 inet manual

iface eno1 inet manual

auto bond0
iface bond0 inet manual
slaves eno1 eno2
bond_miimon 100
bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
address 192.168.2.4
netmask 255.255.255.0
gateway 192.168.2.1
bridge_ports bond0
bridge_stp off
bridge_fd 0



/proc/net/bonding/bond0

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:47:27:96
Slave queue ID: 0

Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:47:27:97
Slave queue ID: 0


Thank you!
 
You'll also, of course, need to make sure the switch's LACP hash mode is set to agree with what you choose in interfaces. We use VLANs on the switch and in PVE, so on our Netgear we use
Code:
3 Src/Dest MAC, VLAN, EType, incoming port
with this in /etc/network/interfaces:
Code:
       bond_miimon 100
       bond_mode 802.3ad
       bond_xmit_hash_policy layer2+3
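
For reference, a complete bond stanza with the hash policy added might look like the following. This is a sketch based on the interface names (eno1/eno2) from the original post; adjust to your own NICs:

```
auto bond0
iface bond0 inet manual
       slaves eno1 eno2
       bond_miimon 100
       bond_mode 802.3ad
       bond_xmit_hash_policy layer2+3
```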

Code:
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 0c:c4:7a:17:fb:cc
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 9
        Partner Key: 441
        Partner Mac Address: e0:91:f5:00:89:38

Slave Interface: enp4s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:17:fb:cc
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 0c:c4:7a:17:fb:cc
    port key: 9
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: e0:91:f5:00:89:38
    oper key: 441
    port priority: 128
    port number: 24
    port state: 61

Slave Interface: enp4s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:17:fb:cd
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 0c:c4:7a:17:fb:cc
    port key: 9
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: e0:91:f5:00:89:38
    oper key: 441
    port priority: 128
    port number: 76
    port state: 61

I am not a network expert. When asking for help on the Netgear forums, I refer to and quote the link Alwin sent; Cisco has historically had a more responsive user forum. There are so many combinations of options between Linux and the switch hardware that it is worth having the forum experts verify optimal settings.

I assume your switch is a Layer 3 fully managed model?
 
bond_xmit_hash_policy layer2+3
I meant the xmit_hash_policy. The default should be layer2, but maybe the system didn't pick it up. Also, did you do a service or host restart after configuring it?
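
One way to check what the kernel actually applied, independent of the web GUI, is to parse /proc/net/bonding/bond0. A minimal sketch, assuming the standard bonding driver output format shown earlier in this thread:

```python
# Parse the kernel bonding status file to see which mode and hash
# policy the driver actually applied (independent of the web GUI).

def bond_status(text):
    """Extract 'Bonding Mode' and 'Transmit Hash Policy' from the
    contents of /proc/net/bonding/<bond>."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Bonding Mode", "Transmit Hash Policy"):
            status[key.strip()] = value.strip()
    return status

# Usage on a Proxmox node:
#   with open("/proc/net/bonding/bond0") as f:
#       print(bond_status(f.read()))
```

If "Bonding Mode" still reads "load balancing (round-robin)" after a restart, the config was not applied.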
 
>Also, I have an identical node in the same cluster that does not have this issue. So it is not Hardware related.


Do not be so sure of anything until the issue is fixed!

Look at the hardware: motherboard, NIC, cable, switch port, and the switch configuration.

After changing settings in /etc/network/interfaces I just reboot, as I have not yet found another way to ensure the settings take effect for the VMs.
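
If your Proxmox version ships with ifupdown2 installed (an assumption; older installs use classic ifupdown), you can try re-applying the configuration without a full reboot, though a reboot remains the safest option:

```
ifreload -a                   # ifupdown2: re-apply /etc/network/interfaces
cat /proc/net/bonding/bond0   # verify the mode actually changed
```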
 
Adding the hash policy worked. Thank you!

Rob,
I agree with your statement. However, I had gone through the appropriate steps to ensure that hardware was not the issue.

Thanks again guys!

-Sean
 
