Bonding problems. LACP is selected and rebooted to take effect. Stuck in Round Robin.

Discussion in 'Proxmox VE: Installation and configuration' started by sirsean12, Nov 12, 2017.

  1. sirsean12

    sirsean12 New Member

    Joined:
    Jul 17, 2017
    Messages:
    8
    Likes Received:
    0
    Bonding problems. LACP is selected, and the node was rebooted for it to take effect. The WebGUI shows LACP, but cat /proc/net/bonding/bond0 shows "round robin", and I am getting errors on the switch and in Proxmox. I have no idea how to fix this without a reinstall. The switch is a Cisco Catalyst 3560G, and I have it set up for LACP.

    Also, I have an identical node in the same cluster that does not have this issue. So it is not Hardware related.
    Thanks!


    Interfaces are as follows.

    interface GigabitEthernet0/37
    switchport trunk encapsulation dot1q
    switchport mode trunk
    channel-group 4 mode active
    !
    interface GigabitEthernet0/38
    switchport trunk encapsulation dot1q
    switchport mode trunk
    channel-group 4 mode active
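To confirm that the port-channel actually negotiated LACP on the Cisco side, checks along these lines should work (a sketch; exact output varies by IOS version):

```
show etherchannel 4 summary
show lacp neighbor
show interfaces port-channel 4
```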


    /etc/network/interfaces


    auto lo
    iface lo inet loopback

    iface eno2 inet manual

    iface eno1 inet manual

    auto bond0
    iface bond0 inet manual
    slaves eno1 eno2
    bond_miimon 100
    bond_mode 802.3ad

    auto vmbr0
    iface vmbr0 inet static
    address 192.168.2.4
    netmask 255.255.255.0
    gateway 192.168.2.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
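One thing to keep in mind: bond driver options are only re-read when the bond is recreated, so after editing this file a reboot is the surest route. A sketch of doing it without one (assumes classic ifupdown and that nothing is actively using vmbr0 at the time):

```
ifdown vmbr0; ifdown bond0
ifup bond0; ifup vmbr0
grep "Bonding Mode" /proc/net/bonding/bond0
```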



    /proc/net/bonding/bond0

    Bonding Mode: load balancing (round-robin)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eno1
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:25:90:47:27:96
    Slave queue ID: 0

    Slave Interface: eno2
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:25:90:47:27:97
    Slave queue ID: 0
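The "Bonding Mode:" line above is the ground truth for what the kernel actually loaded, regardless of what the WebGUI shows. A minimal sketch of extracting it with sed; here it is fed sample text for illustration, while on a real host you would read /proc/net/bonding/bond0 instead:

```shell
# Extract the active bonding mode from bond status text (sketch;
# on a real host: bond_mode < /proc/net/bonding/bond0).
bond_mode() { sed -n 's/^Bonding Mode: //p'; }

bond_mode <<'EOF'
Bonding Mode: load balancing (round-robin)
MII Status: up
EOF
# prints: load balancing (round-robin)
```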


    Thank you!
     
    #1 sirsean12, Nov 12, 2017
    Last edited: Nov 12, 2017
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,309
    Likes Received:
    206
  3. RobFantini

    RobFantini Active Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,516
    Likes Received:
    21
    You'll also, of course, need to make sure the switch's LACP hash mode is set to agree with what you choose in the interfaces file. We use VLANs on the switch and in PVE, so on our Netgear we use
    Code:
    3 Src/Dest MAC, VLAN, EType, incoming port
    
    with this in interfaces
    Code:
           bond_miimon 100
           bond_mode 802.3ad
           bond_xmit_hash_policy layer2+3
    
    Code:
    # cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    Transmit Hash Policy: layer2+3 (2)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    
    802.3ad info
    LACP rate: slow
    Min links: 0
    Aggregator selection policy (ad_select): stable
    System priority: 65535
    System MAC address: 0c:c4:7a:17:fb:cc
    Active Aggregator Info:
            Aggregator ID: 1
            Number of ports: 2
            Actor Key: 9
            Partner Key: 441
            Partner Mac Address: e0:91:f5:00:89:38
    
    Slave Interface: enp4s0f0
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 0c:c4:7a:17:fb:cc
    Slave queue ID: 0
    Aggregator ID: 1
    Actor Churn State: none
    Partner Churn State: none
    Actor Churned Count: 0
    Partner Churned Count: 0
    details actor lacp pdu:
        system priority: 65535
        system mac address: 0c:c4:7a:17:fb:cc
        port key: 9
        port priority: 255
        port number: 1
        port state: 61
    details partner lacp pdu:
        system priority: 32768
        system mac address: e0:91:f5:00:89:38
        oper key: 441
        port priority: 128
        port number: 24
        port state: 61
    
    Slave Interface: enp4s0f1
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 0c:c4:7a:17:fb:cd
    Slave queue ID: 0
    Aggregator ID: 1
    Actor Churn State: none
    Partner Churn State: none
    Actor Churned Count: 0
    Partner Churned Count: 0
    details actor lacp pdu:
        system priority: 65535
        system mac address: 0c:c4:7a:17:fb:cc
        port key: 9
        port priority: 255
        port number: 2
        port state: 61
    details partner lacp pdu:
        system priority: 32768
        system mac address: e0:91:f5:00:89:38
        oper key: 441
        port priority: 128
        port number: 76
        port state: 61
    
    I am not a network expert. I refer to and quote the link Alwin sent when asking for help on the Netgear forums. Cisco has historically had a more responsive user forum. There are so many combinations of options between Linux and the hardware that the forum experts can help verify the optimal settings.

    I assume your switch is a Layer 3 fully managed model?
     
  4. sirsean12

    sirsean12 New Member

    Joined:
    Jul 17, 2017
    Messages:
    8
    Likes Received:
    0
    > Also, I have an identical node in the same cluster that does not have this issue. So it is not Hardware related.

    Am I missing something?
    I clearly have the mode selected in /etc/network/interfaces: "bond_mode 802.3ad"
     
  5. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,309
    Likes Received:
    206
    I meant the xmit_hash_policy. The default should be layer2, but maybe the system didn't pick it up. Also, did you do a service or host restart after configuring it?
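A quick way to see what the kernel is actually using, independent of the config file, is the bonding sysfs interface (a sketch; these are the standard bonding driver paths):

```
cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/xmit_hash_policy
```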
     
  6. RobFantini

    RobFantini Active Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,516
    Likes Received:
    21
    >Also, I have an identical node in the same cluster that does not have this issue. So it is not Hardware related.


    do not be so sure of anything until the issue is fixed!

    Look at the hardware: motherboard to NIC to cable to switch port, and the switch configuration.

    And after changing settings in /etc/network/interfaces I just reboot, as I have not yet found another way to ensure the settings work with VMs.
     
  7. sirsean12

    sirsean12 New Member

    Joined:
    Jul 17, 2017
    Messages:
    8
    Likes Received:
    0
    Adding the hash policy worked. Thank you!
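For reference, the working bond stanza presumably ends up looking like this (a sketch, combining the original config with the layer2+3 policy from Rob's example; the thread does not state which policy value was actually used):

```
auto bond0
iface bond0 inet manual
    slaves eno1 eno2
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer2+3
```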

    Rob,
    I agree with your statement. However, I had gone through the appropriate steps to ensure that hardware was not the issue.

    Thanks again guys!

    -Sean
     