4x1Gb LACP slow VM bandwidth

Discussion in 'Proxmox VE: Networking and Firewall' started by tl5k5, Sep 26, 2017.

  1. tl5k5

    tl5k5 New Member

    Hey everyone,
    I'm currently building a "proof-of-concept" for work using Proxmox.
    I have a 4x1Gb LACP config (see below), but I get slow performance from my VMs.
    I'm using 400MBps NAS storage that works perfectly when tested from a 10Gb non-VM client.
    On Proxmox I have 3 Windows 2012 R2 VMs with the VirtIO NIC (latest release). Using the NAS manufacturer's speed tool on 1 VM, I get 1Gb r/w speed. When testing 2 VMs, I get up to 2Gb aggregate reads and 1.4Gb aggregate writes, but the performance isn't sustained. When I use 3 VMs, I get up to 2.5Gb aggregate reads and 1.4Gb aggregate writes, but the performance is more erratic (larger and longer drops). When I test 3 non-VM systems plugged into the same switch, each gets sustained 1Gb r/w (315+MBps aggregate).
    BTW... the physical switch shows the LACP connection to the Proxmox server is established.

    Questions:
    1: When using 1 VM with VirtIO (10Gb link speed), why do I only get 1Gb speed when the vmbr has a 4Gb connection?
    - I would think a 10Gb VirtIO NIC would use the maximum bandwidth the LAG could provide.
    2: Why am I seeing such poor performance with multiple VMs using the 4Gb LAG?
    - Seems like they are (poorly) fighting for limited bandwidth.
    3: Is the vmbr switch limited in some way?

    Config:

    auto lo
    iface lo inet loopback

    iface eno4 inet manual

    iface eno3 inet manual

    iface eno1 inet manual

    iface eno2 inet manual

    iface enp132s0f0 inet manual

    iface enp132s0f1 inet manual

    auto bond0
    iface bond0 inet manual
        slaves eno1 eno2 eno3 eno4
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3

    auto vmbr0
    iface vmbr0 inet static
        address x.x.255.20
        netmask 255.255.255.0
        gateway x.x.255.1
        bridge_ports bond0
        bridge_stp on
        bridge_fd 0

    auto vmbr1
    iface vmbr1 inet manual
        bridge_ports enp132s0f0 enp132s0f1
        bridge_stp off
        bridge_fd 0



    Thanks!
     
  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Because LACP works on layer 2+3, which means:
    one VM has one MAC address and one IP address, and the storage also has one MAC and one IP.
    Therefore you always use the same NIC, because the hashing algorithm will always calculate the same hash.

    I'm not sure, but maybe your switch is too slow.

    No, the vmbr is as fast as your CPU.
    LACP uses hashing, and when you test against one target you can only use one NIC. In your case, 1Gbit.
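    For illustration, here is a rough Python sketch of the transmit-hash formulas described in the kernel bonding documentation (simplified, not the actual kernel code); every MAC/IP/port value below is made up. It shows why one VM talking to one NAS always lands on the same slave under layer2+3, while a policy that also hashes the L4 ports can spread separate connections across slaves:

    # Rough sketch of the bonding xmit hash policies (per the kernel's
    # bonding documentation, simplified); example values are made up.
    N_SLAVES = 4  # eno1-eno4

    def layer2_3(src_mac_last, dst_mac_last, src_ip, dst_ip):
        # MAC XOR combined with IP XOR, reduced modulo the slave count:
        # one MAC/IP pair -> always the same slave.
        return (((src_ip ^ dst_ip) & 0xffff) ^ (src_mac_last ^ dst_mac_last)) % N_SLAVES

    def layer3_4(src_ip, dst_ip, src_port, dst_port):
        # Also mixes in the TCP/UDP ports, so different connections between
        # the same two hosts can map to different slaves.
        return ((src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xffff)) % N_SLAVES

    vm_ip, nas_ip = 0x0AFF0030, 0x0AFF0010   # made-up 32-bit IPv4 addresses
    vm_mac, nas_mac = 0x56, 0x9A             # made-up last MAC octets

    # layer2+3: every frame between this VM and the NAS picks the same slave.
    print("layer2+3:", layer2_3(vm_mac, nas_mac, vm_ip, nas_ip))

    # layer3+4-style: four parallel connections (different source ports,
    # SMB port 445 on the NAS) can spread across the slaves.
    for sport in (49152, 49153, 49154, 49155):
        print("layer3+4:", layer3_4(vm_ip, nas_ip, sport, 445))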
     
  3. Symbol

    Symbol Member
    Proxmox Subscriber

    https://en.wikipedia.org/wiki/Link_aggregation#Order_of_frames

    You could replace
    bond_xmit_hash_policy layer2+3
    with
    bond_xmit_hash_policy encap3+4

    which might help to better distribute the load on the LAG.
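    After changing the policy (and bringing the bond down/up, or rebooting), the bonding driver reports the active mode and hash policy in /proc/net/bonding/bond0. A minimal check in Python, assuming the bond is named bond0 as in the config above, could look like:

    # Sanity check that bond0 picked up the intended mode and hash policy;
    # assumes the bond is named bond0 as in the config above.
    with open("/proc/net/bonding/bond0") as f:
        for line in f:
            if line.startswith(("Bonding Mode", "Transmit Hash Policy", "Slave Interface")):
                print(line.rstrip())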
     
  4. tl5k5

    tl5k5 New Member

    I know it's not the switch. The same switch with physical clients gets maximum sustained bandwidth.

    I'll try the bond_xmit_hash_policy encap3+4 when I get a chance...currently traveling.

    Do you think it's worth trying a different type of LAG than LACP?

    Thanks!
     
  5. tl5k5

    tl5k5 New Member

    Changing to bond_xmit_hash_policy encap3+4 did not make a difference.

    So would changing the LAG to something other than LACP get me any more bandwidth?

    UPDATE:
    Changed to bond_xmit_hash_policy layer3+4 and I'm seeing sustained 325MBps aggregate read speed!
    I'm only seeing 80MBps aggregate write speed. :( Each NIC is fluctuating between 20-40MBps.
     
    #5 tl5k5, Sep 26, 2017
    Last edited: Sep 27, 2017
  6. Symbol

    Symbol Member
    Proxmox Subscriber

    Of course, you should see near 1Gb/s aggregate if you see that with several real hosts toward your NAS...
    Did you check the CPU usage?
     
  7. tl5k5

    tl5k5 New Member

    @Symbol Just so you know, I did not see that kind of read speed with multiple VMs when using layer2+3.
    With 3 VMs writing, my CPUs (24 x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 2 sockets) will jump to 14.6% and then quickly drop to 5%.
     
  8. Symbol

    Symbol Member
    Proxmox Subscriber

    Well, since the hash config on the host only deals with outgoing packets, writes toward the NAS could be expected to be quicker, not really reads...
    Anyway, you could check/change the hash config on the switch, I guess.
     
  9. tl5k5

    tl5k5 New Member

    The switch is a Dell n3024 with the following config:

    switch#show interfaces port-channel 1

    Channel  Ports                          Ch-Type  Hash Type  Min-links  Local Prf
    -------  -----------------------------  -------  ---------  ---------  ---------
    Po1      Active: Gi1/0/19, Gi1/0/20,    Dynamic  7          1          Disabled
             Gi1/0/21, Gi1/0/22

    Hash Algorithm Type
    1 - Source MAC, VLAN, EtherType, source module and port Id
    2 - Destination MAC, VLAN, EtherType, source module and port Id
    3 - Source IP and source TCP/UDP port
    4 - Destination IP and destination TCP/UDP port
    5 - Source/Destination MAC, VLAN, EtherType, source MODID/port
    6 - Source/Destination IP and source/destination TCP/UDP port
    7 - Enhanced hashing mode


    I'm not a switch expert, but I see the Hash Type is set to 7 (Enhanced hashing mode). Does anyone have a suggestion on what settings I should try?

    Thanks!
     
  10. Symbol

    Symbol Member
    Proxmox Subscriber

    It seems to be optimal already. So the next step would be to graph each port and see how the load is balanced. If it's balanced properly, then it's not a LAG issue.
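    If switch-side graphs aren't handy, one way to watch the balance from the Proxmox host itself is to poll the per-slave counters in /proc/net/dev during a test. A small sketch, assuming the slave names eno1-eno4 from the config posted above:

    # Prints per-second RX/TX rates for each bond slave; slave names are
    # assumed from the config posted above (adjust if yours differ).
    import time

    SLAVES = ["eno1", "eno2", "eno3", "eno4"]

    def read_counters():
        stats = {}
        with open("/proc/net/dev") as f:
            for line in f:
                if ":" not in line:
                    continue  # skip the two header lines
                name, rest = line.split(":", 1)
                fields = rest.split()
                if name.strip() in SLAVES:
                    # field 0 = RX bytes, field 8 = TX bytes
                    stats[name.strip()] = (int(fields[0]), int(fields[8]))
        return stats

    prev = read_counters()
    while True:
        time.sleep(1)
        cur = read_counters()
        report = []
        for s in SLAVES:
            rx = (cur[s][0] - prev[s][0]) / 1e6
            tx = (cur[s][1] - prev[s][1]) / 1e6
            report.append(f"{s} rx {rx:6.1f} MB/s tx {tx:6.1f} MB/s")
        print(" | ".join(report))
        prev = cur

    Run it on the host while the VMs are reading/writing to the NAS and you can see whether the traffic actually spreads over eno1-eno4 or sticks to a single link.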
     