balance-rr bond unbalanced

jhammer
Dec 21, 2009

Receive (rx) packets in a balance-rr bond are not being distributed evenly for me.

In my configuration, eth3, eth4, and eth5 are in a multipath round-robin on a private network and distribute the load well.

eth0, eth1, and eth2 are in a balance-rr bond that is bridged to the main network. The switch (Dell PowerConnect 2724) is configured with LAG (link aggregation) for the related ports (eth0, eth1, eth2). Transmit (tx) packets are distributed evenly among the three NICs. However, received (rx) packets almost all come in over a single interface, eth2 (see results in the details below). I noticed in another thread in this forum that it is suggested to only have two NICs in a bond. I tried that with similar results.

Why would the received (rx) packets not be distributed evenly for a balance-rr bond?

Here are some details...

Monitoring results:

Code:
 bwm-ng v0.6 (probing every 0.500s), press 'h' for help
  input: /proc/net/dev type: sum
  /         iface             Rx             Tx          Total
  =============================================================
             lo:     4278.15 KB     4278.15 KB     8556.30 KB
           eth0:      245.56 KB    11215.56 KB    11461.12 KB
           eth1:    31120.73 KB    11197.75 KB    42318.48 KB
           eth2:  1533256.66 KB    11231.98 KB  1544488.64 KB
           eth3:    27875.98 KB   760763.83 KB   788639.81 KB
           eth4:    28827.72 KB   763390.26 KB   792217.98 KB
           eth5:    29498.55 KB   762272.50 KB   791771.05 KB
          bond0:  1564622.88 KB    33645.29 KB  1598268.17 KB
          vmbr0:   791532.00 KB    29198.27 KB   820730.27 KB
         venet0:        0.00 KB        0.00 KB        0.00 KB
     vmtab102i0:     4447.02 KB   765576.42 KB   770023.43 KB
  -------------------------------------------------------------
          total:  4015705.25 KB  3152770.01 KB  7168475.26 KB

/etc/network/interfaces:
Code:
# network interface settings 
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

auto eth3
iface eth3 inet static
        address  172.18.32.100
        netmask  255.255.255.0
        mtu 9000

auto eth4
iface eth4 inet static
        address  172.18.33.100
        netmask  255.255.255.0
        mtu 9000

auto eth5
iface eth5 inet static
        address  172.18.34.100
        netmask  255.255.255.0
        mtu 9000

auto bond0
iface bond0 inet manual
        slaves eth0 eth1 eth2
        bond_miimon 100
        bond_mode balance-rr

auto vmbr0
iface vmbr0 inet static
        address  10.132.32.49
        netmask  255.255.254.0
        gateway  10.132.32.2
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

NICs:
Intel 82546GB
Intel 80003ES2LAN
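
In case it helps anyone reproduce this, here is roughly how I'm watching the per-slave traffic (standard bonding/iproute2 reads, nothing Proxmox-specific; interface names as in the config above):

Code:
# Show the bonding mode and which slaves are actually attached to bond0
cat /proc/net/bonding/bond0

# Per-slave RX/TX byte counters straight from the kernel
ip -s link show dev eth0
ip -s link show dev eth1
ip -s link show dev eth2

(bwm-ng above is just sampling these same /proc/net/dev counters every 0.5s.)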

Thanks.
 
I tried your suggestion to turn on bridge_stp. It made no difference for me. The rx packets are still unbalanced.
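
For anyone checking the same thing, the bridge's STP state can be confirmed with brctl (part of bridge-utils); these are generic commands, not anything Proxmox-specific:

Code:
# The "STP enabled" column should show yes after setting bridge_stp on and re-upping vmbr0
brctl show vmbr0

# More detail, including per-port states and the elected root bridge
brctl showstp vmbr0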
 
I did find the following regarding balance-rr bonding at the link below:

This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces...Many switches do not support any modes that stripe traffic (instead choosing a port based upon IP or MAC level addresses); for those devices, traffic for a particular connection flowing through the switch to a balance-rr bond will not utilize greater than one interface's worth of bandwidth.

http://www.kernel.org/doc/Documentation/networking/bonding.txt

I'm not sure if that is what I'm running into or not.
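
If the 2724 is hashing on MAC addresses for the static LAG (I don't know its exact algorithm, so this is only an assumption), that would explain it: every frame for a given source/destination MAC pair hashes to the same value, so it always leaves the switch on the same LAG port. A toy sketch of that kind of layer-2 hash:

Code:
# Toy illustration only -- assumes the switch picks the egress LAG port as
# (last octet of source MAC XOR last octet of destination MAC) mod port count,
# the same idea as the Linux "layer2" transmit hash policy.
src=0x1a    # last octet of a peer's MAC (hypothetical)
dst=0x5e    # last octet of the bond's MAC (hypothetical)
ports=3
echo $(( (src ^ dst) % ports ))
# Same result for every frame of this MAC pair, so one NIC gets all of its rx.
# With only a few distinct peer MACs (or everything arriving via one router MAC),
# most of the inbound traffic ends up on a single port.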
 
That is a good thought and sounds right to me. Perhaps I should look into the switch configuration a little more.

Thanks!
 
I've done more thorough testing of the bonding modes. The best solution in my environment seems to be balance-rr with port trunking turned off on the switch. I'm surprised by that, because this document suggests that balance-rr, balance-xor, and 802.3ad all require special switch configuration:

http://www.kernel.org/doc/Documentation/networking/bonding.txt

Here is my environment and some related results. The results are ordered from best to worst apparent performance (i.e., how evenly the load balanced):

Switch: Dell PowerConnect 2724
Kernel: Linux prism 2.6.32-1-pve #1 SMP or Linux prism 2.6.18-2-pve #1 SMP
Proxmox: 1.5
Tests: moved large files back and forth between the Proxmox server and 15 hosts via scp (roughly as sketched below)
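
The copy test itself was nothing fancy; roughly the following, run while bwm-ng was sampling in another terminal (host names and the test file are placeholders):

Code:
#!/bin/bash
# Push and pull a large file against every test host at once so the bond has
# plenty of simultaneous flows to spread (host list and paths are placeholders).
HOSTS="host01 host02 host03"   # ...through the 15 test hosts
for h in $HOSTS; do
    scp /tmp/bigfile.bin root@$h:/tmp/ &            # outbound -> tx on the bond
    scp root@$h:/tmp/bigfile.bin /tmp/from-$h.bin & # inbound  -> rx on the bond
done
wait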


balance-rr with switch ports not in trunk group:

iface            Rx             Tx          Total
eth0:  2614822.47 KB  2642812.25 KB  5257634.72 KB
eth1:  2624789.06 KB  2645099.71 KB  5269888.77 KB
eth2:  2709968.92 KB  2632662.74 KB  5342631.66 KB

balance-rr with switch ports in trunk group:

iface            Rx             Tx          Total
eth0:  4235957.92 KB  2592540.87 KB  6828498.79 KB
eth1:  1586867.44 KB  2595115.53 KB  4181982.96 KB
eth2:  2101130.24 KB  2588428.89 KB  4689559.13 KB

balance-xor with switch ports in trunk group:

iface            Rx             Tx          Total
eth0:  1543622.93 KB  4708218.77 KB  6251841.70 KB
eth1:  3627236.43 KB  2080233.80 KB  5707470.23 KB
eth2:  2068243.86 KB  1044213.55 KB  3112457.41 KB

balance-xor with switch ports not in trunk group:

iface            Rx             Tx          Total
eth0:  4462242.77 KB  4716836.65 KB  9179079.41 KB
eth1:  2274976.67 KB  2096756.41 KB  4371733.08 KB
eth2:  1022925.45 KB  1052227.11 KB  2075152.55 KB

balance-alb with switch ports not in trunk group:

iface            Rx             Tx          Total
eth0:      139.65 KB       63.11 KB      202.76 KB
eth1:      139.65 KB       63.11 KB      202.76 KB
eth2:  5130005.38 KB  1621164.95 KB  6751170.33 KB

balance-alb with switch ports in trunk group:

Does not work. Cannot ping.

802.3ad with switch ports in trunk group:

Does not work. Cannot ping unless bridge_stp is turned on. Even then I get 10% packet loss.

802.3ad with switch ports not in trunk group:

Does not work. Get the following error repeatedly: "bonding: bond0: An illegal loopback occurred on adapter (eth0). Check the configuration to verify that all Adapters are connected to 802.3ad compliant switch ports"

NOTE: Regarding 802.3ad on the Dell PowerConnect 2724: I spoke with Dell, and the LAG groups on that switch only support static aggregation, not LACP, so 802.3ad will likely not work with this switch.
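
For anyone hitting the same wall, two checks that should show whether the switch is actually speaking LACP (just how I'd verify it, not anything Dell-specific):

Code:
# In 802.3ad mode the kernel reports the negotiated aggregator here; an
# all-zero Partner Mac Address means no LACPDUs are coming back from the switch.
grep -A 8 "802.3ad info" /proc/net/bonding/bond0

# Or watch for LACPDUs directly (ethertype 0x8809 is the Slow Protocols
# frame type that LACP uses)
tcpdump -i eth0 -nn ether proto 0x8809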


So the un-trunked balance-rr is definitely the best in this test. If anyone has other tests they recommend for measuring performance on a bond or if you have insights into why this is working best with switch ports not in trunk groups, I'd greatly appreciate it.
 
UPDATE: balance-rr mode with switch ports not in a trunk group has hiccups. Though I see no errors or dropped packets when pinging the physical host, pinging the VMs I get between 25% and 75% packet loss. So it looks like, as the documentation states, the ports on the switch have to be trunked in some fashion. When I create LAG (trunk) groups on the switch, I get 0% packet loss to both the physical host and the VMs.
 
Was this problem ever solved?

I have the same issue with a Dell PowerConnect 5448 switch. I tried 802.3ad and balance-rr, with and without a trunk group, and I can't get it working as it should.

At our office we used a different switch. We didn't create a LAG and used balance-rr, and the load was balanced in/out perfectly.

Now, in the datacenter with the Dell PowerConnect 5448, we can only balance traffic going out of the fileserver. Incoming traffic is not balanced; every Proxmox node is only using one NIC of the fileserver. This was very problematic during the initial copy of all VMs to the new fileserver, since there was no capacity left for normal disk IO.

Is it a bad thing that I use 4 ports with balance-rr? I see a lot of topics about balance-rr and "more than 2 ports is bad".
 
