Solution: Multiple OVS/SDN and LACP heartaches

xrobau

There are a couple of problems we've discovered while doing some testing and tuning on Proxmox and OVS, which has led me on a bit of a dive into the way OVS works and how it's configured.

Problem 1: Lots of people are reporting that, for no apparent reason, `ovs-appctl bond/show` marks a member as `may_enable: false`, and it's not possible to re-enable it without physically unplugging and replugging the interface. In my case, even `ip link set eth1 down` didn't work, and I had to bounce the port at the switch.

Cause: It APPEARS that OVS has a very short timeout on LACP PDUs, and once it marks a port as unusable, it never re-enables it, even if a LACP PDU arrives later.

Solution: Set lacp-time=fast EVERYWHERE. This appears in a lot of the example documentation for Proxmox/OVS, but without any reason why. And make sure you set it on the switch, too!
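
If you already have a running bond, you can apply and verify this live rather than editing the interfaces file and reloading - a quick sketch, assuming the bond is named bond0 as in the example below:

Code:
# set the LACP timer to fast on the OVS bond
ovs-vsctl set port bond0 other_config:lacp-time=fast
# verify - lacp_time should now report "fast"
ovs-appctl lacp/show bond0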

Problem 2: SOME traffic randomly pauses inexplicably, and then resumes a few seconds later.

Cause: OVS always wants to rebalance traffic across different interfaces based on load, and there's no way to stop it from doing that. 'Normal' LACP load balancing is a simple hash based on src/dst/MAC and never changes in the middle of a flow, but OVS will move flows between interfaces - and even worse, after it does so it ignores updates for another 5 seconds. https://docs.openvswitch.org/en/latest/topics/bonding/?highlight=lacp#lacp-bonding (see the last sentence)

Solution: Use bond_mode=active-backup with lacp=active. Unfortunately, this is not currently configurable in the GUI, but I have opened a ticket requesting it - https://bugzilla.proxmox.com/show_bug.cgi?id=4454 - which should be so simple to do that even I may be able to manage a pull request 8)

Example:
Code:
auto bond0
iface bond0 inet manual
        ovs_bonds eth1 eth0
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options bond_mode=active-backup lacp=active other_config:lacp-time=fast

Also, for anyone who wants to roll out a pile of Proxmox hosts, I've written a playbook to bootstrap the machine and enable kexec for fast reboots. Note that this DISABLES ALL SPECTRE MITIGATIONS, because we don't host untrusted VMs. If you do, make sure you remove that!

https://github.com/xrobau/ansible-proxmox-host
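
If you're using that playbook (or anything else that touches mitigations), the kernel reports the resulting state under sysfs, so it's easy to confirm what a host actually ended up with:

Code:
grep . /sys/devices/system/cpu/vulnerabilities/*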
 
Hi,

AFAIK, only balance-slb does rebalancing, not balance-tcp.

balance-tcp should be used with LACP.
Sadly that's what any sensible person would think, but you're incorrect. I refer you to both links posted above. OVS always ALWAYS rebalances. There's no way to stop it.

This was so surprising and unexpected that I ended up setting up port mirroring to a dedicated packet capture machine to see what was going on. And that's what it was. I'm not *certain* that the MLAG is what's causing the pause, but it's the only thing that makes sense.

This is visible when running ovs-appctl bond/show - you will see that the hash ids change ports. That's when OVS is switching outputs.
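
One simple way to watch the hash-to-member mapping change live is just to poll bond/show (bond0 being the bond from the example config above):

Code:
watch -d -n1 "ovs-appctl bond/show bond0"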
 
I'll try to look at the code to see if it's a bug. But definitively, if traffic is rebalanced, you can at minimum expect packet retransmissions (I'm not sure about a full pause). With active-backup + lacp, do you have only one link active? Or do you have a hash policy working, with different connections on different links?

Have you tried a Linux LACP bond to compare?
 
I have a balance-tcp optimization, reworked in 2019-2020, on the OVS mailing list:

https://www.openvswitch.org/support/ovscon2019/day2/0944-Balance-TCP Performance Improvement.pdf
https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368563.html

It's not enabled by default, but it can be enabled with:

Code:
New configuration knob 'other_config:lb-output-action' for bond ports
       that enables new datapath action 'lb_output' to avoid recirculation
       in balance-tcp mode.  Disabled by default.

Code:
ovs-vsctl set port <bond port> other_config:lb-output-action=true


Could you try it and verify with a tcpdump or port-mirroring capture?
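
For anyone wanting to check this host-side, a minimal sketch (using the bond0/eth0/eth1 names from the example config earlier in the thread, and iperf3's default port 5201) would be to confirm the knob is set and then capture on each physical member to see which leg a given flow leaves on:

Code:
# confirm the option was applied to the bond
ovs-vsctl get port bond0 other_config:lb-output-action
# capture the test traffic on each bond member separately
tcpdump -ni eth0 'tcp port 5201'
tcpdump -ni eth1 'tcp port 5201'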
 
... disable rebalancing ...

Which will be exactly the same as active-backup, because all the flows will go out the active interface, and it'll never be rebalanced.

Since my original post, I've spent some more time on this, and discovered that OVS is totally unsuitable for more enterprise-y setups, and MLAG in particular. None of the MLAG implementations I've tried this with expect distinct flows to flap between legs, and this does cause minor issues.

OVS doesn't implement a non-rebalancing (as required by 802.3ad) method of hashing, and I'm sure that's fine for smaller setups. This is mainly only an issue when using MLAG when the switches need to talk between themselves about where to send traffic.

So for the moment, I've given up on it, and gone back to standard linux bonding, which works perfectly and doesn't cause these issues. I'm going to leave this post here because I'm sure other people will go down this path, but not be as persistent as I was to figure out the cause 8)
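
For anyone wanting a starting point, a plain Linux LACP bond in /etc/network/interfaces looks roughly like this - a minimal sketch using the same NICs as the OVS example above, not necessarily the exact options used here:

Code:
auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate 1
        bond-xmit-hash-policy layer3+4
        mtu 9000

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        mtu 9000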

There is still the OVS bug where it never listens for LACP PDUs again after it has decided an interface isn't connected to a LACP-enabled port. I started looking through the code to figure out whether may_enable: false can ever be unset, and discovered that it's effectively set once, forever, and can never be cleared. That's the point where I gave up and walked away. 8-(

Thanks for your input!
 
Thanks for your detailed info.
(I don't like or use OVS myself; I've been using Linux LACP bonds for 15 years without any problem.)
I'm really surprised that something as basic as LACP is not correctly implemented in 2022...

Anyway, OVS doesn't have any advantage unless you use DPDK or XDP offloading, and that's not implemented in Proxmox anyway...
 
@xrobau
Problem 2: SOME traffic randomly pauses inexplicably, and then resumes a few seconds later.

Set "other-config:bond-rebalance-interval=0" to stop OVS rebalancing.

Source: https://manpages.debian.org/testing...s-vswitchd.conf.db.5.en.html#other_config~178
If zero, load balancing is disabled on the bond (link failure still cause flows to move).

Check with "ovs-appctl bond/show bond0" (no "next rebalance" anymore).
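
For reference, this can be applied to a running bond directly (bond0 being the bond name used here), as well as persisted via ovs_options in /etc/network/interfaces as shown in the configs further down:

Code:
ovs-vsctl set port bond0 other_config:bond-rebalance-interval=0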

Useful commands to deal with OVS bonds:

Code:
ovs-vsctl show
ovs-appctl bond/show bond0
ovs-appctl lacp/show bond0
ovs-vsctl list port bond0
 
I've done some testing and comparison of bonds using Open vSwitch with bond_mode balance-slb and bond_mode balance-tcp, as well as Linux bonds with xmit-hash-policy Layer2+3 and Layer3+4. "bond-rebalance-interval" on the Open vSwitch bonds is set to 0 in order to avoid rebalancing every 10 seconds.

Performance Comparison Linux Bridge vs. Open vSwitch

Method: Network Performance Test via Link Aggregation Group (LAG) utilizing iperf3

Test four types of LAG/Bond:

1. Open vSwitch with bond_mode balance-slb (i.e. Layer 2 MAC+VLAN-tag)
Code:
ovs_options lacp=active bond_mode=balance-slb other-config:bond-rebalance-interval=0
Source: https://github.com/openvswitch/ovs/...2e4e6011732515e7ef/vswitchd/vswitch.xml#L2068

2. Open vSwitch with bond_mode balance-tcp (i.e. Layer 2-4 MAC/IP/Port)
Code:
ovs_options lacp=active bond_mode=balance-tcp other-config:bond-rebalance-interval=0
Source: https://github.com/openvswitch/ovs/...2e4e6011732515e7ef/vswitchd/vswitch.xml#L2091

3. Linux Bond Layer2+3
Code:
bond-xmit-hash-policy layer2+3

4. Linux Bond Layer3+4
Code:
bond-xmit-hash-policy layer3+4

All bonds are configured with active and negotiated Link Aggregation Control Protocol (LACP/IEEE 802.3ad)
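
LACP negotiation on either bond type can be double-checked before testing; the Linux bond exposes its 802.3ad state under /proc, and the OVS bond via ovs-appctl (bond0 being the bond name used throughout):

Code:
# Linux bond: look for the "802.3ad info" section and per-member partner details
cat /proc/net/bonding/bond0
# OVS bond: status should show "active negotiated"
ovs-appctl lacp/show bond0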

Environment

4 nodes each:
  • CPU: 8-Core Intel Xeon E-2288G CPU @ 3.70GHz (1 Socket)
  • RAM: 128GB DDR4-2667
  • NIC: NVIDIA Mellanox ConnectX-5 EN MCX512A-ACAT 25GbE dual-port SFP28
  • OS: Debian GNU/Linux 11 (bullseye) / Proxmox Virtual Environment 7.3-3
Switch: Multi-chassis Link Aggregation Group (MLAG) with 2x HUAWEI CE8851-32CQ8DQ-P 32*100GE with 4x25GbE Breakout-DACs

Bond configuration: 2x 25GbE, dual-port ConnectX-5 NIC with one port per switch (MLAG)

Preliminary

3 servers: 10.10.10.2,10.10.10.3,10.10.10.4
1 client: 10.10.10.5

Install iperf3 on all nodes:
Code:
apt install iperf3

Server: 10.10.10.2
Code:
iperf3 --server --bind 10.10.114.12 & iperf3 --server --bind 10.10.115.12 &

Server: 10.10.10.3
Code:
iperf3 --server --bind 10.10.114.13 & iperf3 --server --bind 10.10.115.13 &

Server: 10.10.10.4
Code:
iperf3 --server --bind 10.10.114.14 & iperf3 --server --bind 10.10.115.14 &

Test execution


bond_mode: balance-slb

One client to two servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender
114-13: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender

One client on two VLANs to three servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.114.14 --title 114-14 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30 &

iplink: [ ID] Interval Transfer Bitrate Retr
114-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.23 Gbits/sec 0 sender
115-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.23 Gbits/sec 0 sender
115-14: [ 5] 0.00-30.00 sec 28.7 GBytes 8.23 Gbits/sec 0 sender
114-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.23 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 28.7 GBytes 8.23 Gbits/sec 0 sender
115-13: [ 5] 0.00-30.00 sec 28.7 GBytes 8.23 Gbits/sec 0 sender

bond_mode: balance-tcp

One client to two servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 86.2 GBytes 24.7 Gbits/sec 0 sender
114-13: [ 5] 0.00-30.00 sec 86.2 GBytes 24.7 Gbits/sec 15 sender

One client on two VLANs to three servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.114.14 --title 114-14 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30 &

iplink: [ ID] Interval Transfer Bitrate Retr
115-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
115-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
115-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender


bond_mode: Linux Bond Layer2+3

One client to two servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-13: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender


One client on two VLANs to three servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.114.14 --title 114-14 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30 &

iplink: [ ID] Interval Transfer Bitrate Retr
115-12: [ 5] 0.00-30.00 sec 14.4 GBytes 4.12 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 14.4 GBytes 4.12 Gbits/sec 0 sender
115-14: [ 5] 0.00-30.00 sec 14.4 GBytes 4.12 Gbits/sec 0 sender
114-13: [ 5] 0.00-30.00 sec 14.4 GBytes 4.12 Gbits/sec 0 sender
114-14: [ 5] 0.00-30.00 sec 14.4 GBytes 4.12 Gbits/sec 0 sender
115-13: [ 5] 0.00-30.00 sec 14.4 GBytes 4.12 Gbits/sec 0 sender


bond_mode: Linux Bond Layer3+4

One client to two servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-14: [ 5] 0.00-30.00 sec 85.8 GBytes 24.6 Gbits/sec 1568 sender
115-14: [ 5] 0.00-30.00 sec 85.8 GBytes 24.6 Gbits/sec 1394 sender

Retries = CPU bound (the target server's CPU maxed out)


One client on two VLANs to three servers:
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.114.14 --title 114-14 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30 &

iplink: [ ID] Interval Transfer Bitrate Retr
115-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 0 sender
115-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
115-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 0 sender

Results

  • Bond "Open vSwitch with bond_mode balance-tcp (i.e. Layer 2-4 MAC/IP/Port)" leads to the best overall performance in the given testing environment.
 
Thanks for doing all of that! The end result is basically 'it doesn't matter which one you use', as they both ended up being roughly wire speed at 25gbit.

The worrying thing (which I suspect you may not have noticed) is that there were no retries on OVS. You'd expect a small number as TCP scaling kicks in and congestion drops packets - as it should!

But as there were no retries, it potentially means that OVS is doing additional buffering, adding to the bufferbloat issues that are starting to plague the internet. https://en.wikipedia.org/wiki/Bufferbloat (there are many resources on this; that's a handy summary).

With VoIP, we want everything as low latency as possible, so that's something we always need to be aware of. It was an interesting test, and I'm also slightly puzzled why you didn't see the MLAG issues I did. I tried a random pair of fs.com switches after our Nexus 55xx's showed the issue. I now wonder if there was something ELSE that was making me think that the rebalancing was an issue.

If you still have the test lab you built available, could you try balance-tcp with vlan tagged traffic? I'm curious how deep OVS looks into each frame!

For other people who might be following along, and are wondering what is happening - This is just two network nerds having fun playing with enterprise-level network scalability and redundancy!

To illustrate the bufferbloat point - this is what you would EXPECT to see on a well-behaved path. I just ran an iperf3 between two datacenters (10Gb fibre link, about 5ms latency):

Connecting to host 10.44.45.10, port 31337
[ 5] local 10.44.45.200 port 49456 connected to 10.44.45.10 port 31337
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.12 GBytes 9.63 Gbits/sec 0 4.33 MBytes
[ 5] 1.00-2.00 sec 1.13 GBytes 9.67 Gbits/sec 0 4.33 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 0 4.33 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.89 Gbits/sec 0 4.33 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec 5 2.28 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 23 1.74 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.89 Gbits/sec 45 1.55 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.89 Gbits/sec 49 1.19 MBytes
[ 5] 8.00-9.00 sec 1.15 GBytes 9.89 Gbits/sec 18 1.78 MBytes
[ 5] 9.00-10.00 sec 1.15 GBytes 9.89 Gbits/sec 17 1.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 11.5 GBytes 9.84 Gbits/sec 157 sender
[ 5] 0.00-10.00 sec 11.5 GBytes 9.83 Gbits/sec receiver

What happened there is that it ran at wire speed until everything in the path ran out of its (very small!) amount of buffering, and then started dropping packets. The congestion window decreased in size and the dropped packets started to decrease, exactly as it should. However, these machines are not the best example because they use the 'scalable' congestion control algorithm, which helps them recover quickly from packet loss, but ALSO means they bounce around a lot faster than the usual cubic.
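
For anyone wanting to check (or normalise) which congestion control algorithm their test hosts are using, these are standard kernel sysctls:

Code:
# show the algorithm in use and the ones currently available
sysctl net.ipv4.tcp_congestion_control
sysctl net.ipv4.tcp_available_congestion_control
# switch back to the default cubic for a more typical comparison
sysctl -w net.ipv4.tcp_congestion_control=cubic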
 
Thanks for the hint about bufferbloat. There were actually sometimes a few retries with OVS as well, but far fewer compared to the Linux bond. I always thought the fewer the better. Some examples with retries on OVS are included below. Very high retry counts occurred when the CPUs maxed out on the target servers.

The tests were done using two VLANs. For convenience, I've used /24 networks, with the third octet of the IP indicating the VLAN used.

Interface config:

Code:
auto enp2s0f0np0
iface enp2s0f0np0 inet manual
        mtu 9000


auto enp2s0f1np1
iface enp2s0f1np1 inet manual
        mtu 9000


auto bond0
iface bond0 inet manual
        ovs_bonds enp2s0f0np0 enp2s0f1np1
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options lacp=active bond_mode=balance-slb other-config:bond-rebalance-interval=0


auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 v114 v115
        ovs_mtu 9000


auto v114
iface v114 inet static
        address 10.10.114.{{ node_number }}/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options tag=114


auto v115
iface v115 inet static
        address 10.10.115.{{ node_number }}/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options tag=115

___

Some more testing regarding OVS bond_modes with VLANs and bond-rebalance-interval:


bond_mode balance-slb (default bond-rebalance-interval, i.e. 10 seconds)

Client on single VLAN to two servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-13: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender

As expected, only a single link is used, because source.MAC + source.VLAN is the same for both connections. This is where bond_mode balance-tcp has a benefit (see the test below in the bond_mode balance-tcp section).

Client on two VLANs to single server
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 72.9 GBytes 20.9 Gbits/sec 1837 sender
115-12: [ 5] 0.00-30.00 sec 73.2 GBytes 20.9 Gbits/sec 1549 sender

Sometimes the connections don't get distributed evenly right away, which was the case with this test. At around second 8 you can see the rebalancing kicking in and doing its job. As expected, the connections are distributed across both links because the source VLAN differs. The CPU on the server maxed out once both links were in use (retries went up). Although switching connections from one link to the other in the middle of a transfer still seems to me a bit like asking for trouble.

Connections got rebalanced after approx. 8 seconds:
114-12: [ ID] Interval Transfer Bitrate Retr Cwnd
114-12: [ 5] 0.00-1.00 sec 1.43 GBytes 12.3 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 0.00-1.00 sec 1.43 GBytes 12.3 Gbits/sec 0 3.05 MBytes
115-12: [ 5] 1.00-2.00 sec 1.43 GBytes 12.3 Gbits/sec 0 3.05 MBytes
114-12: [ 5] 1.00-2.00 sec 1.44 GBytes 12.4 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 2.00-3.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.05 MBytes
114-12: [ 5] 2.00-3.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.12 MBytes
114-12: [ 5] 3.00-4.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 3.00-4.00 sec 1.44 GBytes 12.4 Gbits/sec 0 3.05 MBytes
115-12: [ 5] 4.00-5.00 sec 1.44 GBytes 12.4 Gbits/sec 0 3.05 MBytes
114-12: [ 5] 4.00-5.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.12 MBytes
114-12: [ 5] 5.00-6.00 sec 1.43 GBytes 12.3 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 5.00-6.00 sec 1.43 GBytes 12.3 Gbits/sec 0 3.05 MBytes
114-12: [ 5] 6.00-7.00 sec 1.44 GBytes 12.4 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 6.00-7.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.05 MBytes
114-12: [ 5] 7.00-8.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 7.00-8.00 sec 1.44 GBytes 12.3 Gbits/sec 0 3.05 MBytes
114-12: [ 5] 8.00-9.00 sec 1.69 GBytes 14.5 Gbits/sec 0 3.12 MBytes
115-12: [ 5] 8.00-9.00 sec 1.69 GBytes 14.5 Gbits/sec 148 3.05 MBytes
114-12: [ 5] 9.00-10.00 sec 2.86 GBytes 24.6 Gbits/sec 41 2.17 MBytes
115-12: [ 5] 9.00-10.00 sec 2.87 GBytes 24.7 Gbits/sec 8 2.10 MBytes
115-12: [ 5] 10.00-11.00 sec 2.88 GBytes 24.7 Gbits/sec 15 3.17 MBytes
114-12: [ 5] 10.00-11.00 sec 2.87 GBytes 24.6 Gbits/sec 25 2.17 MBytes
115-12: [ 5] 11.00-12.00 sec 2.87 GBytes 24.6 Gbits/sec 81 2.28 MBytes
114-12: [ 5] 11.00-12.00 sec 2.84 GBytes 24.4 Gbits/sec 77 2.59 MBytes
115-12: [ 5] 12.00-13.00 sec 2.88 GBytes 24.7 Gbits/sec 2 2.52 MBytes
114-12: [ 5] 12.00-13.00 sec 2.87 GBytes 24.7 Gbits/sec 34 3.07 MBytes
115-12: [ 5] 13.00-14.00 sec 2.87 GBytes 24.6 Gbits/sec 39 2.53 MBytes
114-12: [ 5] 13.00-14.00 sec 2.84 GBytes 24.4 Gbits/sec 104 2.53 MBytes
115-12: [ 5] 14.00-15.00 sec 2.88 GBytes 24.7 Gbits/sec 14 2.18 MBytes
114-12: [ 5] 14.00-15.00 sec 2.85 GBytes 24.5 Gbits/sec 31 3.10 MBytes
114-12: [ 5] 15.00-16.00 sec 2.87 GBytes 24.6 Gbits/sec 23 2.65 MBytes
115-12: [ 5] 15.00-16.00 sec 2.88 GBytes 24.7 Gbits/sec 0 2.68 MBytes
114-12: [ 5] 16.00-17.00 sec 2.82 GBytes 24.2 Gbits/sec 237 1.48 MBytes
115-12: [ 5] 16.00-17.00 sec 2.83 GBytes 24.3 Gbits/sec 181 944 KBytes
115-12: [ 5] 17.00-18.00 sec 2.77 GBytes 23.8 Gbits/sec 144 2.67 MBytes
114-12: [ 5] 17.00-18.00 sec 2.82 GBytes 24.2 Gbits/sec 180 2.59 MBytes
114-12: [ 5] 18.00-19.00 sec 2.86 GBytes 24.6 Gbits/sec 66 2.58 MBytes
115-12: [ 5] 18.00-19.00 sec 2.86 GBytes 24.6 Gbits/sec 62 2.43 MBytes
114-12: [ 5] 19.00-20.00 sec 2.86 GBytes 24.6 Gbits/sec 128 2.83 MBytes
115-12: [ 5] 19.00-20.00 sec 2.86 GBytes 24.6 Gbits/sec 59 2.30 MBytes
114-12: [ 5] 20.00-21.00 sec 2.84 GBytes 24.4 Gbits/sec 84 2.34 MBytes
115-12: [ 5] 20.00-21.00 sec 2.86 GBytes 24.6 Gbits/sec 82 2.19 MBytes
114-12: [ 5] 21.00-22.00 sec 2.84 GBytes 24.4 Gbits/sec 63 2.15 MBytes
115-12: [ 5] 21.00-22.00 sec 2.86 GBytes 24.5 Gbits/sec 77 2.65 MBytes
114-12: [ 5] 22.00-23.00 sec 2.82 GBytes 24.2 Gbits/sec 154 1.55 MBytes
115-12: [ 5] 22.00-23.00 sec 2.81 GBytes 24.1 Gbits/sec 112 1.32 MBytes
115-12: [ 5] 23.00-24.00 sec 2.82 GBytes 24.2 Gbits/sec 112 1.99 MBytes
114-12: [ 5] 23.00-24.00 sec 2.81 GBytes 24.1 Gbits/sec 175 2.34 MBytes
114-12: [ 5] 24.00-25.00 sec 2.86 GBytes 24.6 Gbits/sec 62 2.09 MBytes
115-12: [ 5] 24.00-25.00 sec 2.88 GBytes 24.7 Gbits/sec 20 2.89 MBytes
115-12: [ 5] 25.00-26.00 sec 2.87 GBytes 24.7 Gbits/sec 46 2.70 MBytes
114-12: [ 5] 25.00-26.00 sec 2.85 GBytes 24.5 Gbits/sec 82 2.63 MBytes
115-12: [ 5] 26.00-27.00 sec 2.85 GBytes 24.5 Gbits/sec 63 2.62 MBytes
114-12: [ 5] 26.00-27.00 sec 2.82 GBytes 24.2 Gbits/sec 69 2.41 MBytes
115-12: [ 5] 27.00-28.00 sec 2.84 GBytes 24.4 Gbits/sec 106 952 KBytes
114-12: [ 5] 27.00-28.00 sec 2.84 GBytes 24.4 Gbits/sec 144 848 KBytes
115-12: [ 5] 28.00-29.00 sec 2.87 GBytes 24.6 Gbits/sec 49 2.22 MBytes
114-12: [ 5] 28.00-29.00 sec 2.86 GBytes 24.5 Gbits/sec 13 2.97 MBytes
114-12: [ 5] 29.00-30.00 sec 2.85 GBytes 24.5 Gbits/sec 45 2.76 MBytes
115-12: [ 5] 29.00-30.00 sec 2.87 GBytes 24.7 Gbits/sec 129 2.10 MBytes
114-12: - - - - - - - - - - - - - - - - - - - - - - - - -
115-12: - - - - - - - - - - - - - - - - - - - - - - - - -
114-12: [ ID] Interval Transfer Bitrate Retr
115-12: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 72.9 GBytes 20.9 Gbits/sec 1837 sender
115-12: [ 5] 0.00-30.00 sec 73.2 GBytes 20.9 Gbits/sec 1549 sender
114-12: [ 5] 0.00-30.04 sec 72.9 GBytes 20.8 Gbits/sec receiver
115-12: [ 5] 0.00-30.04 sec 73.1 GBytes 20.9 Gbits/sec receiver


Client on two VLANs to two servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
115-13: [ 5] 0.00-30.00 sec 86.2 GBytes 24.7 Gbits/sec 117 sender
114-12: [ 5] 0.00-30.00 sec 85.9 GBytes 24.6 Gbits/sec 0 sender

As expected, the connections got distributed across both links. Far fewer retries compared to the "Client on two VLANs to single server" scenario, because two servers are used.


Client on two VLANs to three servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
115-13: [ 5] 0.00-30.00 sec 43.1 GBytes 12.3 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 85.7 GBytes 24.6 Gbits/sec 0 sender
115-14: [ 5] 0.00-30.00 sec 43.1 GBytes 12.3 Gbits/sec 0 sender

As expected, one connection has its own link, the other two connections share the other link.

___
When setting bond-rebalance-interval=0 on bond_mode balance-slb, the connections never got rebalanced during transfer. Unfortunately, they ended up on just a single link most of the time (see the tests below; the total throughput always adds up to 25 Gbits/sec). If bond_mode balance-slb used source.MAC + source.VLAN as the hash input, I'd expect the traffic to get distributed across both links, yet somehow only one link is used. Maybe because both connections got started at pretty much the same time? In this case, rebalancing might be beneficial.

balance-slb bond-rebalance-interval=0

Client on single VLAN to two servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-13: [ 5] 0.00-30.00 sec 43.3 GBytes 12.4 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 43.0 GBytes 12.3 Gbits/sec 0 sender


Client on two VLANs to single server
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
115-12: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender


Client on two VLANs to two servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
115-13: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 43.2 GBytes 12.4 Gbits/sec 0 sender

Client on two VLANs to three servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
115-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 0 sender
115-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 0 sender
114-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 0 sender

___
For comparison, bond_mode balance-tcp distributes the connections quite evenly across the links. Although there were test cases where the distribution was not as even as shown below, I'd rather leave bond-rebalance-interval=0 in order to avoid connections switching links mid-transfer. The risk/reward ratio seems insufficient, at least in our environment and use case (an R&D Proxmox cluster).

bond_mode balance-tcp bond-rebalance-interval=0

Client on single VLAN to two servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-13: [ 5] 0.00-30.00 sec 86.3 GBytes 24.7 Gbits/sec 7 sender
114-12: [ 5] 0.00-30.00 sec 86.3 GBytes 24.7 Gbits/sec 0 sender

Client on two VLANs to single server
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 86.0 GBytes 24.6 Gbits/sec 1116 sender
115-12: [ 5] 0.00-30.00 sec 86.0 GBytes 24.6 Gbits/sec 1334 sender

Note: CPU maxed out on server


Client on two VLANs to two servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 86.1 GBytes 24.6 Gbits/sec 0 sender
115-13: [ 5] 0.00-30.00 sec 86.1 GBytes 24.7 Gbits/sec 11 sender


Client on two VLANs to three servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
114-12: [ 5] 0.00-30.00 sec 43.1 GBytes 12.4 Gbits/sec 0 sender
115-13: [ 5] 0.00-30.00 sec 43.0 GBytes 12.3 Gbits/sec 0 sender
115-14: [ 5] 0.00-30.00 sec 86.0 GBytes 24.6 Gbits/sec 0 sender


Client on two VLANs each to three servers
Code:
iperf3 --client 10.10.114.12 --title 114-12 --time 30 & iperf3 --client 10.10.115.12 --title 115-12 --time 30 & iperf3 --client 10.10.114.13 --title 114-13 --time 30 & iperf3 --client 10.10.115.13 --title 115-13 --time 30 & iperf3 --client 10.10.114.14 --title 114-14 --time 30 & iperf3 --client 10.10.115.14 --title 115-14 --time 30

iplink: [ ID] Interval Transfer Bitrate Retr
115-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 8 sender
114-12: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 0 sender
114-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 18 sender
115-14: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
114-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.25 Gbits/sec 0 sender
115-13: [ 5] 0.00-30.00 sec 28.8 GBytes 8.24 Gbits/sec 8 sender

I'll leave it there for now.
 
