Bond Performance

That is normal behaviour. LACP is designed to _balance_ bandwidth between the links, but the maximum throughput between two clients is 1Gb/s.

(What is a "client" depend on the method used to compute which link should be used in function of the layer 2/3/4 adresses ;
e.g. with layer 2 you always use the same link between two ethernet cards).

I changed to balance-rr mode, same effect. Maybe adding bond-xmit-hash-policy layer3+4 would do the trick?
 
Maybe adding bond-xmit-hash-policy layer3+4 would do the trick?

It will balance TCP connections (hashing on IP addresses and port numbers), so iperf -P <N> with N >= 2 will show better performance. Hence the sample below, with two 1Gb/s Ethernet bonds on an HP switch with LACP:
Code:
root@proxmox:~# iperf -c nas -P 4
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 23.8 KByte (default)
------------------------------------------------------------
[  4] local 172.16.2.10 port 36657 connected with 172.16.2.100 port 5001
[  6] local 172.16.2.10 port 36655 connected with 172.16.2.100 port 5001
[  5] local 172.16.2.10 port 36654 connected with 172.16.2.100 port 5001
[  3] local 172.16.2.10 port 36656 connected with 172.16.2.100 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   547 MBytes   459 Mbits/sec
[  6]  0.0-10.0 sec   576 MBytes   483 Mbits/sec
[  5]  0.0-10.0 sec   536 MBytes   449 Mbits/sec
[  3]  0.0-10.0 sec   587 MBytes   492 Mbits/sec
[SUM]  0.0-10.0 sec  2.19 GBytes  1.88 Gbits/sec
Code:
root@proxmox:~# iperf -c nas -P 1
------------------------------------------------------------
Client connecting to nas, TCP port 5001
TCP window size: 23.8 KByte (default)
------------------------------------------------------------
[  3] local 172.16.2.10 port 36658 connected with 172.16.2.100 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes   944 Mbits/sec
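
For reference, the transmit-hash setting asked about above would look roughly like this on a Proxmox/Debian host using ifupdown/ifenslave (a minimal sketch only; the member NICs eth0/eth1 are hypothetical, the address matches the test host above):
Code:
auto bond0
iface bond0 inet static
        address 172.16.2.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4
Note that bond-xmit-hash-policy only controls how this host spreads its outgoing traffic over its own links; the switch hashes the return traffic with its own algorithm, and a single TCP stream still stays on one 1Gb/s link.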
 
I keep working on this question. Still using the switch, with one node on 802.3ad and the other on balance-rr, I get 2.8 Gbit/s when running iperf -c -P 1 from the balance-rr node, but running iperf -c -P 1 from the 802.3ad node the speed stays at 990 Mbit/s. With both nodes on 802.3ad, or both on balance-rr, iperf -c -P 1 stays at 990 Mbit/s. Very strange result...
 
I suppose that in the balance-rr to 802.3ad scenario, the switch computes (with its internal algorithm) a different hash for the two input ports (even though the MAC address of the incoming bond should be the same); as a consequence the output port is different, so you get double the bandwidth (maybe it is pure luck, or it depends on the switch).

With balance-rr on both sides there is no particular "switch assistance", so the switch falls back on its default behaviour: send traffic out of the port on which the destination MAC address was learned. As a consequence there is only one output port, so 1Gb/s.

To sum up:
* with direct cables and balance-rr you should have double bandwidth in both directions (see the config sketch below);
* with LACP/802.3ad and a switch you should have double bandwidth for multiple clients.
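
For the direct-cable case, a balance-rr bond is just another bond mode in /etc/network/interfaces (again only a sketch; eth2/eth3 and the 10.0.0.x point-to-point subnet are hypothetical):
Code:
auto bond1
iface bond1 inet static
        address 10.0.0.1
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode balance-rr
        bond-miimon 100
The second node gets the mirror-image configuration with address 10.0.0.2.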
 
I solved the whole question by using balance-rr on both sides and creating a VLAN with one network port from each server. It is the same as a back-to-back crossover cable, but going through the switch. For now I use 2 network ports on each server for the DRBD connection, and the #3 port for external network communication. Maybe I will add another 2-port network card and do a 4x connection.
 
More throughput than 1GBit/s is only possible with balance-rr;
If you need that only for DRBD sync, just connect the two servers directly together without a switch in between; crossover cables are normally not needed, as modern NICs can do MDI/MDI-X;

4 NICs do not give you 4x 1GBit/s - each additional link increases buffering and TCP re-ordering on the receiver side, which slows down the throughput and can cause re-transmissions;

1 NIC gives you 1GBit/s
2 NICs give you about 1.6-1.8GBit/s
3 NICs give you about 2.4GBit/s
4 NICs give you about 2.2-2.6GBit/s

The numbers vary depending on the NIC vendor and chipset used;
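
A rough way to check how your own bond scales and whether re-ordering is hurting you (sketch only; "nas" stands for whatever host runs the iperf server on the far end of the bond):
Code:
# single TCP stream over the bond
iperf -c nas -P 1
# compare with several parallel streams
iperf -c nas -P 4
# watch retransmission / reordering counters while the test runs
netstat -s | egrep -i 'retrans|reorder'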

So balance-rr with more than 3 links makes no sense in my opinion - I only use it with 2 NICs between two hosts connected directly;
Towards the network side I only use active-backup mode to two switches or a stack, because it should be as reliable as possible and also does not need any special switch configuration;
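
An active-backup bond feeding a Proxmox bridge could look roughly like this (a sketch under the usual assumptions: ifupdown-style syntax, hypothetical NIC names and address, each NIC cabled to a different switch):
Code:
auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode active-backup
        bond-miimon 100

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0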

If you have a storage device that multiple servers need to read from / write to, multiple NICs with 802.3ad, XOR, balance-alb or balance-tlb make sense, as each server can use one of the links to transmit its data (in the optimal case);

But be aware that if there is a router in between, the traffic from all servers arrives with the same MAC address (the router's), so a hash mode based on MAC addresses alone is not enough;

And check whether your switch can actually serve all links within an LACP trunk - for example: on an HP 2510G you can configure it, but it will never utilize more than 1GBit/s through the LACP trunk...
Cisco 2960G series, 3750G or Nexus series work like a charm - the Alcatel 6400 and 6800 series also work.

Alex
 
Hi,

Very interesting thread !

More throughput than 1GBit/s is only possible with balance-rr;

Point cleared, thanks !

4 NICs do not give you 4x 1GBit/s - each additional link increases buffering and TCP re-ordering on the receiver side, which slows down the throughput and can cause re-transmissions;

1 NIC gives you 1GBit/s
2 NICs give you about 1.6-1.8GBit/s
3 NICs give you about 2.4GBit/s
4 NICs give you about 2.2-2.6GBit/s

So in any configuration with a storage subsystem providing more than 300MB/s, a 10GbE link is _mandatory_ for DRBD if you don't want the network to be the bottleneck!?

Now I'm thinking of distributing my DRBD links across several [2|3]xGb bonds? If no single LAG will be able to match the arrays' throughput, at least the overall network throughput should be maxed out... hmm, but the benefits really depend on my use case... sorry, thinking out loud!

Thanks in advance for your insights
Bests
 
So in any configuration with a storage subsystem providing more than 300MB/s, a 10GbE link is _mandatory_ for DRBD if you don't want the network to be the bottleneck!?

Now I'm thinking of distributing my DRBD links across several [2|3]xGb bonds? If no single LAG will be able to match the arrays' throughput, at least the overall network throughput should be maxed out... hmm, but the benefits really depend on my use case... sorry, thinking out loud!
Bests

Yes, I use two DRBD nodes in active/active mode with 2x 10GBit in bonding mode 1 (active-backup), and the 1GBit interfaces for management and bridges (also bonding mode 1);

If you wonder why not bonding mode 0 (balance-rr) - because I want to ensure the highest availability at minimum risk;
In bonding mode 0 you would want to connect the links to separate switches, and if the uplink of one of those switches stops working you have a server outage;

Explanation:
Bonding mode 0 puts packets round-robin on the available wires; if the uplink of one of your switches fails, the link of the NIC connected to that switch stays up and the bonding driver keeps putting packets on that wire;
The result is 50% packet loss with two NICs, 33% with three NICs, and so on... because there is no control mechanism;
And that amount of packet loss makes all your hosted services unavailable;

And for more DRBD throughput you can offload the metadata (bitmap) reads/writes to a separate device - a small SSD would be a good choice for that;
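
In the DRBD resource definition that boils down to pointing meta-disk at the extra device instead of using internal metadata (a sketch only, assuming DRBD 8.4-style configuration; hostnames, device paths and addresses are hypothetical):
Code:
resource r0 {
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/sdb1;      # the replicated data device
        address   10.0.0.1:7788;
        meta-disk /dev/sdc1;      # bitmap/metadata on a small, fast SSD partition
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk /dev/sdc1;
    }
}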
 