10 GB Ethernet Issue / Problem / Question

dthompson

Well-Known Member
Nov 23, 2011
Canada
www.digitaltransitions.ca
Hi all,

I have two server nodes and a NAS (QNAP TS-879). Each server and the NAS has one Intel Ethernet Server Adapter X520-T2 10 Gb Ethernet card (three in total).

The RAID array on the NAS is 5 x 500 GB 7200 RPM drives in a RAID 5 configuration.

I have updated Proxmox to the latest 1.9 version currently available. When I issue an lspci command, I see the following output on the servers:

04:00.0 Ethernet controller: Intel Corporation 82599EB 10 Gigabit TN Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599EB 10 Gigabit TN Network Connection (rev 01)

So the servers definitely see the PCIe 10 Gb network cards, since both ports of the 82599 controller show up.

The PVE version is currently:
pveversion
pve-manager/1.9/6567

uname -a shows the following:
Linux node1 2.6.32-6-pve #1 SMP Mon Jan 23 08:27:52 CET 2012 x86_64 GNU/Linux

I should also note that I have moved my OpenVZ /var/lib/vz over to my iSCSI volume for storage and have moved the built-in storage to /var/lib/vzInt, so all VMs are now running against the iSCSI RAID. This is based on the how-to located here: http://pve.proxmox.com/wiki/OpenVZ_on_ISCSI_howto
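(For reference, this is roughly how I double-check that /var/lib/vz is really backed by the iSCSI volume now; the exact device names on your system will differ:)
Code:
# confirm /var/lib/vz is mounted from the iSCSI device rather than the local disk
df -h /var/lib/vz
mount | grep vz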

So as far as I can tell, everything is up to date. After installing ethtool, I checked both network ports on both cards; the relevant output is included below.

The current configuration is as follows:

1.) The two on-board NICs are connected at 1.0 Gb/s, but soon I will be changing this to an 802.3ad bond to match my existing switch, since it has the capability to do so, thanks to Dimitri Alexandris' suggestion in my post to the list (a rough sketch of that bond config is after this configuration section).

2.) The cross-connection between node1 and node2 is on the eth3 port of each 10 Gb card, directly connected (I have no 10 Gb switch at the moment).

When I issue the following, I get:
ethtool eth3 | grep Speed
Speed: 10000Mb/s

So it seems to me that the server recognizes the card and it is negotiating at 10000 Mb/s, which is 10x faster than the built-in NICs -- GOOD.

3.) The eth2 port on each server's 10 Gb card is directly connected to the NAS: node1 to the NAS's first port and node2 to its second port, with the following IP addresses:

node1: 10.0.0.10
NAS Server port 1: 10.0.0.1

node2: 10.0.1.11
NAS Server port 2: 10.0.1.2

There is no VLAN between the 2 of them, just directly connected.

If I issue the following on both, I get:
ethtool eth2 | grep Speed
Speed: 10000Mb/s


OK great, so both links are also talking to the iSCSI NAS host at 10000 Mb/s. Good. So theoretically, I have a system with a back-end private network that communicates at 10 Gb/s.

4.) The cluster is set up as follows on the servers:
Node1:
auto vmbr0
iface vmbr0 inet static
address 10.0.2.10
netmask 255.255.255.0
network 10.0.2.0
broadcast 10.0.2.255
bridge_ports eth3
bridge_stp off
bridge_fd 0

Node 2:
auto vmbr0
iface vmbr0 inet static
address 10.0.2.11
netmask 255.255.255.0
network 10.0.2.0
broadcast 10.0.2.255
bridge_ports eth3
bridge_stp off
bridge_fd 0

The cluster is built on the private direct connection between the two nodes (vmbr0).
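For reference, the 802.3ad bond mentioned in point 1 would look roughly like this in /etc/network/interfaces (the interface names and address are placeholders based on my setup, and the switch ports have to be configured for LACP):
Code:
auto bond0
iface bond0 inet static
        address 172.16.10.10
        netmask 255.255.255.0
        slaves eth0 eth1
        bond_mode 802.3ad
        bond_miimon 100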

So, the issue:
The problem is that the current version of the ixgbe driver, 3.7.17, performs fairly badly compared to when these servers were running Ubuntu 10.04 LTS Server with driver version 3.6.7, which I compiled by hand since that OS didn't have native support for the card.

Copying data from one node to the other, say a large Windows KVM image (27 GB), only runs at around 74 MB/s when I test it with scp:
Private network copy (10 Gb):

scp -r 112/ root@10.0.2.11:/var/lib/vz/dump/
vm-112-disk-1.raw 100% 27GB 53.2MB/s 08:39

Public network copy (1 Gb):
scp -r 112/ root@172.16.10.11:/var/lib/vz/dump
vm-112-disk-1.raw 100% 27GB 53.1MB/s 08:40

So as you can see above, there is not much difference between the copy speeds over 10 Gb Ethernet and over 1 Gb Ethernet. I don't think that's right, but perhaps it's a limitation of utilities like rsync, ssh, scp, etc.
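(To take ssh out of the picture entirely, one test I may try is a raw netcat transfer; a rough sketch, assuming nc is installed on both nodes and noting that the listen flags vary slightly between netcat variants:)
Code:
# on node2 (receiver), listen on a spare port and write straight to disk
nc -l -p 5000 > /var/lib/vz/dump/vm-112-disk-1.raw

# on node1 (sender), stream the image over the 10 Gb link
nc 10.0.2.11 5000 < /var/lib/vz/dump/112/vm-112-disk-1.raw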

When I run a dd test on the iSCSI-backed volume, I get the following:
/var/lib/vz/dump# dd if=/dev/zero of=output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 3.43304 s, 626 MB/s

So that's pretty good, but when we compare this to the Ubuntu Server that was previously running on this node, here are the results:

/node2Data# dd if=/dev/zero of=output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.84195 s, 1.2 GB/s

However, running this same command on the local internal device, which is a single 500 GB 2.5" SATA 7200 RPM drive, yields about the same result as the dd over 10 Gb above:

/var/lib/vzInt# dd if=/dev/zero of=output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 3.51932 s, 610 MB/s

The Ubuntu server seems to be twice as fast when writing to the iSCSI export it previously used.

Does anyone have any suggestions on how to make communication between the two nodes faster when copying data between them? I would like to keep downtime to a minimum when migrating, backing up, and maintaining my VMs (OpenVZ and KVM) between the nodes, and, as this grows and I introduce a 10 Gb Ethernet switch, between more nodes.

I don't have jumbo frames enabled on the server ports or on the NAS at this time, since I don't think that's the issue; it wasn't needed before when these machines were running Ubuntu 10.04 Server. Is this a driver bug or a parameter issue?

Thanks for any help you can give me. If you need more information, please let me know and I will be more than happy to provide it.

Sorry for the novel :)
 

You're using SCP (with encryption): isn't one of your cores maxed out when you scp over the network?
Can you try FTP?
 
Hi all,

...
So, the issue:
The problem is that the current version of the ixgbe driver, 3.7.17, performs fairly badly compared to when these servers were running Ubuntu 10.04 LTS Server with driver version 3.6.7, which I compiled by hand since that OS didn't have native support for the card.

...

Hi,
to compare the speed, please use iperf (apt-get install iperf).

I also have two nodes connected directly via 10 GbE (but with Solarflare NICs). Performance:
Code:
iperf -c 172.20.x.xx
------------------------------------------------------------
Client connecting to 172.20.x.xx, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 172.20.x.xx port 58209 connected with 172.20.x.xx port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.81 GBytes  5.85 Gbits/sec

If your values are bad, perhaps you can try different things with ethtool, like "ethtool -K eth3 tso off".
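For example, roughly like this (eth3 assumed), re-testing with iperf after each change:
Code:
# show which offloads are currently enabled
ethtool -k eth3

# then toggle the usual suspects one at a time
ethtool -K eth3 tso off
ethtool -K eth3 gro off
ethtool -K eth3 lro off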

Udo
 
You could be right that one core is maxed out. I will try again if I get a chance over the weekend. I would rather not install an FTP server on the box if I can avoid it, but I suppose for a test it's not a big problem. I still don't see scp being the problem, as even the dd over the 10 Gb network is the same speed as over the 1 Gb network, and it's roughly half the speed of the 10.04 Ubuntu install I had.
 
Copying data from one node to the other, say a large Windows KVM image (27 GB), only runs at around 74 MB/s when I test it with scp:
Private network copy (10 Gb):

scp -r 112/ root@10.0.2.11:/var/lib/vz/dump/
vm-112-disk-1.raw 100% 27GB 53.2MB/s 08:39

Public network copy (1 Gb):
scp -r 112/ root@172.16.10.11:/var/lib/vz/dump
vm-112-disk-1.raw 100% 27GB 53.1MB/s 08:40

SSH encryption is the issue here.
Use the arcfour cipher and you will likely get around 200-250 MB/sec, depending on the speed of your CPU.
Some CPUs support AES-NI, which can accelerate AES ciphers, but getting AES-NI working in Proxmox requires using libs from Ubuntu, and I do not like the idea of mixing such critical system libraries. The max I got with my hacked Proxmox/Ubuntu Frankenstein using the aes128-cbc cipher was 250 MB/sec on a Xeon 3680.

You can change the ciphers that Proxmox uses for live migration by manually editing the files in /usr/share/perl5/PVE, changing blowfish in the code to arcfour.
I've mostly been looking into this issue with Proxmox 2.0 so I do not know the exact files you need to edit for 1.9.
A reboot may be necessary after editing the code for the changes to take effect.
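A rough sketch of what I mean (the exact file layout on 1.9 may differ, so treat the path as a starting point):
Code:
# locate where the migration code hard-codes the blowfish cipher
grep -rn "blowfish" /usr/share/perl5/PVE/

# quick manual comparison of the faster cipher between the nodes
scp -c arcfour /var/lib/vz/dump/112/vm-112-disk-1.raw root@10.0.2.11:/var/lib/vz/dump/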
 

OK, so I tried it as follows, and to make sure I hit the 10 Gb side, I used the IP instead of the hostname:

scp -c arcfour -r 112 root@10.0.2.10:/var/lib/vz/dump
vm-112-disk-1.raw 37% 10GB 91.0MB/s

Not bad, but nothing even close to 250 MB/s.

So I tried it on the link-aggregated bond0 gigabit network:
scp -c arcfour -r 112 root@172.16.10.10:/var/lib/vz/dump
vm-112-disk-1.raw 8% 2230MB 105.3MB/s

So the link agg, while still slow, is faster than the 10 Gb network.

I don't think that's it. Maybe I'm doing something wrong here.
 
So once I run iperf -s on the master node and then the following on the client node, I get this:

iperf -c 10.0.2.10
------------------------------------------------------------
Client connecting to 10.0.2.10, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.2.11 port 49189 connected with 10.0.2.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 10.4 GBytes 8.93 Gbits/sec

I am assuming that the 8.93Gbits/sec is correct.

When I run it against the bonded gigabit NICs, I get the following:

iperf -c 172.16.10.10
------------------------------------------------------------
Client connecting to 172.16.10.10, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.10.11 port 33700 connected with 172.16.10.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 944 Mbits/sec


So I am assuming this is correct as well, since it's about 9x slower than when run against the 10 Gb network.

All in all, though, it still seems slow compared to the Ubuntu setup I had before.
 
Your iperf showed 8.9 Gbit, so the 10G link is working.
Using arcfour you are only getting around 100 MB/sec, which is about 800 Mbit.

I suspect that maybe your CPU can not encrypt the data any faster.
If that is the problem then you will have one CPU core at 100% during the SCP operation.

It also might help to use Jumbo frames.

On my 10G network, which is Infiniband using IPoIB, I am using MTU 65520 (Max for Infiniband)
I just tested using arcfour
mtu 65520:
Code:
scp: 178MB/sec 
iperf: [  3]  0.0-10.0 sec  9.14 GBytes  7.85 Gbits/sec

mtu 32000:
Code:
scp: 158.8MB/s
iperf: [  3]  0.0-10.0 sec  9.12 GBytes  7.83 Gbits/sec

with mtu 1500:
Code:
scp: 77.2MB/s
iperf: [  3]  0.0-10.0 sec  2.63 GBytes  2.26 Gbits/sec

At the fastest setting the destination server is using 100% of one cpu core:
Code:
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
 468554 root      20   0 72660 7616 2572 R  100  0.0   0:06.89 sshd

The network is fast, but SSH cannot encrypt/decrypt the data fast enough, or maybe it is the checksums it cannot keep up with.
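On your direct-connected X520 link, trying jumbo frames would look roughly like this on both ends (eth3 and an MTU of 9000 are assumptions; both nodes and the NAS ports all have to agree):
Code:
# temporary, for testing
ip link set dev eth3 mtu 9000

# to make it persistent, add "mtu 9000" to the eth3/vmbr0 stanzas in /etc/network/interfaces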
 
OK, well, that's good to know. I just wanted to be sure it wasn't something specific to Proxmox not working properly with 10 Gb Ethernet out of the box.

Thanks everyone for all your help!
 
