10GB network performance issue

totalimpact

I have two Dell R730s (1 CPU, 32 GB RAM, 6x SSD RAID) and three different 10Gb NICs to choose from, going through a Nortel 5530 with 2x 10Gb XFP, 9k jumbo frames, and the switch ports VLAN'd (untagged) off from all other ports.

iperf tests look great:
[screenshot: iperf.png]

dd performance on the box locally is good:
[screenshot: ddperf.png]
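
(The screenshots aren't reproduced here; the kind of baseline tests behind them would look roughly like this - the commands are illustrative, not the exact ones used:)
Code:
# network throughput over the 10Gb VLAN: run "iperf -s" on pve2, then from pve1:
iperf -c 192.168.22.42 -t 10

# local sequential write on the SSD RAID (ZFS pool), ~16 GB test file:
dd if=/dev/zero of=/rpool/ddtest bs=1M count=16384 conv=fdatasync
rm /rpool/ddtest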


But an scp from box to box pegs the CPU and only does ~300MB/s... any ideas? The 10Gb link is used only for pve-zsync, which sees the same slow speed.

proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.16-1-pve: 4.4.16-64

On Server1 I currently have a Mellanox:
Code:
04:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)

iface vmbr4 inet static
        address  192.168.22.41
        netmask  255.255.255.0
        bridge_ports eth4
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes
        pre-up ifconfig eth4 mtu 9000
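
(Side note: since everything here depends on 9k frames surviving the switch path end-to-end, a quick sanity check is a do-not-fragment ping with a jumbo payload - the target address is assumed to be the other node's vmbr4:)
Code:
# 8972 data bytes + 28 bytes of IP/ICMP headers = 9000; -M do forbids fragmentation
ping -M do -s 8972 -c 3 192.168.22.42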

root@pve1:~# ethtool -k eth4
Features for eth4:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]

Server2 has a Dell RN219 Intel card:
Code:
05:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Network Connection (rev 01)

root@pve2:~# ethtool -k eth4
Features for eth4:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]

I am going onsite tomorrow to start swapping cards to see if one is the culprit. I have one other model of Mellanox on the shelf; of the Intel I only have the one. Not sure what the issue might be, maybe some of the offload settings - any tips appreciated.
 
And I see pve-zsync is also encrypting... is there a way to disable encryption for zfs send?

Doh! Of course - the send is piped through ssh. Any thoughts on sending through netcat instead? I'm not big on Perl, so not sure how far I can get with modifying pve-zsync.
 

If you're only seeing 300 MB/s, I'm guessing your CPU doesn't support AES-NI. You can check /proc/cpuinfo to see if your CPU has the aes flag. If it doesn't, then I am willing to bet that arcfour would provide a nice speed increase.
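
(For example, something along these lines:)
Code:
# prints "aes" once if the CPU advertises AES-NI, otherwise falls through to the message
grep -m1 -ow aes /proc/cpuinfo || echo "no AES-NI support"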
 
Ouch... arcfour gave ~150 MB/s on an scp, so that didn't work out.

The CPU is a Xeon E5-2620 v3 (6 cores, 2.4-3.2 GHz). I know it's low end, but it appears to support most major features; it was spec'd that way because the VM workload doesn't require much - just one small CentOS, a 2k8, and an XP - but I need fast sync for poor man's HA.

As far as AES goes:
the Intel spec page shows AES-NI = yes
cpuinfo shows the aes flag
Code:
root@pve1:~# sort -u /proc/crypto | grep module
module       : aesni_intel
module       : aes_x86_64
root@pve1:~# lsmod | grep aesni
aesni_intel           167936  0
aes_x86_64             20480  1 aesni_intel
lrw                    16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 20480  2 aesni_intel,ablk_helper
root@pve1:~# openssl engine
(rsax) RSAX engine support
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support

But I don't see it in the openssl engine output - it is supposed to show AES-NI there (per a post here) - and running an openssl speed test gives me 33 MB/s, so it's off for sure; I believe it should be closer to 100 MB/s. Anyone know how to enable it? The OS clearly sees it.

I can run the speed test and specify EVP (not sure exactly what that switch does), and it is worlds higher:
Code:
openssl speed  aes-128-cbc 
15229320 aes-128 cbc's in 3.00s

openssl speed -evp AES128 
80026941 aes-128-cbc's in 3.00s
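
(Assuming those counts are from the 16-byte block-size column - openssl speed runs several block sizes, so that is an assumption - the arithmetic works out to roughly:)
Code:
15229320 ops x 16 B / 3.00 s  ~  81 MB/s   (legacy aes-128-cbc path)
80026941 ops x 16 B / 3.00 s  ~ 427 MB/s   (-evp path)

The ~5x gap suggests AES-NI is in fact being picked up through the EVP interface, even though it doesn't appear as a separate entry in the openssl engine listing.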
 
Some more testing on this... still not sure whether openssl is using AES-NI, but I found some other details:

Code:
root@pve1:~# apt-get install -y gnutls-bin
root@pve1:~# gnutls-cli --benchmark-ciphers
Checking cipher-MAC combinations, payload size: 16384
     SALSA20-256-SHA1 0.21 GB/sec
     AES-128-CBC-SHA1 0.37 GB/sec
     AES-128-CBC-SHA256 0.23 GB/sec
     AES-128-GCM 2.67 GB/sec <<<<<<<<<<<<<<<<<<<< this looks good??

Checking MAC algorithms, payload size: 16384
            SHA1 0.74 GB/sec
          SHA256 0.34 GB/sec
          SHA512 0.29 GB/sec

Checking ciphers, payload size: 16384
        3DES-CBC 24.14 MB/sec
     AES-128-CBC 0.72 GB/sec
     ARCFOUR-128 0.36 GB/sec
     SALSA20-256 0.43 GB/sec

root@pve1:~# scp -c aes128-gcm@openssh.com /rpool/ROOT/vm-116-disk-1.qcow2 192.168.22.42:/tmp/
vm-116-disk-1.qcow2                                                                                       13% 4361MB 318.9MB/s   01:31 ETA

aes128-gcm shows huge benchmark numbers, but GCM appears to be what it is already using, and it's only so-so.
 
I just tested on two of my hosts with a 7 GB ISO. They have E5-2670 v3's. No issues hitting speeds over 1Gb.


Code:
scp CentOS-6.6-CCS-vA1-x86_64.iso root@10.211.45.5:~
CentOS-6.6-CCS-vA1-x86_64.iso 100% 6695MB 291.1MB/s

On a side note, rsync with the --inplace option might be a better choice for syncing the VM disks. It's been years since I have done something like that, but in past testing it seemed to work better than anything else.
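
(A sketch of what that might look like, borrowing paths from elsewhere in this thread - note that rsync still tunnels over ssh by default, so the same cipher bottleneck applies unless you point it at an rsync daemon:)
Code:
# --inplace updates changed blocks in the existing destination file instead of
# rewriting it from scratch; --partial keeps interrupted transfers resumable
rsync -av --inplace --partial /rpool/ROOT/vm-116-disk-1.qcow2 root@pve2:/mnt/data2/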
 
Thanks for the comment Adam, but I think I missed your point there - your example shows 291 MB/s, and I am getting ~330 MB/s, yet with iperf I can move 11 GB in close to 10 seconds. Your higher-end CPU appears to hit the same encryption bottleneck as mine, or worse.

Anyone know why AES-128-GCM benchmarks at 2.67 GB/sec, but real use is still so much lower?
 

You had it listed as mb/s, not MB/s - are you sure about your numbers? If you really are getting those numbers (MB/s), then that is about all you are going to get with ssh.
 
Shoot - sorry, my fault, edited to MB. 330 MB/s still sounds low to me on a server that has 1 GB/s+ of local filesystem performance. I cp'd 35 GB between two arrays on the same RAID controller in 56 seconds = 625 MB/s - which is the worst case, because reading and writing on the same device defeats the RAID card cache, whereas a zfs send is just a local read - so I would expect at least 500 MB/s.

I still think it has something to do with the AES-NI cipher not being enabled in openssl. I'm working on an HPN-SSH build with the NONE cipher; we'll see if that pans out.
 

330 MB/s is not slow for SSH - IMO it's quite good! AES-NI is being used, or you wouldn't even be remotely close to those numbers.
 
So I have now installed the HPN patch with the NONE cipher option; supposedly that turns off all encryption except for the initial handshake:
Code:
scp -c NONE vm-116-disk-1.qcow2 pve2:/mnt/data2/

That gives me 340-370 MB/s. I don't understand why it's still this slow if no encryption is happening; I guess I will just live with it. I wish there were a way to zfs send without ssh (over a secure VLAN).
 
Have you considered trying with netcat?
 
Yes, I tried netcat too - a raw nc file transfer gives 690 MB/s, but a zfs send through netcat only gets 133 MB/s (might as well be 1Gb LAN). I also tried mbuffer, which comes out at around 230 MB/s. Example:

Code:
zfs send rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57 | nc -q 0 -w 20 pve2 7777

#### @ PVE2:
nc -w 120 -l -p 7777 | zfs receive rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57
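
(An mbuffer variant of the same transfer would look roughly like this - the buffer sizes are illustrative, not necessarily the exact ones used:)
Code:
#### @ PVE2 (start the receiver first):
mbuffer -s 128k -m 1G -I 7777 | zfs receive rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57

#### sender:
zfs send rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57 | mbuffer -s 128k -m 1G -O pve2:7777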

Thanks for the ideas, guys... I guess for future projects I will just stick with 1Gb links; 10Gb has too little ROI in this scenario. Luckily Nortel switches only cost $150 each.
 
Hi guys! Let's resurrect this thread.
We are facing a similar problem. We have a 10Gb link between locations with 24 ms RTT.
Both ends run Proxmox with ZFS + pvesr.
Storage replication is limited to 80 MByte/sec.
After tuning TCP buffers and congestion control we are able to achieve 5 Gb/sec between the locations with iperf and with dd over netcat, but scp and dd over ssh show only 80 MByte/sec.
As is known, OpenSSH uses a hardcoded ~2 MB flow-control window, and pvesr relies on ssh. This is quite a ridiculous situation in 2022, because storage replication can last an eternity when a VM is ~5 TB...

After some googling we found HPN-SSH, which was developed to overcome this limitation, but unfortunately there is no clear and safe way to apply the patch on Debian with PVE.

Maybe the wiser way would be an option to choose the underlying protocol for pvesr (or to disable encryption altogether).
We would appreciate any advice or solutions.
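
(For reference, the kind of buffer/congestion tuning involved for a 10Gb path with 24 ms RTT looks something like the following - the values are illustrative. The bandwidth-delay product here is about 10 Gbit/s x 0.024 s, roughly 30 MB, so TCP windows need headroom well above that:)
Code:
# /etc/sysctl.d/90-wan-tuning.conf
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# BBR tends to hold up better than cubic on long fat pipes with occasional loss
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq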

Code:
 iperf3 -c server1
Connecting to host server1, port 5201
[  5] local server2 port 50740 connected to server1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   479 MBytes  4.02 Gbits/sec    0   31.4 MBytes
[  5]   1.00-2.00   sec   634 MBytes  5.31 Gbits/sec    0   31.0 MBytes
[  5]   2.00-3.00   sec   632 MBytes  5.31 Gbits/sec    0   31.0 MBytes
[  5]   3.00-4.00   sec   632 MBytes  5.31 Gbits/sec    0   31.0 MBytes
[  5]   4.00-5.00   sec   632 MBytes  5.31 Gbits/sec    0   31.2 MBytes
[  5]   5.00-6.00   sec   632 MBytes  5.31 Gbits/sec    0   31.3 MBytes
[  5]   6.00-7.00   sec   632 MBytes  5.31 Gbits/sec    0   31.1 MBytes
[  5]   7.00-8.00   sec   632 MBytes  5.31 Gbits/sec    0   31.5 MBytes
[  5]   8.00-9.00   sec   632 MBytes  5.30 Gbits/sec    0   30.9 MBytes
[  5]   9.00-10.00  sec   632 MBytes  5.31 Gbits/sec    0   31.1 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.03 GBytes  5.18 Gbits/sec    0             sender
[  5]   0.00-10.02  sec  6.02 GBytes  5.16 Gbits/sec                  receiver

Code:
dd if=/dev/zero bs=$((2**20)) count=$((2**15)) | nc -q 0 server1 12345
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 52.2658 s, 657 MB/s

Code:
dd if=/dev/zero bs=$((2**20)) count=$((2**10)) | ssh -c aes128-gcm@openssh.com server1 dd of=/dev/null
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 13.6188 s, 78.8 MB/s
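
(That last number is consistent with an SSH-level window limit rather than a TCP one: with roughly a 2 MB flow-control window and 24 ms RTT, the ceiling is about:)
Code:
2 MB / 0.024 s ~ 83 MB/s    # essentially the ~79 MB/s that dd reports over ssh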
 
Your numbers look very odd - what CPU are you using? I cannot give any WAN performance figures, but my LAN performance is hugely faster than your numbers.

A fairly new Intel Xeon Gold 6256 at 3.6 GHz, via OpenSSH 8.0 on Oracle Linux 8, 10 GbE:

Code:
root@oracle-n3 /opt/acfs/backup/fra/DB01S1/backupset/2022_01_09 > dd if=o1_mf_nnnd1_BL1_20220109_000001_jxn65yc7_.bkp bs=1M | ssh oracle-n4 dd of=/dev/null bs=1M
1074+1 records in
1074+1 records out
1126334464 bytes (1,1 GB, 1,0 GiB) copied, 1,37079 s, 822 MB/s
0+36162 records in
0+36162 records out
1126334464 bytes (1.1 GB, 1.0 GiB) copied, 1.23353 s, 913 MB/s

and with an older Intel Xeon E5-2667 at 3.2 GHz on PVE 7.1:

Code:
root@proxmox6 ~ > dd if=/dev/san-dx200/vm-9000-disk-0 bs=1M count=8192 | ssh proxmox7 dd bs=1M of=/dev/null
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 32.1139 s, 267 MB/s
0+524270 records in
0+524270 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 31.8185 s, 270 MB/s
 
Thanks for the reply. The CPUs are Xeon Gold 6130 and 5218.
I think latency is the key - WAN performance depends entirely on it.

So as of now, Proxmox does not allow a geographically distributed cluster to be deployed with PVE storage replication utilising the full bandwidth of high-speed links. This is very frustrating.
 
Those CPUs are very, very bad for any crypto- or compression-related single-threaded workloads, so part of the performance gap comes from them, but that should not be so huge - only about a factor of 2.

Have you tried using the ProxyCommand (e.g. this hack) for the specific pvesr target? I used it years ago to solve another problem and just remembered it.
 
