10Gb network performance issue

totalimpact

I have two Dell R730s (1 CPU, 32GB RAM, 6x SSD RAID each) and three different 10Gb NICs to choose from, going through a Nortel 5530 with 2x 10Gb XFP, 9k frames, and the switch ports VLAN'd (untagged) apart from all other ports.

iperf tests look great:
iperf.png

dd performance on the box locally is good:
ddperf.png


But an scp from box to box pegs the CPU and only does ~300MB/s... ideas? The 10G link is used only for pve-zsync, which shows the same slow speed.

proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.16-1-pve: 4.4.16-64

Server1 currently has a Mellanox:
Code:
04:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)

iface vmbr4 inet static
        address  192.168.22.41
        netmask  255.255.255.0
        bridge_ports eth4
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes
        pre-up ifconfig eth4 mtu 9000

root@pve1:~# ethtool -k eth4
Features for eth4:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]

Server2 has a Dell RN219 Intel card:
Code:
05:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Network Connection (rev 01)

root@pve2:~# ethtool -k eth4
Features for eth4:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]

I am going onsite tomorrow to start swapping cards to see if one is the culprit. I have one other model of Mellanox on the shelf; of the Intel I only have one... not sure what the issue might be, maybe some of the offload settings. Any tips appreciated.
 
And I see pve-zsync is also encrypting... is there a way to disable encryption for zfs send?

doh! - of course the send is piped through ssh. Any thoughts on sending through netcat instead?... I'm not big on Perl, so not sure how far I can get with pve-zsync.
 
If you're only seeing 300MB/s, I'm guessing your CPU doesn't support AES-NI. You can check /proc/cpuinfo to see if your CPU has the aes flag. If it doesn't, then I am willing to bet that arcfour would provide a nice speed increase.
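
Something along these lines will tell you (a minimal sketch, just checking for the CPU flag and the kernel module):
Code:
# print the aes CPU flag if the processor advertises AES-NI
grep -m1 -ow aes /proc/cpuinfo
# check whether the AES-NI kernel module is loaded
lsmod | grep aesni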
 
ouch... arcfour gave ~150MB/s on an scp, so that didn't work well.

The CPU is a Xeon E5-2620 v3 (6 cores, 2.4GHz base / 3.2GHz turbo). I know it's low end, but it appears to support most major features; it was spec'd because the VM workload did not require much (only one small CentOS, a 2k8, and an XP), but I need fast sync for poor man's HA.

As far as AES:
the Intel spec page shows AES-NI = yes
cpuinfo shows the aes flag
Code:
root@pve1:~# sort -u /proc/crypto | grep module
module       : aesni_intel
module       : aes_x86_64
root@pve1:~# lsmod | grep aesni
aesni_intel           167936  0
aes_x86_64             20480  1 aesni_intel
lrw                    16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 20480  2 aesni_intel,ablk_helper
root@pve1:~# openssl engine
(rsax) RSAX engine support
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support

But I don't see it showing in the openssl engine list; it's supposed to show AES-NI in the engine output (per a post here). Running an openssl speed test gives me 33MB/s, so it's off for sure; I believe it should be closer to 100MB/s... anyone know how to enable it? Clearly the OS sees it.

I can run the speed test and specify EVP (the -evp switch goes through OpenSSL's EVP API, which is where the AES-NI acceleration actually gets used), and it is worlds higher:
Code:
openssl speed  aes-128-cbc 
15229320 aes-128 cbc's in 3.00s

openssl speed -evp AES128 
80026941 aes-128-cbc's in 3.00s
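
(Assuming both of those lines are for the same block size, that is roughly 80026941 / 15229320 ≈ 5x as many blocks in the same 3 seconds once -evp is used, which is the kind of jump you would expect if AES-NI really is kicking in through EVP.)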
 
Some more testing on this... still not sure whether openssl is using AES-NI, but I found some other details:

Code:
root@pve1:~# apt-get install -y gnutls-bin
root@pve1:~# gnutls-cli --benchmark-ciphers
Checking cipher-MAC combinations, payload size: 16384
     SALSA20-256-SHA1 0.21 GB/sec
     AES-128-CBC-SHA1 0.37 GB/sec
     AES-128-CBC-SHA256 0.23 GB/sec
     AES-128-GCM 2.67 GB/sec <<<<<<<<<<<<<<<<<<<< this looks good??

Checking MAC algorithms, payload size: 16384
            SHA1 0.74 GB/sec
          SHA256 0.34 GB/sec
          SHA512 0.29 GB/sec

Checking ciphers, payload size: 16384
        3DES-CBC 24.14 MB/sec
     AES-128-CBC 0.72 GB/sec
     ARCFOUR-128 0.36 GB/sec
     SALSA20-256 0.43 GB/sec

root@pve1:~# scp -c aes128-gcm@openssh.com /rpool/ROOT/vm-116-disk-1.qcow2 192.168.22.42:/tmp/
vm-116-disk-1.qcow2                                                                                       13% 4361MB 318.9MB/s   01:31 ETA

aes128-gcm shows huge benchmarks, but GCM appears to be what it's already using, and it's only so-so.
 
I just tested on two of my hosts with a 7GB ISO. They have E5-2670 v3s. No issues hitting speeds over 1Gbit/s.


scp CentOS-6.6-CCS-vA1-x86_64.iso root@10.211.45.5:~
CentOS-6.6-CCS-vA1-x86_64.iso 100% 6695MB 291.1MB/s

On a side note, rsync with the --inplace option might be a better choice to sync the VM disks. It's been years since I have done something like that, but in past testing it seemed to work better than anything else.
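
As a rough sketch of what I mean (reusing the disk path from the scp example above; paths and options are illustrative, not a tested recipe):
Code:
# rewrite only the changed blocks inside the existing file on the target,
# instead of recreating the whole image on every sync
rsync -av --inplace /rpool/ROOT/vm-116-disk-1.qcow2 192.168.22.42:/tmp/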
 
Thanks for the comment Adam, but I missed your point there: your example shows 291MB/s and I am getting ~330MB/s, yet with iperf I can move 11GB in close to 10 seconds (roughly 9Gbit/s). Your higher-end CPU appears to hit the same encryption bottleneck as mine, or worse.

Anyone know why I can benchmark AES-128-GCM at 2.67 GB/sec, but in real use it is still much lower?
 
You had it listed as mb/s, not MB/s; are you sure of your numbers? If you really are getting those numbers (MB/s), then that is about all you're going to get out of ssh.
 
shoot - sorry, my fault, edited to MB.... 330MB/s still sounds low to me on a server that has 1GB/s+ of local filesystem performance. I cp'd 35GB between two arrays on the same RAID controller in 56 seconds = 625MB/s, and that is a worst case because reading and writing on the same device kills the RAID card cache, whereas zfs send is just a local read; I would think it would give me at least 500MB/s.

I think it has something to do with that AES-NI cipher not being enabled in openssl??? Working on an HPN-SSH build with the NONE cipher option; will see if that pans out.
 
330MB/s is not slow for SSH; imo it's quite good! AES-NI is being used, or you wouldn't even be remotely close to those numbers.
 
So I already installed the HPN patch with the NONE cipher option; supposedly that turns off all encryption except the initial handshake:
Code:
scp -c NONE vm-116-disk-1.qcow2 pve2:/mnt/data2/

That gives me 340-370MB/s. I don't understand why it's still so slow if no encryption is happening; I guess I will just live with that... wish there was a way to zfs send without ssh (over a secure VLAN).
 
Have you considered trying with netcat?
 
Yes, I tried netcat too, and a raw nc file transfer gives 690MB/s, but a zfs send through netcat only gets 133MB/s (might as well be 1Gb LAN)... I also tried mbuffer, which comes out around 230MB/s. Example:

Code:
zfs send rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57 | nc -q 0 -w 20 pve2 7777

#### @ PVE2:
nc -w 120 -l -p 7777 | zfs receive rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57
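
For reference, a typical mbuffer pipeline for this looks roughly like the following (a sketch; the block size, buffer size, and port are illustrative, not the exact invocation tested above):
Code:
#### @ PVE2 (receiver): listen on a TCP port and buffer before zfs receive
mbuffer -s 128k -m 1G -I 9090 | zfs receive rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57

#### @ PVE1 (sender): stream the snapshot into mbuffer towards pve2
zfs send rpool/data/vm-161-disk-1@rep_test161_2016-09-08_21:38:57 | mbuffer -s 128k -m 1G -O pve2:9090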

Thanks for the ideas guys... I guess for future projects I will just stick to 1Gb links; 10Gb has too little ROI in this scenario... luckily Nortel switches only cost $150 each.
 
Hi guys! Let's resurrect this thread.
We are facing a similar problem: a 10Gb link between locations with 24ms RTT.
Both ends run Proxmox with ZFS + pvesr.
Storage replication is limited to 80MByte/sec.
After tuning TCP buffers and congestion control we are able to achieve 5Gb/sec between the locations with iperf and with dd over netcat,
but scp and dd over ssh show only 80MByte/sec.
As is known, OpenSSH uses a hardcoded 2MB window for its channels, and pvesr relies on ssh. This is quite a ridiculous situation in 2022, because storage replication can last an eternity when the VM size is ~5TB...

After some googling we found HPN-SSH, which was developed to overcome this limitation, but unfortunately there is no clear and safe way to apply the patch on Debian with PVE.

Maybe the wise approach would be an option to choose the underlying transport for pvesr (or to disable encryption altogether).
Any advice or solutions would be appreciated.

Code:
 iperf3 -c server1
Connecting to host server1, port 5201
[  5] local server2 port 50740 connected to server1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   479 MBytes  4.02 Gbits/sec    0   31.4 MBytes
[  5]   1.00-2.00   sec   634 MBytes  5.31 Gbits/sec    0   31.0 MBytes
[  5]   2.00-3.00   sec   632 MBytes  5.31 Gbits/sec    0   31.0 MBytes
[  5]   3.00-4.00   sec   632 MBytes  5.31 Gbits/sec    0   31.0 MBytes
[  5]   4.00-5.00   sec   632 MBytes  5.31 Gbits/sec    0   31.2 MBytes
[  5]   5.00-6.00   sec   632 MBytes  5.31 Gbits/sec    0   31.3 MBytes
[  5]   6.00-7.00   sec   632 MBytes  5.31 Gbits/sec    0   31.1 MBytes
[  5]   7.00-8.00   sec   632 MBytes  5.31 Gbits/sec    0   31.5 MBytes
[  5]   8.00-9.00   sec   632 MBytes  5.30 Gbits/sec    0   30.9 MBytes
[  5]   9.00-10.00  sec   632 MBytes  5.31 Gbits/sec    0   31.1 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.03 GBytes  5.18 Gbits/sec    0             sender
[  5]   0.00-10.02  sec  6.02 GBytes  5.16 Gbits/sec                  receiver

Code:
dd if=/dev/zero bs=$((2**20)) count=$((2**15)) | nc -q 0 server1 12345
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 52.2658 s, 657 MB/s

Code:
dd if=/dev/zero bs=$((2**20)) count=$((2**10)) | ssh -c aes128-gcm@openssh.com server1 dd of=/dev/null
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 13.6188 s, 78.8 MB/s
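
For reference, the TCP buffer / congestion tuning mentioned above generally looks like the following (a sketch with illustrative values for a ~24ms, 10Gb path; not the exact settings used here, and BBR is just one congestion-control option):
Code:
# raise the socket buffer ceilings so a single TCP stream can fill a long, fat pipe
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"
# switch congestion control (BBR shown as an example)
sysctl -w net.ipv4.tcp_congestion_control=bbr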
 
Your numbers look very odd - what CPU are you using? I cannot give any WAN performance figures, but my LAN performance is hugely faster than your numbers.

Fairly new Intel Xeon Gold 6256, 3.6GHz, via OpenSSH 8.0 on Oracle Linux 8, 10GbE:

Code:
root@oracle-n3 /opt/acfs/backup/fra/DB01S1/backupset/2022_01_09 > dd if=o1_mf_nnnd1_BL1_20220109_000001_jxn65yc7_.bkp bs=1M | ssh oracle-n4 dd of=/dev/null bs=1M
1074+1 records in
1074+1 records out
1126334464 bytes (1,1 GB, 1,0 GiB) copied, 1,37079 s, 822 MB/s
0+36162 records in
0+36162 records out
1126334464 bytes (1.1 GB, 1.0 GiB) copied, 1.23353 s, 913 MB/s

and with older Intel Xeon CPU E5-2667 3.2 GHz on PVE 7.1

Code:
root@proxmox6 ~ > dd if=/dev/san-dx200/vm-9000-disk-0 bs=1M count=8192 | ssh proxmox7 dd bs=1M of=/dev/null
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 32.1139 s, 267 MB/s
0+524270 records in
0+524270 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 31.8185 s, 270 MB/s
 
Thanks for the reply. The CPUs are Xeon Gold 6130 and 5218.
I think latency is the key; WAN performance totally depends on it.

So as of now, Proxmox does not allow geographically distributed clusters to use PVE storage replication at the full bandwidth of high-speed links. This is very frustrating.
 
Those CPUs are very, very bad for any crypto- or compression-related single-thread workloads, so part of the performance equation comes from them, but that should not be so huge, only about a factor of 2.

Have you tried using the ProxyCommand (e.g. this hack) for the specific pvesr target? I used it years ago to solve another problem and just remembered it.
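
I don't have the exact recipe from that link handy, but the general mechanism is a per-host entry in root's ssh client config, which pvesr should pick up since it ultimately calls the system ssh as root. A minimal sketch (host name, address, and the helper command are placeholders, not the actual hack):
Code:
# append a host-specific override for the replication target (hypothetical names)
cat >> /root/.ssh/config <<'EOF'
Host pve-remote.example.com
    # plain nc shown only as a placeholder; this is where an alternative
    # transport or tunnel for the bulk data path would be plugged in
    ProxyCommand nc -q 0 %h %p
EOF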
 
