VM 10GBit NIC working slow like 1GBit

argonius

Active Member
Jan 17, 2012
46
0
26
Hi,

we have a cluster of 4 Dell R730 systems with dual-port X540-AT2 10GBit cards. The servers are equipped with Dell 1.92TB SSDs in RAID 5.
Between them we have Cisco 3172TQ 10GBit copper switches with jumbo frame support.

We initially configured mode 4 bonding and have now switched to mode 0 bonding on the pve machines. Compared with the mode 4 bonding we started with,
we see an improvement. We also set an MTU of 9000 on the physical interfaces and on the bond, and the VM bridge has the matching MTU setting.
Besides that, we applied the values from https://darksideclouds.wordpress.com/2016/10/10/tuning-10gb-nics-highway-to-hell/ which we found referenced here in the forum.
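
One way to confirm that a 9000-byte MTU really works end to end is to ping with the Don't Fragment bit set and the largest payload that fits into one frame. The following is only a sketch, assuming Linux iputils ping; the helper names are made up for illustration, and the functions just print the command so it can be reviewed before it is run:

```shell
#!/bin/sh
# Largest ICMP payload that fits into one unfragmented IPv4 packet:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not run) a ping command that must not be fragmented.
# -M do sets the Don't Fragment bit, -s sets the payload size.
# $1: peer address, $2: MTU to test (defaults to 9000)
jumbo_ping_cmd() {
    peer=$1
    mtu=${2:-9000}
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$mtu") $peer"
}

# Example: jumbo_ping_cmd 172.19.35.225
```

If the printed command fails or complains that the message is too long, some hop between the two endpoints (VM NIC, bridge, bond, physical port or switch) is probably using a smaller MTU.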

So right now we get quite adequate performance between the pve servers.
But inside a VM (Ubuntu 20.04, 2 cores, 4GB RAM, all devices on virtio), the speed is horrible.

Here is the config of one of the VMs:
Code:
agent: 1
boot: dcn
bootdisk: scsi0
cores: 2
description: host%3A gac32-pve01
ide2: none,media=cdrom
memory: 4096
name: gac33-tst01
net0: virtio=7E:36:4F:88:23:7F,bridge=vmbr1,tag=1233
net1: virtio=A6:B3:08:0F:53:2E,bridge=vmbr0,tag=1232
net4: virtio=16:B6:0F:A5:AD:02,bridge=vmbr0,tag=1235
numa: 0
onboot: 1
ostype: l26
scsi0: DATA0:vm-167-disk-0,cache=writeback,size=30G
scsihw: virtio-scsi-pci
smbios1: uuid=ce220f7d-f602-434d-899a-9dcdd26ad13b
sockets: 1
vmgenid: c5357b76-da86-4a4a-b803-fa3f8e5cb53a

Running iperf3 between two of these basic VMs, we get:

Code:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 172.19.35.226, port 46614
[  5] local 172.19.35.225 port 5201 connected to 172.19.35.226 port 46616
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   190 MBytes  1.59 Gbits/sec                
[  5]   1.00-2.00   sec   206 MBytes  1.73 Gbits/sec                
[  5]   2.00-3.00   sec   215 MBytes  1.80 Gbits/sec                
[  5]   3.00-4.00   sec   204 MBytes  1.71 Gbits/sec                
[  5]   4.00-5.00   sec   213 MBytes  1.78 Gbits/sec                
[  5]   5.00-6.01   sec  76.2 MBytes   631 Mbits/sec                
[  5]   6.01-7.00   sec   296 MBytes  2.51 Gbits/sec                
[  5]   7.00-8.00   sec   206 MBytes  1.73 Gbits/sec                
[  5]   8.00-9.00   sec   214 MBytes  1.79 Gbits/sec                
[  5]   9.00-10.00  sec   134 MBytes  1.12 Gbits/sec                
[  5]  10.00-10.01  sec   128 KBytes   302 Mbits/sec                
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  1.91 GBytes  1.64 Gbits/sec                  receiver

I've already upgraded to the latest ixgbe driver from Intel, and we set these sysctl settings:

Code:
# Maximum receive socket buffer size
net.core.rmem_max = 134217728
# Maximum send socket buffer size
net.core.wmem_max = 134217728
# Minimum, initial and max TCP Receive buffer size in Bytes
net.ipv4.tcp_rmem = 4096 87380 134217728
# Minimum, initial and max buffer space allocated
net.ipv4.tcp_wmem = 4096 65536 134217728
# Maximum number of packets queued on the input side
net.core.netdev_max_backlog = 300000
# Auto tuning
net.ipv4.tcp_moderate_rcvbuf = 1
# Don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# H-TCP (Hamilton TCP) is a packet-loss-based congestion control algorithm that is more aggressive in pushing up to max bandwidth (total BDP) and favors hosts with lower TTL / VARTTL.
net.ipv4.tcp_congestion_control = htcp
# If you are using jumbo frames set this to avoid MTU black holes.
net.ipv4.tcp_mtu_probing = 1
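
A side note on the congestion control line: the kernel only accepts htcp if the H-TCP module is available, so it can be worth checking that the setting actually took effect. A small sketch; cc_available is a hypothetical helper written for this post, not part of any tool:

```shell
#!/bin/sh
# Succeeds if algorithm $1 appears in the space-separated list $2.
cc_available() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the host or inside the VM (commented out on purpose):
#   avail=$(sysctl -n net.ipv4.tcp_available_congestion_control)
#   cc_available htcp "$avail" || modprobe tcp_htcp
#   sysctl -n net.ipv4.tcp_congestion_control   # shows what is active
```

The list is passed in as an argument so the check can be tried without touching the running system.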


We cannot really understand where this big performance gap comes from, and we would be happy to hear some thoughts from anyone who already has this running.

Thanks,
Patrick
 

argonius

Additional tests show:
When the VMs are on the same host, we get up to 15GBit/s with iperf3.
When the VMs are on different PVE hosts (pve01 and pve02, between which we get 20GBit/s with iperf3), we only get up to 2GBit/s.
 

wigor

Member
Dec 5, 2019
40
5
8
Hey Patrick,

Maybe misconfigured offloading features are hurting the performance. I have no details here, but I often read about TSO / LRO / checksum offloading. Maybe try turning them off in the VM.
IIRC it is ethtool -K or something like that.
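
For reference, a minimal sketch of what that could look like. ethtool -K changes offload settings and ethtool -k (lowercase) lists them; the interface name ens18 is only an assumption, and the helper just prints the command instead of running it:

```shell
#!/bin/sh
# Print the ethtool command that turns off segmentation, receive
# and checksum offloads on interface $1. Nothing is executed here.
offload_off_cmd() {
    iface=$1
    echo "ethtool -K $iface tso off gso off gro off lro off tx off rx off"
}

# Example inside the VM (interface name is an assumption):
#   ethtool -k ens18               # show the current offload state
#   sh -c "$(offload_off_cmd ens18)"
```

Some drivers mark individual features as fixed, in which case ethtool reports that it cannot change them. The change is not persistent across reboots.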
 

argonius

Hi Wigor,

Thanks, I also tried that. No success.

With iperf3 I get:
Bash:
[  5] local 172.19.35.226 port 60382 connected to 172.19.35.225 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   168 MBytes  1.41 Gbits/sec   84    494 KBytes       
[  5]   1.00-2.00   sec   211 MBytes  1.77 Gbits/sec  2102    283 KBytes       
[  5]   2.00-3.00   sec   186 MBytes  1.56 Gbits/sec  379    342 KBytes       
[  5]   3.00-4.00   sec   178 MBytes  1.49 Gbits/sec   15    615 KBytes       
[  5]   4.00-5.00   sec   179 MBytes  1.50 Gbits/sec   12    771 KBytes       
[  5]   5.00-6.00   sec   181 MBytes  1.52 Gbits/sec   18    902 KBytes       
[  5]   6.00-7.00   sec   211 MBytes  1.77 Gbits/sec    0   1.02 MBytes       
[  5]   7.00-8.00   sec   209 MBytes  1.75 Gbits/sec    2   1.16 MBytes       
[  5]   8.00-9.00   sec   220 MBytes  1.85 Gbits/sec    0   1.26 MBytes       
[  5]   9.00-10.00  sec   200 MBytes  1.68 Gbits/sec    3   1017 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.90 GBytes  1.63 Gbits/sec  2615             sender
[  5]   0.00-10.00  sec  1.89 GBytes  1.63 Gbits/sec                  receiver


When running over a directly connected single link pve01 <---> pve02, I get with iperf3:
Bash:
[  5] local 10.1.1.2 port 46922 connected to 10.1.1.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.12 GBytes  9.58 Gbits/sec    9   2.93 MBytes       
[  5]   1.00-2.00   sec  1.12 GBytes  9.64 Gbits/sec   75   2.43 MBytes       
[  5]   2.00-3.00   sec  1.14 GBytes  9.76 Gbits/sec    0   3.62 MBytes       
[  5]   3.00-4.00   sec  1.14 GBytes  9.79 Gbits/sec   55   2.68 MBytes       
[  5]   4.00-5.00   sec  1.14 GBytes  9.80 Gbits/sec   52   3.50 MBytes       
[  5]   5.00-6.00   sec  1.14 GBytes  9.81 Gbits/sec    2   3.75 MBytes       
[  5]   6.00-7.00   sec  1.10 GBytes  9.43 Gbits/sec  178   3.52 MBytes       
[  5]   7.00-8.00   sec  1.13 GBytes  9.72 Gbits/sec  312   3.58 MBytes       
[  5]   8.00-9.00   sec  1.03 GBytes  8.80 Gbits/sec    0   3.61 MBytes       
[  5]   9.00-10.00  sec  1.10 GBytes  9.48 Gbits/sec  601   3.61 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.2 GBytes  9.58 Gbits/sec  1284             sender
[  5]   0.00-10.00  sec  11.2 GBytes  9.58 Gbits/sec                  receiver


The differences are:
- direct connection vs. switch connection
- bonding (slow) vs. single interface (fast)

The weird thing is that the communication between the two pve servers via bonding, VLAN and switch is fast (> 10GBit),
while inside a VM it is only 2GBit.
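
Since the slow path is the one that goes through the bond, it may help to confirm which mode the kernel is actually running. A sketch, assuming the bond is called bond0 (the thread does not name it); bond_mode is a hypothetical helper that reads the status file the bonding driver exposes:

```shell
#!/bin/sh
# Print the text after "Bonding Mode: " from a bonding status file.
# $1: path to the status file, e.g. /proc/net/bonding/bond0
bond_mode() {
    sed -n 's/^Bonding Mode: //p' "$1"
}

# Example on a pve host (bond name is an assumption):
#   bond_mode /proc/net/bonding/bond0
```

Mode 0 shows up there as round-robin and mode 4 as IEEE 802.3ad. Round-robin is known to reorder packets within a single TCP stream, which might fit the high Retr counts in the VM test, but that is only a guess.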
 
