(7.x) 10Gb NICs at 1Gb Speed

kromberg

Member
Nov 24, 2021
I have two Proxmox servers running the latest non-production updates of 7.x. Both machines have a dual 10Gb RJ-45 NIC installed, with a pair of Cat 7 cables connecting the two machines directly (port 0 on machine A to port 0 on machine B, and port 1 on A to port 1 on B). On each machine the two ports are bonded using balance-rr and the bond is added to a bridge. The ports, bond, and bridge all have an MTU of 9000 for jumbo frames. When moving data between the machines I am only seeing about 80 MB/s to 85 MB/s, and that is at the host-to-host level. Both hosts are going from one ramdisk to another, so data storage isn't the bottleneck. With VMs connected to the bridges, going from one VM to another I am only seeing about 70 MB/s to 80 MB/s transfers. The VMs are on striped SSDs, so storage isn't a bottleneck either. The VMs' network devices also have multiqueue enabled to help with network packet processing. Any ideas on what might be going on?
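
For reference, the relevant part of /etc/network/interfaces on each host looks roughly like this (interface names and the address are placeholders, not my exact config):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves enp1s0f0 enp1s0f1
        bond-mode balance-rr
        bond-miimon 100
        mtu 9000
        # the physical ports also have mtu 9000 set

auto vmbr1
iface vmbr1 inet static
        address 10.0.0.1/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        mtu 9000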
 
Just about every transfer method shows this behavior: sftp, rsync, copies across NFS, iSCSI, and PBS backups/restores. iperf3 shows about 2.26 GBytes/sec going from VM to VM across the bonded bridges, which is expected. In theory going from one ramdisk to another should see about the same results, but I am only seeing about 80 MB/s.
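
The iperf3 test itself is nothing special, roughly like this (the IP is a placeholder):

Code:
# on the receiving VM
iperf3 -s
# on the sending VM, print results in GBytes/sec
iperf3 -c 192.168.1.20 -f G -t 30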
 
This is often CPU bound, yet NFS/iSCSI should not see such low performance.

Both the host and VM have low CPU utilization while data is being transferred. The VM does have multiple vCPUs still in IO wait.

That is also not that fast.

You are right, I was thinking it was gigabits per second, but it is gigabytes. About 1/4 the expected throughput.

Anyone have any ideas on what might be causing this and/or possible settings to check?
 
Both the host and VM have low CPU utilization while data is being transferred. The VM does have multiple vCPUs still in IO wait.
The things I replied to (scp, rsync over ssh) are single threaded, so even if the CPU utilization is 10% across 10 cores, it can still be CPU bound. I've never seen a system that can scp files at 1 GB/s.
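
If you want to see how much of that is the ssh crypto, a quick comparison between the default cipher and an AES-GCM cipher usually tells you (paths and hostname are placeholders):

Code:
# default cipher
scp /mnt/ramdisk/testfile otherhost:/mnt/ramdisk/
# explicit AES-GCM cipher, usually much faster on CPUs with AES-NI
scp -c aes128-gcm@openssh.com /mnt/ramdisk/testfile otherhost:/mnt/ramdisk/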

You are right, I was thinking it was gigabits per second, but it is gigabytes. About 1/4 the expected throughput.
Wait ... wait ... maybe I misread it, is it GB/s or Gbit/s? Normally iperf3 only outputs Gbit/s, therefore my eyes only scanned for numbers. If it were GB/s, then you would have the correct throughput for 2x10 GbE (theoretical limit of 2.5 GB/s under optimal conditions).
 
I was about to bring that up ;)

2.26 GB/s is pretty much the most you can expect from a dual 10Gbit link (and what I have seen in practice).
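
To put numbers on that:

Code:
2 x 10 Gbit/s           = 20 Gbit/s aggregate
20 Gbit/s / 8 bits/byte = 2.5 GB/s theoretical maximum
2.26 GB/s measured      = ~18.1 Gbit/s, i.e. ~90% of line rate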

70/80 MB/s in the VM is not exactly fast.
Have you ensured you are not crippling the host by making your VMs too big? Often less is more in virtualization environments.

What does your VM config say about the CPU? Is it the default (KVM or kvm64)?
Which virtual NIC type are you using?
 
70/80 MB/s in the VM is not exactly fast.
Have you ensured you are not crippling the host by making your VMs too big? Often less is more in virtualization environments.

What does your VM config say about the CPU? Is it the default (KVM or kvm64)?
Which virtual NIC type are you using?

What do you mean by making the VM too big???!?!

The VMs are using the default CPU type of kvm64 and the VirtIO NIC type with multiqueue enabled.
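
The relevant lines of the VM config look roughly like this (MAC address, bridge name, core and queue counts are just examples):

Code:
cores: 6
cpu: kvm64
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr1,queues=6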
 
What do you mean by making the VM too big???!?!
You have a server with 8 physical cores (and 16 threads) and your VM has more than 6 vCPUs, for instance.

The VMs are using the default CPU type of kvm64
That can limit your CPU performance within the VM. Manually select the CPU family that corresponds to your processor(s), or use the "host" setting.
I have read, though, that this is supposedly not ideal, but I can't confirm that.
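
If you want to try it, it can be set with something like this (takes effect after the VM is restarted; <vmid> is the VM's ID):

Code:
# switch the CPU type of a VM to 'host'
qm set <vmid> --cpu host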
 
You have a server with 8 physical cores (and 16 threads) and your VM has more than 6 vCPUs, for instance.

How is that 'big'? You have 6 threads for the VM and the host has 10 threads sitting around for its own use. I could see 15 threads allocated to VMs with only 1 thread left for host usage overtaxing the system.

For me, server A has 48 threads with 36 vCPUs allocated and server B has 64 threads with 52 vCPUs allocated. I certainly hope that Proxmox does not bind the allocated threads/vCPUs to the VM for its use only. I know that with VMware, if a VM is not using a vCPU (it is basically sitting idle or under low load), the unused threads/vCPUs are available for other VMs or the host to use.

That can limit your CPU performance within the VM. Manually select the CPU family that corresponds to your processor(s), or use the "host" setting.
I have read, though, that this is supposedly not ideal, but I can't confirm that.

I could play around with the CPU types for the Linux-based VMs. For a couple of the Windoze-based VMs, changing the CPU type loses the advantage of migrating the VM around to other servers.
 
How is that 'big'? You have 6 threads for the VM and the host has 10 threads sitting around for its own use. I could see 15 threads allocated to VMs with only 1 thread left for host usage overtaxing the system.

For me, server A has 48 threads with 36 vCPUs allocated and server B has 64 threads with 52 vCPUs allocated. I certainly hope that Proxmox does not bind the allocated threads/vCPUs to the VM for its use only. I know that with VMware, if a VM is not using a vCPU (it is basically sitting idle or under low load), the unused threads/vCPUs are available for other VMs or the host to use.
Even if you have 64 threads, you only get the full performance of 32 cores. If all 52 vCPUs were running at 100% utilization at the same time, your CPU would be over its limits and couldn't handle that much, so threads have to queue up and wait. Of course, as long as some of the vCPUs are idling, that isn't a big problem. But as soon as the load average goes above 32, things can get slow.
 
Even if you have 64 threads, you only get the full performance of 32 cores.

That is completely false. Under full load a 64-thread system will outperform a straight 32-core machine, but will not match the performance of a true 64-core machine with no hyperthreading. Many, many years of running F@H on high thread- and core-count machines has proved that. If that were the case, we would all have been buying snake oil since Intel and AMD added threads.
 
That is completely false. Under full load a 64-thread system will outperform a straight 32-core machine, but will not match the performance of a true 64-core machine with no hyperthreading. Many, many years of running F@H on high thread- and core-count machines has proved that. If that were the case, we would all have been buying snake oil since Intel and AMD added threads.
Yep, with HT you get more performance out of the CPU than without, because the cores can switch between tasks when they need to wait for data to be read or written, but you are still far away from the performance of 52 real cores without hyperthreading. So even with 52 vCPUs the CPU is overprovisioned.
 
How is that 'big'? You have 6 threads for the VM and the host has 10 threads sitting around for its own use. I could see 15 threads allocated to VMs with only 1 thread left for host usage overtaxing the system.
Calm down! I am trying to help.
Aside from that, I am talking about an individual VM. Read my post.

I know that with VMware, if a VM is not using a vCPU (it is basically sitting idle or under low load), the unused threads/vCPUs are available for other VMs or the host to use.
You can think whatever you like about all hypervisors of this planet. That does not change the way the hardware works.
Virtualization comes at a cost, every vCPU comes at a cost. And if the VM is too big you will get nothing but rubbish out of it. (And we have not even talked about NUMA boundaries yet!)

Of course you can push 20 people into a 7-seater van, but when trying to accelerate, don't complain. It was built for only 7.

And by the way, if you had asked the same question in a VMware forum, guess what I would have asked? Yeah, you got it. Exactly the same thing. Why? Because people make the same mistakes after 20 years of virtualization, again, again, again.

@Dunuin explained it well. And if you don't believe it, then don't.

changing the CPU type loses the advantage of migrating the VM around to other servers.
Yeah, there is always a tradeoff in life.
 
I just ran a series of tests and these are "my numbers":

Code:
pve host  <=> pve host   =  9.2 Gbit/sec
pve host  <=> pve guest  =  9.0 Gbit/sec
pve guest <=> pve guest  =  8.8 Gbit/sec

Without jumbo frames and without any manual optimization; all Linux VMs, with no changes to the CPU type.
 
OK, I finally got some time to play around with things. I broke apart the bonds and now just have single physical ports going into the bridges. I currently have one bridge with a single 10Gb port, where both the port and the bridge have an MTU of 9000 for jumbo frames. I have a single Linux VM connected to the bridge with a VirtIO NIC type, and the VM also has an MTU of 9000. Going from the VM to the host I am seeing about the same results in iperf3:

Code:
[ 5]   0.00-1.00   sec  2.30 GBytes  2.30 GBytes/sec
[ 5]   1.00-2.00   sec  2.31 GBytes  2.31 GBytes/sec
[ 5]   2.00-3.00   sec  2.32 GBytes  2.32 GBytes/sec
[ 5]   3.00-4.00   sec  2.32 GBytes  2.32 GBytes/sec
[ 5]   4.00-5.00   sec  2.29 GBytes  2.29 GBytes/sec
[ 5]   5.00-6.00   sec  2.39 GBytes  2.39 GBytes/sec
[ 5]   6.00-7.00   sec  2.32 GBytes  2.32 GBytes/sec
[ 5]   7.00-8.00   sec  2.31 GBytes  2.31 GBytes/sec

So in essence, the physical port is not playing any role in the test. I then removed the physical port from the bridge and retested, but got the same results again. So what is causing the bottleneck on the Proxmox side? Especially since the second test involved no hardware at all.

The host is a dual E5-2680 v2, 512GB DDR3-1866 ECC, Intel W2600IP4.
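
Two things that might narrow this down further: whether more parallel streams scale past ~2.3 GB/s, and whether the guest is actually using its multiqueue channels. Roughly (IP and interface name are placeholders):

Code:
# from the VM to the host: 1 stream vs 4 parallel streams
iperf3 -c 10.0.0.1 -P 1
iperf3 -c 10.0.0.1 -P 4

# inside the VM: number of combined channels on the virtio NIC
ethtool -l eth0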
 
