(7.1) Performance Issues

kromberg

Member
Nov 24, 2021
OK, I have been chasing a performance issue transferring data between two Proxmox hosts and I can't seem to figure things out. The general issue is that I am seeing really low transfer rates between the machines: copying a large 10GB file only gets about 80MB/s, regardless of whether I use scp, rsync, NFS, or iSCSI. Here is the basic layout of the two machines:

Host A
  • dual Xeon E5-2680v3
  • 256GB DDR4-2133 ECC RDIMMs
  • dual port 10Gb RJ-45 NIC bonded (rr), added to vmbr1 (ports, bond, and bridge MTU 9000)
  • zpool: 6x Intel DC S3700, striped RAID0, compression=on, ashift=12, blocksize=32k
  • 60GB ram disk mounted
Host B
  • dual Xeon E5-2690v3
  • 512GB DDR4-2133 ECC RDIMMs
  • dual port 10Gb RJ-45 NIC bonded (rr), added to vmbr1 (ports, bond, and bridge MTU 9000)
  • 80GB ram disk mounted
  • zpool: 8x 6Gb/s 15K SAS2 drives, striped RAID0, compression=on, ashift=12, blocksize=32k
The two machines are connected directly with a pair of Cat6 cables, port0 to port0 and port1 to port1.
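
For a back-to-back setup like this, two quick sanity checks are that the bond actually came up in balance-rr with both slaves active and that jumbo frames pass end-to-end without fragmentation. A minimal sketch, assuming the bond is named bond0 (the peer IP is a placeholder):
Code:
# confirm the bond mode and that both slaves are up
cat /proc/net/bonding/bond0

# confirm jumbo frames pass unfragmented: 8972 bytes of payload + 28 bytes of ICMP/IP headers = 9000
ping -M do -s 8972 -c 4 <host-B-IP>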

Using iperf3 going host A to B across the bonded bridge, I am getting:
Code:
[  5]   0.00-10.00  sec  22.9 GBytes  19.7 Gbits/sec  627             sender
[  5]   0.00-10.00  sec  22.9 GBytes  19.7 Gbits/sec                  receiver

Totally what I expect, so that is one check off the list. Basically the same results going either way between them, so the bridge at the host level is working.
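
For reference, a plain single-stream run along these lines reproduces this kind of test (the peer IP is a placeholder; -P adds parallel streams, which can behave differently across an rr bond):
Code:
# on host B (server side)
iperf3 -s

# on host A (client side); repeat with -P 4 to compare multiple parallel streams
iperf3 -c <host-B-IP> -t 10
iperf3 -c <host-B-IP> -t 10 -P 4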

On host A, doing some basic disk performance testing with 'dd', I get:

Code:
root@odin:/mnt/pve/ram# dd if=/dev/random of=/vm2-zfs-r0/test/test.dat bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 94.6553 s, 222 MB/s

root@odin:/mnt/pve/ram# dd if=/dev/zero of=/vm2-zfs-r0/test/test1.dat bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 5.78259 s, 3.6 GB/s

root@odin:/mnt/pve/ram# dd if=/dev/random of=/mnt/pve/ram/test2.dat bs=1M count=20000  
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 102.238 s, 205 MB/s

root@odin:/mnt/pve/ram# dd if=/dev/zero of=/mnt/pve/ram/test3.dat bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 15.1278 s, 1.4 GB/s

root@odin:/mnt/pve/ram# dd if=test2.dat of=/vm2-zfs-r0/test/test4.dat bs=1M
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 18.981 s, 1.1 GB/s

root@odin:/mnt/pve/ram# dd if=test3.dat of=/vm2-zfs-r0/test/test5.dat bs=1M
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 13.5282 s, 1.6 GB/s

root@odin:/mnt/pve/ram# time cp test2.dat /vm2-zfs-r0/test/test6.dat

real    0m19.271s
user    0m0.156s
sys     0m19.099s
root@odin:/mnt/pve/ram# time cp test3.dat /vm2-zfs-r0/test/test7.dat  

real    0m14.861s
user    0m0.112s
sys     0m14.739s

root@odin:/mnt/pve/ram# rsync -avp test2.dat /vm2-zfs-r0/test
sending incremental file list
test2.dat

sent 20,976,640,100 bytes  received 35 bytes  856,189,393.27 bytes/sec
total size is 20,971,520,000  speedup is 1.00
root@odin:/mnt/pve/ram# rsync -avp test3.dat /vm2-zfs-r0/test  
sending incremental file list
test3.dat

sent 20,976,640,099 bytes  received 35 bytes  1,133,872,439.68 bytes/sec
total size is 20,971,520,000  speedup is 1.00

Overall the disk performance is about what I was expecting, though I was a little surprised by the ram disk performance: only about 1.4GB/s from /dev/zero, and similar numbers copying the zero and random files over to the zpool. Straight cp and rsync produced slower transfers, but still within the expected range.
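
One caveat on dd numbers like these: without a sync flag, dd mostly times writes into the page cache rather than to the pool. Re-running with conv=fdatasync, so the final flush is included in the timing, would make the host numbers more comparable; a sketch using the same path as above:
Code:
# same test, but the final flush to the pool is included in the timing
dd if=/dev/random of=/vm2-zfs-r0/test/test.dat bs=1M count=20000 conv=fdatasync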

I did the same type of disk benchmarks on host B and got the expected results: dd moving data around at 1+GB/s and cp/rsync at around 800MB/s.

So now I created a VM on each host with basically the same configuration (roughly as sketched in the config below):
  • 8 cores, host, numa=1
  • 16GB ram
  • VirtIO SCSI controller
  • 100G disk, scsi
  • VirtIO NIC on vmbr1, firewall=off (MTU 9000 set in guest OS)
  • Fedora 33 x86 guest OS
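
For reference, the rough equivalent in Proxmox config terms would be something like the following (VM ID, MAC, and storage name are placeholders, not the actual values):
Code:
cores: 8
cpu: host
numa: 1
memory: 16384
scsihw: virtio-scsi-pci
scsi0: <storage>:vm-<vmid>-disk-0,size=100G
net0: virtio=<MAC>,bridge=vmbr1,firewall=0
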
Now on Host A using iperf3 going from the VM to Host A, I am getting:
Code:
[  5]   0.00-10.00  sec  19.8 GBytes  17.0 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  19.8 GBytes  17.0 Gbits/sec                  receiver

Pretty much expected results as the two ends are using the same bridge on the same host.

Now going from the VM on Host A to Host B, I am getting this in iperf3:
Code:
[  5]   0.00-10.00  sec  14.1 GBytes  12.1 Gbits/sec   30             sender
[  5]   0.00-10.00  sec  14.1 GBytes  12.1 Gbits/sec                  receiver

That is about 30% slower than expected at 12.1Gb/s. The traffic for this test makes two hops: VM to Host A, then Host A to Host B. Surely the processing needed to forward from the first hop to the second can't take that big of a hit. Question 1: what would be causing the drop in performance/throughput here?
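
One variable worth ruling out here: balance-rr is known to reorder packets, which shows up as TCP retransmits on the sender (the count before "sender" in the iperf3 summary above). A rough way to watch that during a run, sampled before and after:
Code:
# sender side; a steadily climbing count during the test points at reordering
netstat -s | grep -i retrans
# or, with iproute2 only
nstat -a | grep -i retrans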

Now going from the VM on Host A to the VM on Host B using iperf3, I am getting the following:
Code:
[  5]   0.00-10.00  sec  10.6 GBytes  9.11 Gbits/sec   25             sender
[  5]   0.00-10.00  sec  10.6 GBytes  9.11 Gbits/sec                  receiver

That is about 50% slower than expected at 9.11Gb/s. The traffic for this test makes three hops: VM to Host A, Host A to Host B, then Host B to VM. Again, I can't see the forwarding taking that big of a hit, as it is not that complex. Question 2: what would be causing the drop in performance/throughput here?
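
Another knob that may be worth testing on the VM-to-VM path (a suggestion, not something verified here): by default a VirtIO NIC has a single queue serviced by one vCPU, and Proxmox exposes a queues option on the net device. The VM ID and queue count below are placeholders:
Code:
# re-specify net0 with a queues= count (keep the existing MAC, otherwise a new one is generated)
qm set <vmid> --net0 virtio=<existing-MAC>,bridge=vmbr1,firewall=0,queues=4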

Now doing some disk benchmarking with dd in a VM whose disk sits on the host zpool, I am getting:

Code:
[root@sauron gondor]# dd if=/dev/zero of=/gondor/test1.dat bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 54.3383 s, 386 MB/s

[root@sauron gondor]# dd if=/dev/random of=/gondor/test2.dat bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 136.695 s, 153 MB/s
The write performance in the VM is nowhere close to what is expected. Raw writes from /dev/zero are about 1/3 of the host speed and writes from /dev/random are down about 30%. I know there is overhead with the VM sitting on top of the zpool and handling all the abstraction of the hardware, but this seems waaaaaaay off. Question 3: what is causing the reduced disk performance from the VM perspective?
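
The disk options on the VM side may also be part of the answer: whether the disk has a dedicated I/O thread and which cache mode it uses can matter on this path. A sketch of checking and changing them (VM ID and storage name are placeholders; iothread only takes full effect with the VirtIO SCSI single controller type):
Code:
# show the current controller and disk line for the VM
qm config <vmid> | grep -E 'scsihw|scsi0'

# example: switch to the VirtIO SCSI single controller and give the disk its own I/O thread
qm set <vmid> --scsihw virtio-scsi-single
qm set <vmid> --scsi0 <storage>:vm-<vmid>-disk-0,iothread=1,cache=none,size=100G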

Now copying/moving/transferring data from one VM to the other VM is the major source of pain, where the transfer speeds are only around 80MB/s.
-- using tar, mbuffer, and ssh: in @ 77.9 MiB/s, out @ 77.9 MiB/s, 4202 MiB total, buffer 100% full apps/UpRev/video/06-Evans-Up
-- rsync: sent 998,511,690 bytes  received 35 bytes  60,515,862.12 bytes/sec


One thing I did notice is that the VM receiving the data has at least 50% of its cores pegged at 100% wa (I/O wait). What is causing the huge amount of I/O wait? This might be the critical issue.
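
A rough way to see where that wait is coming from is to watch the guest block device and the host pool at the same time while a transfer runs (the pool name is a placeholder):
Code:
# inside the receiving VM: per-device utilisation and wait times (iostat is in the sysstat package)
iostat -x 1

# on the receiving host: per-vdev throughput and IOPS for the pool backing the VM disk
zpool iostat -v <pool> 1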
 
Overall the disk performance is about what I was expecting, though I was a little surprised by the ram disk performance: only about 1.4GB/s from /dev/zero, and similar numbers copying the zero and random files over to the zpool. Straight cp and rsync produced slower transfers, but still within the expected range.
Using /dev/zero with ZFS is useless as a benchmark because of the block-level compression. Pure zeros are extremely compressible, so the disks end up writing almost nothing. So for that test, either disable ZFS compression or use /dev/random instead.
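
For example, checking the dataset and switching compression off just for the benchmark could look like this (dataset name taken from the paths above):
Code:
# see what compression is doing on the dataset used for the tests
zfs get compression,compressratio vm2-zfs-r0

# switch it off just for the benchmark, then back on afterwards
zfs set compression=off vm2-zfs-r0
zfs set compression=on vm2-zfs-r0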

I think fio is best for benchmarking disks. You should try that: https://fio.readthedocs.io/en/latest/fio_doc.html
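
A sequential 1M write job along these lines would be closer to the large-file copies being tested, and fio fills its buffers with random data by default, so compression will not flatter the result (path and size are just placeholders based on the paths above):
Code:
fio --name=seq-write --filename=/vm2-zfs-r0/test/fio.dat --rw=write --bs=1M --size=10G \
    --ioengine=psync --numjobs=1 --end_fsync=1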
 
OK, I did use /dev/random and the host results were in the low 200MB/s range. How much of that is CPU random-number generation and how much is disk performance? For several SSDs or 15K SAS drives striped in RAID0, I would expect write performance to be in the 800+ MB/s range.
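
A quick way to separate the two is to time the random source with no disk involved at all; if this alone lands in the same low-200MB/s range, the earlier dd numbers say more about the CPU's generator than the pool:
Code:
# raw throughput of the random source alone, no disk involved
dd if=/dev/random of=/dev/null bs=1M count=20000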
 
On one of the VMs:

Code:
[root@sauron gondor]# fio --filename=/gondor/test.dat --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --time_based --end_fsync=1
fio: time_based requires a runtime/timeout setting
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.26
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=2407: Mon Mar 21 07:11:14 2022
  write: IOPS=8049, BW=31.4MiB/s (33.0MB/s)(4096MiB/130260msec); 0 zone resets
    slat (nsec): min=559, max=6765.7k, avg=4890.24, stdev=12423.89
    clat (nsec): min=381, max=13792k, avg=21687.97, stdev=38906.36
     lat (usec): min=7, max=13802, avg=26.58, stdev=41.57
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[    9], 10.00th=[   11], 20.00th=[   19],
     | 30.00th=[   20], 40.00th=[   20], 50.00th=[   21], 60.00th=[   21],
     | 70.00th=[   21], 80.00th=[   22], 90.00th=[   27], 95.00th=[   31],
     | 99.00th=[   50], 99.50th=[   87], 99.90th=[  408], 99.95th=[  578],
     | 99.99th=[ 1270]
   bw (  KiB/s): min=68264, max=273141, per=100.00%, avg=145751.95, stdev=24356.66, samples=57
   iops        : min=17066, max=68285, avg=36437.96, stdev=6089.17, samples=57
  lat (nsec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=9.60%, 20=39.00%, 50=50.40%
  lat (usec)   : 100=0.54%, 250=0.24%, 500=0.15%, 750=0.04%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=4.06%, sys=8.01%, ctx=1119344, majf=0, minf=20
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=31.4MiB/s (33.0MB/s), 31.4MiB/s-31.4MiB/s (33.0MB/s-33.0MB/s), io=4096MiB (4295MB), run=130260-130260msec

Disk stats (read/write):
  sdc: ios=0/473525, merge=0/42, ticks=0/3918218, in_queue=4174008, util=16.23%

Again, most of the cores were sitting in I/O wait. Watching the pool on the host pretty much gave these results for the whole test:

Code:
                                        capacity     operations     bandwidth
pool                                  alloc   free   read  write   read  write
------------------------------------  -----  -----  -----  -----  -----  -----
vm1-zfs-r0                             183G  3.45T      0    234      0  29.3M
  ata-CT1000BX500SSD1_1951E230E036    44.9G   883G      0     40      0  5.11M
  ata-CT1000BX500SSD1_1951E230E00E    44.4G   884G      0     45      0  5.74M
  ata-CT1000BX500SSD1_1951E230E085    49.8G   878G      0     73      0  9.23M
  ata-CT1000BX500SSD1_1951E230E495    43.8G   884G      0     73      0  9.23M
------------------------------------  -----  -----  -----  -----  -----  -----


This would give one of my old 386 DX/2s a run for their money.
 
