Yet another disk perf issue (Host vs VM)

jmce

Aug 6, 2019
Hello everyone,

We are evaluating Proxmox VE 5.4-3 on a Dell PowerEdge R620 before rolling it out on a multi-host cluster.
Overall the experience is good, except for disk performance.

The R620 is running 3 x 2 TB Hynix SSDs on a PERC H710 (configured as RAID 5: Element Size 64 KB, No Read Ahead, Write Through).
We went with the default Proxmox disk setup, so lvm-thin with raw volumes.

We tested two different VMs, one Linux and one Windows, to measure performance. For this we used fio, as recommended on this forum and in the Proxmox documentation.
Both VMs use a VirtIO SCSI controller. The hard disk is configured with Cache = Write back.
Several other kinds of virtual controllers (and cache settings) were tested with similar or worse results.

Any help in understanding why performance differs so much between the host and the VMs (and is not acceptable?!) would be appreciated.
If any other fio test would be useful, just let us know!

Thanks a lot for your help!

Here is the fio output:

Proxmox Host :
============

fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size=2048M --numjobs=2 --runtime=240 --group_reporting
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.16
Starting 2 processes
randwrite: Laying out IO file(s) (1 file(s) / 2048MB)
randwrite: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 2 (f=2): [w(2)] [75.0% done] [0KB/1129MB/0KB /s] [0/289K/0 iops] [eta 00m:
Jobs: 2 (f=2): [w(2)] [100.0% done] [0KB/1132MB/0KB /s] [0/290K/0 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=2): err= 0: pid=8703: Wed Aug 7 13:16:38 2019
write: io=4096.0MB, bw=1005.5MB/s, iops=257382, runt= 4074msec
slat (usec): min=3, max=170, avg= 5.64, stdev= 3.19
clat (usec): min=0, max=162, avg= 0.95, stdev= 0.91
lat (usec): min=3, max=176, avg= 6.59, stdev= 3.49
clat percentiles (usec):
| 1.00th=[ 0], 5.00th=[ 0], 10.00th=[ 0], 20.00th=[ 1],
| 30.00th=[ 1], 40.00th=[ 1], 50.00th=[ 1], 60.00th=[ 1],
| 70.00th=[ 1], 80.00th=[ 1], 90.00th=[ 1], 95.00th=[ 2],
| 99.00th=[ 3], 99.50th=[ 3], 99.90th=[ 15], 99.95th=[ 15],
| 99.99th=[ 17]
lat (usec) : 2=91.47%, 4=8.23%, 10=0.06%, 20=0.24%, 50=0.01%
lat (usec) : 100=0.01%, 250=0.01%
cpu : usr=21.76%, sys=78.10%, ctx=14, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=4096.0MB, aggrb=1005.5MB/s, minb=1005.5MB/s, maxb=1005.5MB/s, mint=4074msec, maxt=4074msec
Disk stats (read/write):
dm-1: ios=0/79, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=42/52, aggrmerge=0/102, aggrticks=4/28, aggrin_queue=32, aggrutil=0.47%
sda: ios=42/52, merge=0/102, ticks=4/28, in_queue=32, util=0.47%

Linux VM :
========

~$ fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size=2048M --numjobs=2 --runtime=240 --group_reporting
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 2 processes
Jobs: 1 (f=1): [_(1),w(1)] [99.5% done] [0KB/235.5MB/0KB /s] [0/60.3K/0 iops] [eta 00m:01s]
randwrite: (groupid=0, jobs=2): err= 0: pid=18176: Wed Aug 7 13:29:53 2019
write: io=4096.0MB, bw=22252KB/s, iops=5563, runt=188490msec
slat (usec): min=4, max=1726.8K, avg=209.62, stdev=4461.97
clat (usec): min=1, max=11352, avg= 5.72, stdev=38.93
lat (usec): min=7, max=1726.8K, avg=217.97, stdev=4462.47
clat percentiles (usec):
| 1.00th=[ 2], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 4],
| 30.00th=[ 5], 40.00th=[ 5], 50.00th=[ 5], 60.00th=[ 6],
| 70.00th=[ 6], 80.00th=[ 6], 90.00th=[ 6], 95.00th=[ 7],
| 99.00th=[ 9], 99.50th=[ 11], 99.90th=[ 72], 99.95th=[ 225],
| 99.99th=[ 1992]
bw (KB /s): min= 0, max=295832, per=80.11%, avg=17827.12, stdev=24887.10
lat (usec) : 2=0.06%, 4=17.06%, 10=81.96%, 20=0.68%, 50=0.13%
lat (usec) : 100=0.01%, 250=0.05%, 500=0.02%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
cpu : usr=2.29%, sys=18.69%, ctx=132238, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=4096.0MB, aggrb=22252KB/s, minb=22252KB/s, maxb=22252KB/s, mint=188490msec, maxt=188490msec
Disk stats (read/write):
sda: ios=24/812083, merge=0/212747, ticks=484/7583520, in_queue=7566944, util=95.81%
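
As a rough sanity check on the two "Disk stats" sections above, the sketch below compares how many of the writes issued by fio show up as write requests on sda during each run. It only uses numbers copied from the output above. Adding merged requests to completed ones is our own approximation, and the helper name `device_write_share` is made up for illustration. Since the tests ran with --direct=0, a very low share on the host would suggest that the host run mostly measured the page cache rather than the RAID volume.

```python
# Values copied from the fio "issued" and "Disk stats" lines above.
ISSUED_WRITES = 1_048_576  # 4 KiB writes issued by fio in each run

# name: (write I/Os completed on sda, write merges on sda)
runs = {
    "host": (52, 102),
    "linux_vm": (812_083, 212_747),
}


def device_write_share(ios: int, merges: int, issued: int = ISSUED_WRITES) -> float:
    """Approximate fraction of fio's writes that reached sda during the run.

    Assumption: each merge absorbed one submitted request, so completed
    requests plus merges approximates the number of requests submitted.
    Writeback may also coalesce pages before they reach the block layer,
    so treat the result as an order-of-magnitude indicator only.
    """
    return (ios + merges) / issued


for name, (ios, merges) in runs.items():
    share = device_write_share(ios, merges)
    print(f"{name}: {share:.4%} of issued writes seen on sda")
```

On these figures the host share is a small fraction of one percent while the Linux VM share is close to 100%, so the two runs are probably not measuring the same thing.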

Windows VM :
==========

C:\Temp>fio --name=randwrite --ioengine=windowsaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size=2048M --numjobs=2 --runtime=240 --group_reporting
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=windowsaio, iodepth=1
...
fio-3.13
Starting 2 threads
randwrite: Laying out IO file (1 file / 2048MiB)
randwrite: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [_(1),w(1)][100.0%][w=28.5MiB/s][w=7302 IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=2): err= 0: pid=14508: Wed Aug 7 04:23:22 2019
write: IOPS=16.6k, BW=64.7MiB/s (67.8MB/s)(4096MiB/63318msec)
slat (usec): min=9, max=157105, avg=40.22, stdev=183.19
clat (nsec): min=500, max=13027M, avg=60910.45, stdev=14846335.69
lat (usec): min=14, max=13184k, avg=101.14, stdev=15003.00
clat percentiles (nsec):
| 1.00th=[ 1208], 5.00th=[ 15296], 10.00th=[ 16512],
| 20.00th=[ 17024], 30.00th=[ 17280], 40.00th=[ 17792],
| 50.00th=[ 18560], 60.00th=[ 20352], 70.00th=[ 26496],
| 80.00th=[ 39680], 90.00th=[ 49920], 95.00th=[ 74240],
| 99.00th=[ 218112], 99.50th=[ 288768], 99.90th=[ 782336],
| 99.95th=[1044480], 99.99th=[1761280]
bw ( KiB/s): min= 23, max=177349, per=100.00%, avg=103436.66, stdev=24457.09, samples=170
iops : min= 5, max=44337, avg=25858.74, stdev=6114.27, samples=170
lat (nsec) : 750=0.01%, 1000=0.24%
lat (usec) : 2=1.19%, 4=0.58%, 10=0.92%, 20=56.13%, 50=31.01%
lat (usec) : 100=6.17%, 250=3.06%, 500=0.49%, 750=0.09%, 1000=0.05%
lat (msec) : 2=0.05%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=7.17%, sys=29.57%, ctx=0, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=64.7MiB/s (67.8MB/s), 64.7MiB/s-64.7MiB/s (67.8MB/s-67.8MB/s), io=4096MiB (4295MB), run=63318-63318msec
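
To put the three runs side by side, here is a small sketch that recomputes IOPS from the number of issued writes and the runtime reported by fio, and expresses each VM's runtime as a multiple of the host's. All inputs are copied from the summaries above. The function names are ours, not part of fio.

```python
TOTAL_WRITES = 1_048_576  # "issued" writes, identical in all three runs

# Runtime in milliseconds, copied from the fio summaries above.
runtime_ms = {
    "host": 4_074,
    "linux_vm": 188_490,
    "windows_vm": 63_318,
}


def iops(writes: int, ms: int) -> float:
    """Average write operations per second over the whole run."""
    return writes / (ms / 1000)


def slowdown(name: str, baseline: str = "host") -> float:
    """Runtime relative to the baseline; the amount of work is the same
    in every run, so this equals the baseline/run IOPS ratio."""
    return runtime_ms[name] / runtime_ms[baseline]


for name, ms in runtime_ms.items():
    print(f"{name}: {iops(TOTAL_WRITES, ms):,.0f} IOPS, "
          f"{slowdown(name):.1f}x the host runtime")
```

The recomputed IOPS match what fio reports, and the VMs come out roughly 46 times (Linux) and 16 times (Windows) slower than the host for this buffered test.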
 
Update: As a follow-up, since my description does not seem to ring a bell with anyone, we will benchmark another hypervisor to see whether it reaches acceptable disk performance. I am keeping this thread open in case someone has a clue about this disk performance issue, and so that I can update it myself in case someone else runs into the same problem.
 
