50% slower NVMe performance in guest

loe

New Member
Nov 11, 2024
Hi there,

we are seeing a strange performance drop in disk speed.

We have a freshly installed Proxmox 8.2 (upgraded to 8.3).
- HP DL380 Gen10
- 256 GB Total RAM
- 2x Xeon Gold 6144
- 2x 240 GB SSD in RAID 1 for the Proxmox host
- 6x 1.6 TB NVMe, hardware RAID 10 via SupremeRAID (Graid Technology)
- mounted as an ext4 directory (storage definition sketched below)
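
For reference, the directory storage backing the VM disks would be defined roughly like this in /etc/pve/storage.cfg (a sketch; the storage name 'graid' is taken from the VM config below, the mount path is an assumption):
Code:
# /etc/pve/storage.cfg (sketch; /mnt/graid as the mount point is an assumption)
dir: graid
	path /mnt/graid
	content images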

We created a single VM with openSUSE 15.6, and the read rates from the disk drop by approx. 50%.


We ran simple tests with
Code:
fio

Test on the Proxmox host:
Code:
fio --filename=/dev/gdg0n1 --direct=1 --rw=randread --bs=64k --ioengine=libaio --iodepth=64 --runtime=20 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 --readonly
throughput-test-job: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][15.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:17s]
Jobs: 4 (f=4): [r(4)][25.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:15s]
Jobs: 4 (f=4): [r(4)][35.0%][r=10.7GiB/s][r=176k IOPS][eta 00m:13s]
Jobs: 4 (f=4): [r(4)][45.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:11s]
Jobs: 4 (f=4): [r(4)][55.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:09s]
Jobs: 4 (f=4): [r(4)][65.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:07s]
Jobs: 4 (f=4): [r(4)][75.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:05s]
Jobs: 4 (f=4): [r(4)][85.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:03s]
Jobs: 4 (f=4): [r(4)][95.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:01s]
Jobs: 4 (f=4): [r(4)][100.0%][r=10.8GiB/s][r=177k IOPS][eta 00m:00s]
throughput-test-job: (groupid=0, jobs=4): err= 0: pid=139578: Fri Nov 29 17:32:01 2024
  read: IOPS=176k, BW=10.8GiB/s (11.6GB/s)(215GiB/20002msec)
    slat (usec): min=3, max=559, avg= 4.75, stdev= 1.05
    clat (usec): min=392, max=5586, avg=1445.61, stdev=372.36
     lat (usec): min=400, max=5590, avg=1450.36, stdev=372.36
    clat percentiles (usec):
     |  1.00th=[  611],  5.00th=[  750], 10.00th=[  922], 20.00th=[ 1205],
     | 30.00th=[ 1319], 40.00th=[ 1385], 50.00th=[ 1450], 60.00th=[ 1500],
     | 70.00th=[ 1582], 80.00th=[ 1696], 90.00th=[ 1975], 95.00th=[ 2147],
     | 99.00th=[ 2278], 99.50th=[ 2311], 99.90th=[ 2409], 99.95th=[ 2606],
     | 99.99th=[ 4752]
   bw (  MiB/s): min=10927, max=11106, per=100.00%, avg=11030.02, stdev= 9.51, samples=156
   iops        : min=174832, max=177710, avg=176480.15, stdev=152.13, samples=156
  lat (usec)   : 500=0.02%, 750=5.04%, 1000=6.58%
  lat (msec)   : 2=78.96%, 4=9.37%, 10=0.03%
  cpu          : usr=9.96%, sys=25.38%, ctx=2308859, majf=0, minf=28397
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=3528344,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=10.8GiB/s (11.6GB/s), 10.8GiB/s-10.8GiB/s (11.6GB/s-11.6GB/s), io=215GiB (231GB), run=20002-20002msec

Disk stats (read/write):
  gdg0n1: ios=3503912/6, merge=0/1, ticks=5053315/3, in_queue=5053317, util=100.00%
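
For repeatable runs, the same options can be kept in a fio job file (an equivalent sketch of the command line above):
Code:
; throughput-test.fio - same options as the command line above
[global]
direct=1
rw=randread
bs=64k
ioengine=libaio
iodepth=64
runtime=20
numjobs=4
time_based
group_reporting

[throughput-test-job]
filename=/dev/gdg0n1
readonly
Run it with fio throughput-test.fio.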


Same test in the VM:
Code:
fio --filename=/dev/sdb --direct=1 --rw=randread --bs=64k --ioengine=libaio --iodepth=64 --runtime=15 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1
throughput-test-job: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=64
...
fio-3.23
Starting 4 processes
Jobs: 4 (f=4): [r(4)][20.0%][r=6252MiB/s][r=100k IOPS][eta 00m:12s]
Jobs: 4 (f=4): [r(4)][33.3%][r=6319MiB/s][r=101k IOPS][eta 00m:10s]
Jobs: 4 (f=4): [r(4)][46.7%][r=6242MiB/s][r=99.9k IOPS][eta 00m:08s]
Jobs: 4 (f=4): [r(4)][60.0%][r=6242MiB/s][r=99.9k IOPS][eta 00m:06s]
Jobs: 4 (f=4): [r(4)][73.3%][r=6247MiB/s][r=99.9k IOPS][eta 00m:04s]
Jobs: 4 (f=4): [r(4)][86.7%][r=6254MiB/s][r=100k IOPS][eta 00m:02s]
Jobs: 4 (f=4): [r(4)][100.0%][r=6274MiB/s][r=100k IOPS][eta 00m:00s]
throughput-test-job: (groupid=0, jobs=4): err= 0: pid=14021: Fri Nov 29 17:26:10 2024
  read: IOPS=100k, BW=6251MiB/s (6555MB/s)(91.6GiB/15003msec)
    slat (usec): min=2, max=1151, avg= 6.44, stdev=12.10
    clat (usec): min=54, max=71820, avg=2551.56, stdev=1523.41
     lat (usec): min=61, max=71843, avg=2558.11, stdev=1523.43
    clat percentiles (usec):
     |  1.00th=[  494],  5.00th=[  873], 10.00th=[ 1123], 20.00th=[ 1450],
     | 30.00th=[ 1713], 40.00th=[ 1975], 50.00th=[ 2245], 60.00th=[ 2540],
     | 70.00th=[ 2868], 80.00th=[ 3326], 90.00th=[ 4359], 95.00th=[ 5473],
     | 99.00th=[ 7701], 99.50th=[ 8586], 99.90th=[10683], 99.95th=[11994],
     | 99.99th=[24249]
   bw (  MiB/s): min= 5392, max= 6964, per=100.00%, avg=6259.38, stdev=62.92, samples=116
   iops        : min=86278, max=111430, avg=100149.90, stdev=1006.67, samples=116
  lat (usec)   : 100=0.01%, 250=0.10%, 500=0.93%, 750=2.23%, 1000=3.93%
  lat (msec)   : 2=34.05%, 4=46.02%, 10=12.56%, 20=0.15%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=6.82%, sys=20.32%, ctx=766993, majf=0, minf=2124
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=1500518,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=6251MiB/s (6555MB/s), 6251MiB/s-6251MiB/s (6555MB/s-6555MB/s), io=91.6GiB (98.3GB), run=15003-15003msec

Disk stats (read/write):
  sdb: ios=1489747/0, merge=7/0, ticks=3625738/0, in_queue=3625738, util=99.50%


VM config:
Code:
agent: 1
balloon: 0
bios: seabios
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
ide2: local:iso/openSUSE-Leap-15.6-NET-x86_64-Media.iso,media=cdrom,size=261M
machine: q35,viommu=intel
memory: 128000
meta: creation-qemu=9.0.2,ctime=1732894143
name: svb-server17
net0: virtio=BC:24:11:39:43:B7,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
scsi0: graid:105/vm-105-disk-0.qcow2,aio=native,cache=none,iothread=1,size=150G
scsi1: graid:105/vm-105-disk-1.qcow2,aio=io_uring,cache=directsync,iothread=1,size=250G
scsi2: graid:105/vm-105-disk-2.qcow2,aio=native,backup=0,cache=directsync,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=53da87cc-950f-4692-b4c9-2871c
sockets: 2
vmgenid: ee5c1e62-ffb4-40fe-8f6c-494dbbd1e3

We have already tried different aio/cache settings. The read performance varies between 5,500 and 6,200 MiB/s, but no more.
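
For anyone reproducing this: the aio/cache combination can be switched per disk with qm, for example (a sketch reusing the scsi1 volume from the config above; the VM needs a full stop/start for the change to take effect):
Code:
# re-attach the existing volume with a different aio/cache combination
qm set 105 --scsi1 graid:105/vm-105-disk-1.qcow2,aio=native,cache=none,iothread=1,size=250G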

Any ideas? Is there any way to raise the VM disk performance?

A similar thread here on basic NVMe performance: https://forum.proxmox.com/threads/nvme-pcie-5-benchmarking.158056/
 
Tested again - a small step forward.
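For the follow-up tests below, the extra test disks were attached as virtio-blk; hypothetically, the allocation would look something like this (sizes and the cache/aio options are assumptions):
Code:
# allocate a raw volume on the 'graid' storage as virtio-blk (appears as /dev/vda in the guest)
qm set 105 --virtio0 graid:150,format=raw,cache=none,aio=native,iothread=1
# and a qcow2 volume for comparison (/dev/vdc)
qm set 105 --virtio2 graid:150,format=qcow2,cache=none,aio=native,iothread=1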

Test with RAW device - virtio:

Code:
fio --filename=/dev/vda --direct=1 --rw=randread --bs=64k --ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1  --runtime=20
throughput-test-job: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=64
...
fio-3.23
Starting 4 processes
Jobs: 4 (f=4): [r(4)][15.0%][r=6737MiB/s][r=108k IOPS][eta 00m:17s]
Jobs: 4 (f=4): [r(4)][25.0%][r=6783MiB/s][r=109k IOPS][eta 00m:15s]
Jobs: 4 (f=4): [r(4)][35.0%][r=6782MiB/s][r=109k IOPS][eta 00m:13s]
Jobs: 4 (f=4): [r(4)][45.0%][r=7373MiB/s][r=118k IOPS][eta 00m:11s]
Jobs: 4 (f=4): [r(4)][55.0%][r=6668MiB/s][r=107k IOPS][eta 00m:09s]
Jobs: 4 (f=4): [r(4)][65.0%][r=6676MiB/s][r=107k IOPS][eta 00m:07s]
Jobs: 4 (f=4): [r(4)][75.0%][r=6899MiB/s][r=110k IOPS][eta 00m:05s]
Jobs: 4 (f=4): [r(4)][85.0%][r=6676MiB/s][r=107k IOPS][eta 00m:03s]
Jobs: 4 (f=4): [r(4)][95.0%][r=6640MiB/s][r=106k IOPS][eta 00m:01s]
Jobs: 4 (f=4): [r(4)][100.0%][r=6656MiB/s][r=107k IOPS][eta 00m:00s]
throughput-test-job: (groupid=0, jobs=4): err= 0: pid=24859: Fri Nov 29 18:58:29 2024
  read: IOPS=108k, BW=6771MiB/s (7100MB/s)(132GiB/20003msec)
    slat (nsec): min=1943, max=730367, avg=3651.73, stdev=2925.91
    clat (usec): min=530, max=5702, avg=2358.35, stdev=258.67
     lat (usec): min=536, max=6433, avg=2362.10, stdev=258.06
    clat percentiles (usec):
     |  1.00th=[ 1565],  5.00th=[ 1958], 10.00th=[ 2057], 20.00th=[ 2212],
     | 30.00th=[ 2278], 40.00th=[ 2343], 50.00th=[ 2376], 60.00th=[ 2409],
     | 70.00th=[ 2442], 80.00th=[ 2474], 90.00th=[ 2638], 95.00th=[ 2802],
     | 99.00th=[ 3163], 99.50th=[ 3359], 99.90th=[ 3523], 99.95th=[ 3556],
     | 99.99th=[ 3752]
   bw (  MiB/s): min= 6612, max= 7590, per=100.00%, avg=6786.35, stdev=52.32, samples=156
   iops        : min=105800, max=121444, avg=108581.26, stdev=837.10, samples=156
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=6.70%, 4=93.30%, 10=0.01%
  cpu          : usr=6.43%, sys=12.00%, ctx=59238, majf=0, minf=2939
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2167114,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=6771MiB/s (7100MB/s), 6771MiB/s-6771MiB/s (7100MB/s-7100MB/s), io=132GiB (142GB), run=20003-20003msec

Disk stats (read/write):
  vda: ios=2155674/4005, merge=0/0, ticks=4617764/10524, in_queue=4633728, util=99.66%

Also tested with virtio as qcow2 - now I am at > 7,000 MiB/s.

Code:
fio --filename=/dev/vdc --direct=1 --rw=randread --bs=64k --ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 --runtime=20
throughput-test-job: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=64
...
fio-3.23
Starting 4 processes
Jobs: 4 (f=4): [r(4)][15.0%][r=7198MiB/s][r=115k IOPS][eta 00m:17s]
Jobs: 4 (f=4): [r(4)][28.6%][r=7140MiB/s][r=114k IOPS][eta 00m:15s]
Jobs: 4 (f=4): [r(4)][35.0%][r=7095MiB/s][r=114k IOPS][eta 00m:13s]
Jobs: 4 (f=4): [r(4)][47.6%][r=7176MiB/s][r=115k IOPS][eta 00m:11s]
Jobs: 4 (f=4): [r(4)][55.0%][r=8091MiB/s][r=129k IOPS][eta 00m:09s]
Jobs: 4 (f=4): [r(4)][65.0%][r=8013MiB/s][r=128k IOPS][eta 00m:07s]
Jobs: 4 (f=4): [r(4)][75.0%][r=7145MiB/s][r=114k IOPS][eta 00m:05s]
Jobs: 4 (f=4): [r(4)][85.0%][r=7763MiB/s][r=124k IOPS][eta 00m:03s]
Jobs: 4 (f=4): [r(4)][95.0%][r=7788MiB/s][r=125k IOPS][eta 00m:01s]
Jobs: 4 (f=4): [r(4)][100.0%][r=7736MiB/s][r=124k IOPS][eta 00m:00s]
throughput-test-job: (groupid=0, jobs=4): err= 0: pid=24874: Fri Nov 29 19:00:20 2024
  read: IOPS=119k, BW=7447MiB/s (7809MB/s)(145GiB/20002msec)
    slat (nsec): min=1904, max=749387, avg=3122.15, stdev=1882.07
    clat (usec): min=577, max=7027, avg=2144.51, stdev=213.21
     lat (usec): min=594, max=7030, avg=2147.73, stdev=213.01
    clat percentiles (usec):
     |  1.00th=[ 1582],  5.00th=[ 1860], 10.00th=[ 1909], 20.00th=[ 1975],
     | 30.00th=[ 2024], 40.00th=[ 2114], 50.00th=[ 2180], 60.00th=[ 2212],
     | 70.00th=[ 2245], 80.00th=[ 2278], 90.00th=[ 2343], 95.00th=[ 2442],
     | 99.00th=[ 2769], 99.50th=[ 2900], 99.90th=[ 3261], 99.95th=[ 3326],
     | 99.99th=[ 3621]
   bw (  MiB/s): min= 7025, max= 8230, per=100.00%, avg=7466.55, stdev=104.01, samples=156
   iops        : min=112404, max=131686, avg=119464.64, stdev=1664.09, samples=156
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=25.81%, 4=74.18%, 10=0.01%
  cpu          : usr=5.69%, sys=11.27%, ctx=69739, majf=0, minf=2724
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2383402,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=7447MiB/s (7809MB/s), 7447MiB/s-7447MiB/s (7809MB/s-7809MB/s), io=145GiB (156GB), run=20002-20002msec

Disk stats (read/write):
  vdc: ios=2371130/0, merge=0/0, ticks=4670668/0, in_queue=4670669, util=99.66%
 
OK, so should I simply accept my values?
Playing around a bit with the settings, I could push it to 8,500 MiB/s (without a partition in the VM), which is approx. 80% of bare metal - that is OK.
 
Well, ask yourself: how closely does the test match the real workload?
Can your settings harm the applications? E.g. a write cache in RAM that is not transparent to the application can lead to inconsistent data in databases after a software failure on the hypervisor or a power outage.
Why do you use virtualization at all? It may be sufficient to have 20% of bare-metal performance when running 10 VMs...
If you do not know the workload, prefer stability and integrity over performance.
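
To make the cache trade-off concrete, compare two variants of the scsi1 line from the config above (a sketch; the option combinations are illustrative, not a recommendation):
Code:
# cache=none: I/O bypasses the host page cache (O_DIRECT), so host RAM never
# holds dirty data the guest already considers written
scsi1: graid:105/vm-105-disk-1.qcow2,aio=native,cache=none,iothread=1,size=250G

# cache=writeback: writes are acknowledged once they reach the host page cache;
# guest flushes are still honored, but unflushed data is lost if the host
# crashes or loses power
scsi1: graid:105/vm-105-disk-1.qcow2,aio=io_uring,cache=writeback,iothread=1,size=250G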