Hi all,
I am having some pretty severe problems with disk performance. Basically, the speeds that PVE reports when running tests are nothing like the speeds I see inside a VM when running the exact same test. I was about to buy a new NVMe drive, thinking the disk itself was the problem, but I can see that going down that path would only be fighting the symptom and not the cause.
For example, these are the fio results I get when running the test directly on the Proxmox host.
Code:
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=514MiB/s,w=170MiB/s][r=132k,w=43.6k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1796551: Sun Dec 17 13:00:50 2023
read: IOPS=129k, BW=506MiB/s (530MB/s)(3070MiB/6069msec)
bw ( KiB/s): min=344256, max=585616, per=99.93%, avg=517602.00, stdev=64283.77, samples=12
iops : min=86064, max=146404, avg=129400.50, stdev=16070.94, samples=12
write: IOPS=43.3k, BW=169MiB/s (177MB/s)(1026MiB/6069msec); 0 zone resets
bw ( KiB/s): min=115072, max=196960, per=99.92%, avg=172975.33, stdev=21894.97, samples=12
iops : min=28768, max=49240, avg=43243.83, stdev=5473.74, samples=12
cpu : usr=15.18%, sys=72.48%, ctx=30790, majf=0, minf=7
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=506MiB/s (530MB/s), 506MiB/s-506MiB/s (530MB/s-530MB/s), io=3070MiB (3219MB), run=6069-6069msec
WRITE: bw=169MiB/s (177MB/s), 169MiB/s-169MiB/s (177MB/s-177MB/s), io=1026MiB (1076MB), run=6069-6069msec
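For anyone who wants to reproduce this: the job is a 4k random read/write mix (roughly 75% reads) at iodepth 64 over a 4 GiB file. A sketch of the command, inferred from the job header and totals above, so the exact flags are an assumption:
Code:
# 4k random read/write, ~75% reads, queue depth 64, 4 GiB file, single job
# (flags inferred from the fio job header above, not an exact transcript)
fio --name=test --ioengine=libaio --rw=randrw --rwmixread=75 --bs=4k \
    --iodepth=64 --size=4g --numjobs=1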
When I SSH into the VM and run the exact same test, the results are really bad.
Code:
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=228MiB/s,w=75.2MiB/s][r=58.5k,w=19.3k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2853: Sun Dec 17 12:52:20 2023
read: IOPS=55.5k, BW=217MiB/s (228MB/s)(3070MiB/14148msec)
bw ( KiB/s): min=142842, max=255696, per=100.00%, avg=222202.07, stdev=31067.74, samples=28
iops : min=35710, max=63924, avg=55550.50, stdev=7766.98, samples=28
write: IOPS=18.6k, BW=72.5MiB/s (76.0MB/s)(1026MiB/14148msec); 0 zone resets
bw ( KiB/s): min=47337, max=85096, per=99.99%, avg=74252.32, stdev=10239.74, samples=28
iops : min=11834, max=21274, avg=18563.07, stdev=2559.96, samples=28
cpu : usr=8.59%, sys=78.74%, ctx=235996, majf=0, minf=7
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=217MiB/s (228MB/s), 217MiB/s-217MiB/s (228MB/s-228MB/s), io=3070MiB (3219MB), run=14148-14148msec
WRITE: bw=72.5MiB/s (76.0MB/s), 72.5MiB/s-72.5MiB/s (76.0MB/s-76.0MB/s), io=1026MiB (1076MB), run=14148-14148msec
Disk stats (read/write):
dm-0: ios=778690/260239, merge=0/0, ticks=91932/9152, in_queue=101084, util=99.40%, aggrios=785920/262691, aggrmerge=0/8, aggrticks=96697/10822, aggrin_queue=107535, aggrutil=99.08%
sda: ios=785920/262691, merge=0/8, ticks=96697/10822, in_queue=107535, util=99.08%
This translates to a:
Read reduction of 56.98%
Write reduction of 57.04%
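(Calculated from the IOPS figures above: reads dropped from 129k to 55.5k, writes from 43.3k to 18.6k.)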
I know virtualisation comes with extra resource overhead, but surely it shouldn't be this bad?
Proxmox information:
Linux 6.2.16-11-bpo11-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-11~bpo11+2 (2023-09-04T14:49Z)
pve-manager/7.4-17/513c62be
VM information:
Ubuntu Server 22.04.3
Linux 6.2.0-39
Storage: LVM-Thin
SCSI Controller: VirtIO SCSI single
Hard Disk: aio=io_uring,discard=on,iothread=1,ssd=1
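For completeness, here is roughly how that disk shows up in the VM config (something like /etc/pve/qemu-server/<vmid>.conf; the storage name, disk number and size below are placeholders, not my actual values):
Code:
# illustrative sketch -- storage name, disk number and size are placeholders
scsihw: virtio-scsi-single
scsi0: local-lvm:vm-100-disk-0,aio=io_uring,discard=on,iothread=1,size=32G,ssd=1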
I've also just installed the QEMU guest agent, but that had no impact on the test results.
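In case it matters, these are the steps I used for the agent (the standard procedure, roughly what I ran rather than an exact transcript; the VM ID is a placeholder):
Code:
# on the Proxmox host: enable the agent option for the VM
qm set <vmid> --agent enabled=1

# inside the Ubuntu guest: install and start the agent
sudo apt install qemu-guest-agent
sudo systemctl enable --now qemu-guest-agent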