Significant IOPS Drop in Proxmox 9.1.9 VMs Compared to Other Hypervisors

koovis

I am investigating a performance issue where IOPS in Proxmox 9.1.9 VMs is significantly lower than in other virtualization environments, despite having normal throughput.

1. IOPS Benchmark (Random Read)

I used the following fio command to measure 4K random read performance:

fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=1G --readwrite=randread --runtime=60 --group_reporting

  • Proxmox 9.1.9 Host (Bare-metal): ~200k IOPS
  • Proxmox 9.1.9 VM (Idle state): 100k ~ 110k IOPS
  • RHEL9 KVM VM: 160k ~ 170k IOPS
  • VMware 9 VM: ~200k IOPS
2. Throughput Benchmark (Sequential Write)

Interestingly, when measuring throughput with sequential writes, the results were nearly identical across the bare-metal host and all three virtualized environments:

fio --name=seqwrite --ioengine=libaio --direct=1 --bs=1m --iodepth=16 --size=1G --readwrite=write --runtime=60 --group_reporting

The Problem: While throughput remains consistent across all platforms, only Proxmox shows a nearly 50% drop in IOPS compared to the host, whereas VMware maintains near-native performance and RHEL KVM shows much less overhead.
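To put numbers on that gap, here is a minimal sketch that turns the figures quoted above into a relative drop against the host. Reducing each range to its midpoint (e.g. 100k ~ 110k becomes 105k) is an assumption made only for this illustration, and the names used are hypothetical.

```python
# IOPS figures quoted in the post, in thousands of IOPS.
# Ranges are reduced to their midpoints (an assumption for illustration).
HOST_KIOPS = 200.0
VM_KIOPS = {
    "Proxmox VM": (100.0 + 110.0) / 2,
    "RHEL9 KVM VM": (160.0 + 170.0) / 2,
    "VMware VM": 200.0,
}


def drop_percent(vm_kiops: float, host_kiops: float) -> float:
    """Return the IOPS loss relative to the host, in percent."""
    return (1.0 - vm_kiops / host_kiops) * 100.0


drops = {name: drop_percent(k, HOST_KIOPS) for name, k in VM_KIOPS.items()}

for name, pct in drops.items():
    print(f"{name}: {pct:.1f}% below host")
```

With these midpoints the Proxmox VM lands a little under the "nearly 50%" mentioned above, while the RHEL KVM guest loses well under a quarter.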

Is there a specific reason why IOPS is uniquely throttled or degraded in Proxmox 9? I would appreciate any insights into why this discrepancy exists only for IOPS and not for throughput.
 
You did not say anything about the underlying filesystems?

For ZFS it may be expected as ZFS does a lot more work than classic filesystems under the hood, resulting in multiple IO-operations used for metadata etc...
 
I updated to version 9.1.19 today and re-ran the tests. Here are the latest results:


Hardware Spec
CPU : AMD Ryzen Threadripper PRO 5955WX 16-Cores
RAM : 16GB
SSD : Samsung PCIe4 NVME SSD 1TB

I installed Proxmox using all the default settings.

Linux dr16 7.0.2-6-pve #1 SMP PREEMPT_DYNAMIC PMX 7.0.2-6 (2026-05-20T08:55Z) x86_64 GNU/Linux

Since I went with the default installation options, I assume the system partition is formatted as ext4 and the VM disks are stored on LVM (LVM-Thin). Is this correct?

Bash:
root@dr16:~# df -hPT
Filesystem           Type      Size  Used Avail Use% Mounted on
udev                 devtmpfs  6.9G     0  6.9G   0% /dev
tmpfs                tmpfs     1.6G  2.5M  1.6G   1% /run
/dev/mapper/pve-root ext4       94G   19G   71G  21% /
tmpfs                tmpfs     7.8G   55M  7.7G   1% /dev/shm
efivarfs             efivarfs  128K   62K   62K  50% /sys/firmware/efi/efivars
tmpfs                tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                tmpfs     1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs                tmpfs     7.8G     0  7.8G   0% /tmp
/dev/nvme0n1p2       vfat     1022M  9.1M 1013M   1% /boot/efi
/dev/fuse            fuse      128M   16K  128M   1% /etc/pve
tmpfs                tmpfs     1.0M     0  1.0M   0% /run/credentials/getty@tty1.service
tmpfs                tmpfs     1.6G  4.0K  1.6G   1% /run/user/0

root@dr16:~# lvs
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <816.21g             0.61   0.25
  root          pve -wi-ao----   96.00g
  swap          pve -wi-ao----    8.00g
  vm-100-disk-0 pve Vwi-aotz--  100.00g data        4.95

Proxmox node benchmark Result
  • No VM load (the VM was not running)
Bash:
root@dr16:~# fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G --readwrite=randread --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.39
Starting 1 process
randread: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=649MiB/s][r=166k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=53553: Thu May 21 14:53:52 2026
  read: IOPS=165k, BW=644MiB/s (675MB/s)(10.0GiB/15903msec)
    slat (nsec): min=3056, max=70474, avg=4619.04, stdev=1724.72
    clat (usec): min=53, max=8834, avg=379.47, stdev=41.61
     lat (usec): min=56, max=8841, avg=384.09, stdev=41.63
    clat percentiles (usec):
     |  1.00th=[  367],  5.00th=[  371], 10.00th=[  371], 20.00th=[  375],
     | 30.00th=[  375], 40.00th=[  375], 50.00th=[  379], 60.00th=[  379],
     | 70.00th=[  383], 80.00th=[  383], 90.00th=[  388], 95.00th=[  392],
     | 99.00th=[  404], 99.50th=[  408], 99.90th=[  424], 99.95th=[  457],
     | 99.99th=[  775]
   bw (  KiB/s): min=447424, max=669144, per=100.00%, avg=659588.65, stdev=39399.72, samples=31
   iops        : min=111856, max=167286, avg=164897.16, stdev=9849.93, samples=31
  lat (usec)   : 100=0.01%, 250=0.01%, 500=99.96%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%
  cpu          : usr=21.63%, sys=78.29%, ctx=263, majf=0, minf=75
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=644MiB/s (675MB/s), 644MiB/s-644MiB/s (675MB/s-675MB/s), io=10.0GiB (10.7GB), run=15903-15903msec

Disk stats (read/write):
    dm-1: ios=2586172/125, sectors=20689376/1344, merge=0/0, ticks=102695/12, in_queue=102707, util=98.42%, aggrios=2621448/29, aggsectors=20973568/1344, aggrmerge=0/97, aggrticks=103061/18, aggrin_queue=103081, aggrutil=91.48%
  nvme0n1: ios=2621448/29, sectors=20973568/1344, merge=0/97, ticks=103061/18, in_queue=103081, util=91.48%

IOPS : 160k ~ 170k
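As a sanity check on the host log above, the headline IOPS and bandwidth can be recomputed from the `issued rwts` total and the runtime. This is only a sketch of the arithmetic; the function name is hypothetical, and the inputs are copied from the fio output.

```python
BLOCK_SIZE_BYTES = 4096  # --bs=4k
MIB = 1024 * 1024


def iops_and_bandwidth(total_ios: int, runtime_ms: int) -> tuple[float, float]:
    """Return (IOPS, bandwidth in MiB/s) for a fixed-block-size run."""
    seconds = runtime_ms / 1000.0
    iops = total_ios / seconds
    bandwidth_mib_s = iops * BLOCK_SIZE_BYTES / MIB
    return iops, bandwidth_mib_s


# Values from the host run: "issued rwts: total=2621440" and "run=15903-15903msec".
host_iops, host_bw = iops_and_bandwidth(2_621_440, 15_903)

print(f"host: {host_iops / 1000:.0f}k IOPS, {host_bw:.0f} MiB/s")
```

The result agrees with the `IOPS=165k, BW=644MiB/s` line that fio printed, so the summary figure is consistent with the raw counters.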


### In-VM benchmark
# VM created with default settings
# Q35
# VirtIO SCSI Single
# Local-lvm 100GB
# Rocky 9
# xfs

Bash:
[root@rocky9 ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G --readwrite=randread --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.35
Starting 1 process
randread: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=428MiB/s][r=110k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4841: Thu May 21 14:56:51 2026
  read: IOPS=110k, BW=429MiB/s (450MB/s)(10.0GiB/23848msec)
    slat (usec): min=3, max=128, avg= 7.66, stdev= 3.30
    clat (usec): min=52, max=9264, avg=573.13, stdev=53.74
     lat (usec): min=59, max=9276, avg=580.79, stdev=53.89
    clat percentiles (usec):
     |  1.00th=[  494],  5.00th=[  519], 10.00th=[  537], 20.00th=[  545],
     | 30.00th=[  562], 40.00th=[  570], 50.00th=[  578], 60.00th=[  586],
     | 70.00th=[  586], 80.00th=[  603], 90.00th=[  611], 95.00th=[  619],
     | 99.00th=[  635], 99.50th=[  644], 99.90th=[  824], 99.95th=[  906],
     | 99.99th=[ 1123]
   bw (  KiB/s): min=400936, max=449088, per=100.00%, avg=439879.15, stdev=6226.87, samples=47
   iops        : min=100234, max=112272, avg=109969.79, stdev=1556.72, samples=47
  lat (usec)   : 100=0.01%, 250=0.01%, 500=1.55%, 750=98.22%, 1000=0.20%
  lat (msec)   : 2=0.03%, 10=0.01%
  cpu          : usr=15.71%, sys=84.15%, ctx=136, majf=0, minf=73
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=429MiB/s (450MB/s), 429MiB/s-429MiB/s (450MB/s-450MB/s), io=10.0GiB (10.7GB), run=23848-23848msec

Disk stats (read/write):
    dm-0: ios=2617860/3, merge=0/0, ticks=157067/5, in_queue=157072, util=99.46%, aggrios=2621440/3, aggrmerge=0/0, aggrticks=164755/5, aggrin_queue=164759, aggrutil=80.79%
  sda: ios=2621440/3, merge=0/0, ticks=164755/5, in_queue=164759, util=80.79%

IOPS : 100k~110k
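The two logs can be cross-checked against each other with Little's law: at a fixed queue depth, IOPS is roughly the queue depth divided by the average latency. The sketch below assumes the queue stays full for the whole run (fio reports `>=64=100.0%`), and the function name is hypothetical.

```python
IODEPTH = 64  # --iodepth=64


def predicted_iops(avg_lat_usec: float, iodepth: int = IODEPTH) -> float:
    """Little's law: throughput = requests in flight / average latency."""
    return iodepth / (avg_lat_usec * 1e-6)


# Average total latency ("lat (usec) ... avg=") from the two fio logs above.
host_pred = predicted_iops(384.09)  # Proxmox host
vm_pred = predicted_iops(580.79)    # Rocky 9 VM on local-lvm

print(f"host: {host_pred / 1000:.1f}k IOPS predicted")
print(f"VM:   {vm_pred / 1000:.1f}k IOPS predicted")
print(f"VM / host: {vm_pred / host_pred:.2f}")
```

Both predictions land within about 1% of what fio measured, which suggests the IOPS gap is fully explained by the roughly 200 µs of extra average latency per request inside the VM.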


#### RESULT

Proxmox (Baremetal) : 160k~170k
VM on Proxmox : 100k~110k

RHEL9 (Baremetal, LVM) : 160k~170k
VM on RHEL9 (KVM, disk on host's directory/file QCOW2) : 160k~170k

VM on VMware 9 (ESXi) : 150k~200k (possibly cached?)



All tests were performed on the same machine.
 
I have identified the cause of the performance discrepancy.

1. Hardware Specifications

  • CPU: AMD Ryzen Threadripper PRO 5955WX (16 Cores / 32 Threads)
  • RAM: 16GB
  • Storage: Samsung PCIe Gen4 NVMe SSD 1TB
  • Note: All benchmarks were conducted on the exact same physical hardware to ensure consistency.

2. Test Environment & VM Configurations

  • Guest OS: Rocky Linux 9 (Minimal Installation, fully updated)
  • VM Specs: 4 vCPUs, 8GB RAM, 100GB Disk (XFS Filesystem)
  • FIO Workloads:
      • 4K Random Read: fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G --readwrite=randread --runtime=60 --group_reporting
      • 1M Sequential Write: fio --name=seqwrite --ioengine=libaio --direct=1 --bs=1m --iodepth=16 --size=10G --readwrite=write --runtime=60 --group_reporting

3. Benchmark Results

Table 1: 4K Random Read Performance (IOPS & Latency Focus)

Test Environment  Storage Backend / Format  IOPS   Bandwidth (BW)  CPU Usage (usr / sys)
Proxmox Host      Baremetal (Native)        166k   647 MiB/s       21.30% / 78.61%
Proxmox VM        local (Directory / Raw)   173k   676 MiB/s       19.31% / 75.50%
Proxmox VM        local-lvm (LVM-Thin)      109k   426 MiB/s       14.93% / 84.82%
RHEL 9 Host       Baremetal (LVM / XFS)     173k   678 MiB/s       17.33% / 59.60%
RHEL 9 VM         KVM (Host Dir / QCOW2)    158k   618 MiB/s       14.56% / 84.75%
VMware 9 VM       ESXi (VMFS6 Datastore)    197k   770 MiB/s       17.67% / 82.01%
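For easier comparison, the Table 1 IOPS figures can be normalized against the Proxmox host. This is a small sketch with the values copied from the table; choosing the Proxmox host as the 100% baseline is an assumption made for illustration.

```python
# 4K random read results from Table 1, in thousands of IOPS.
TABLE1_KIOPS = {
    "Proxmox host (bare metal)": 166,
    "Proxmox VM, local (directory/raw)": 173,
    "Proxmox VM, local-lvm (LVM-Thin)": 109,
    "RHEL 9 host (bare metal)": 173,
    "RHEL 9 VM (QCOW2)": 158,
    "VMware 9 VM (VMFS6)": 197,
}

baseline = TABLE1_KIOPS["Proxmox host (bare metal)"]

# Express every result as a percentage of the Proxmox host figure.
relative = {name: kiops / baseline * 100.0 for name, kiops in TABLE1_KIOPS.items()}

for name, pct in relative.items():
    print(f"{name:36s} {pct:6.1f}% of Proxmox host")
```

Seen this way, local-lvm is the only configuration that falls clearly below the baseline; every other row is within a few percent of the host or above it.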

Table 2: 1M Sequential Write Performance (Throughput Focus)

Test Environment  Storage Backend / Format  IOPS   Bandwidth (BW)  CPU Usage (usr / sys)
Proxmox Host      Baremetal (Native)        5,014  5,015 MiB/s     3.28% / 21.85%
Proxmox VM        local (Directory / Raw)   5,009  5,010 MiB/s     2.89% / 7.49%
Proxmox VM        local-lvm (LVM-Thin)      4,958  4,959 MiB/s     3.78% / 7.36%
RHEL 9 Host       Baremetal (LVM / XFS)     5,004  5,005 MiB/s     2.74% / 9.98%
RHEL 9 VM         KVM (Host Dir / QCOW2)    1,551  1,551 MiB/s     5.71% / 2.51%
VMware 9 VM       ESXi (VMFS6 Datastore)    4,749  4,750 MiB/s     3.48% / 9.61%

4. Key Insights & Technical Analysis

① Proxmox Backend Discrepancy: local vs local-lvm

  • The Problem: In 4K Random Reads, Proxmox's file-based local directory backend shows excellent performance (173k IOPS), even slightly outperforming the baremetal host due to QEMU/Host-side page caching alignment. However, the block-based local-lvm (LVM-Thin) drops drastically to 109k IOPS, with average latency increasing from 365 µs to 578 µs.
  • Root Cause: This degradation is caused by the virtualization metadata layer overhead unique to LVM-Thin. Under high queue depths (iodepth=64), managing the thin-pool block mapping allocations induces lookup latency and metadata lock contention.
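  • Sequential Exception: For 1M Sequential Writes, the metadata lookup overhead becomes negligible because the number of I/O operations decreases significantly. As a result, both Proxmox backends reach roughly 4.9 GiB/s (about 5,000 MiB/s), matching the baremetal host. Since a PCIe Gen4 x4 link offers more bandwidth than that, the limit here is more likely the SSD's sequential write speed than the link or the virtualization layer.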
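To illustrate why per-request overhead matters so much less for the sequential test, here is a rough sketch that counts how many requests each workload needs to move the same 10 GiB. The function name is hypothetical; the sizes come from the fio commands above.

```python
GIB = 1024 ** 3


def request_count(total_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size requests needed to cover total_bytes."""
    return total_bytes // block_bytes


small_block_requests = request_count(10 * GIB, 4 * 1024)     # --bs=4k
large_block_requests = request_count(10 * GIB, 1024 * 1024)  # --bs=1m
ratio = small_block_requests // large_block_requests

print(f"4K requests: {small_block_requests:,}")
print(f"1M requests: {large_block_requests:,}")
print(f"The 4K workload issues {ratio}x more requests for the same data")
```

The 4K count matches the `issued rwts: total=2621440` line in the fio logs. Any fixed cost paid per request is therefore paid 256 times more often in the random read test than in the sequential write test.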

② RHEL 9 VM Sequential Write Bottleneck (QCOW2)

  • Observation: The RHEL 9 VM demonstrates solid 4K Random Read performance (158k IOPS) but encounters a severe bottleneck during Sequential Writes, dropping to 1,551 MiB/s (a ~70% performance loss compared to host).
  • Root Cause: This is a classic "Virtualization Tax" imposed by the QCOW2 image format on a local directory. When writing large sequential blocks, the dynamic cluster allocation and metadata updates within the QCOW2 file create an intensive write bottleneck. Switching to a pre-allocated QCOW2 or a raw image file would likely bring throughput back close to the host's ~5,000 MiB/s, although this was not tested here.
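The "~70%" figure can be reproduced from Table 2. The sketch below compares each KVM-based guest with the host it ran on; pairing the Proxmox VMs with the Proxmox host and the RHEL 9 VM with the RHEL 9 host is an assumption about how the comparison was meant.

```python
# (VM bandwidth, host bandwidth) in MiB/s, taken from Table 2.
TABLE2_MIB_S = {
    "Proxmox VM, local": (5010, 5015),
    "Proxmox VM, local-lvm": (4959, 5015),
    "RHEL 9 VM, QCOW2": (1551, 5005),
}


def loss_percent(vm_mib_s: float, host_mib_s: float) -> float:
    """Sequential write throughput lost relative to the host, in percent."""
    return (1.0 - vm_mib_s / host_mib_s) * 100.0


losses = {name: loss_percent(vm, host) for name, (vm, host) in TABLE2_MIB_S.items()}

for name, pct in losses.items():
    print(f"{name:24s} {pct:5.1f}% below its host")
```

The QCOW2 guest comes out at roughly 69% below its host, while both Proxmox backends stay within about 1%.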

③ VMware 9 (ESXi) Aggressive Read Caching

  • Observation: VMware ESXi achieves the highest random read metric at 197k IOPS with the lowest latency (320 µs).
  • Root Cause: This near-native (and sometimes better-than-native) result is probably driven by the VMkernel's aggressive read-ahead caching mechanisms and highly optimized storage stack (VMFS6), which can filter and buffer small 4K random I/O requests at the hypervisor level.

 
I wonder about the discrepancy between block- and file-based storage. I remember several reports that block storage actually performs better, since it avoids the overhead of a filesystem.
 
"I remember several reports that block storage actually performs better, since it avoids the overhead of a filesystem."
Yes, that's my understanding too.

Only when the (additional!) filesystem layer brings its own read cache or write cache (one that lies about having "finished writing your datablock") may it actually look faster.

Disclaimer: with total conviction but w/o proof...