Significant IOPS Drop in Proxmox 9.1.9 VMs Compared to Other Hypervisors

koovis · May 20, 2026

I am investigating a performance issue where IOPS in Proxmox 9.1.9 VMs is significantly lower than in other virtualization environments, despite having normal throughput.

1. IOPS Benchmark (Random Read)

I used the following fio command to measure 4K random read performance:

fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=1G --readwrite=randread --runtime=60 --group_reporting

Proxmox 9.1.9 Host (Bare-metal): ~200k IOPS
Proxmox 9.1.9 VM (Idle state): 100k ~ 110k IOPS
RHEL9 KVM VM: 160k ~ 170k IOPS
VMware 9 VM: ~200k IOPS

2. Throughput Benchmark (Sequential Write)

Interestingly, when measuring throughput with sequential writes, the results were nearly identical across the bare-metal host and all three virtualized environments:

fio --name=seqwrite --ioengine=libaio --direct=1 --bs=1m --iodepth=16 --size=1G --readwrite=write --runtime=60 --group_reporting

The Problem: While throughput remains consistent across all platforms, Proxmox alone shows a nearly 50% drop in IOPS compared to the host, whereas VMware maintains near-native performance and RHEL KVM shows much less overhead.

Is there a specific reason why IOPS is uniquely throttled or degraded in Proxmox 9? I would appreciate any insights into why this discrepancy exists only for IOPS and not for throughput.

UdoB · May 20, 2026

You did not say anything of the underlying filesystems...?

For ZFS it may be expected as ZFS does a lot more work than classic filesystems under the hood, resulting in multiple IO-operations used for metadata etc...

t.lamprecht · May 20, 2026

Yeah, server and storage hardware, software and configuration details, including VM configuration would be nice to have to try reproducing this.

koovis · May 21, 2026

I updated to version 9.1.19 today and re-ran the tests. Here are the latest results:

Hardware Spec
CPU : AMD Ryzen Threadripper PRO 5955WX 16-Cores
RAM : 16GB
SSD : Samsung PCIe4 NVME SSD 1TB

I installed Proxmox using all the default settings.

Linux dr16 7.0.2-6-pve #1 SMP PREEMPT_DYNAMIC PMX 7.0.2-6 (2026-05-20T08:55Z) x86_64 GNU/Linux

Since I went with the default installation options, I assume the system partition is formatted as ext4 and the VM disks are stored on LVM (LVM-Thin). Is this correct?

Bash:

root@dr16:~# df -hPT
Filesystem           Type      Size  Used Avail Use% Mounted on
udev                 devtmpfs  6.9G     0  6.9G   0% /dev
tmpfs                tmpfs     1.6G  2.5M  1.6G   1% /run
/dev/mapper/pve-root ext4       94G   19G   71G  21% /
tmpfs                tmpfs     7.8G   55M  7.7G   1% /dev/shm
efivarfs             efivarfs  128K   62K   62K  50% /sys/firmware/efi/efivars
tmpfs                tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                tmpfs     1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs                tmpfs     7.8G     0  7.8G   0% /tmp
/dev/nvme0n1p2       vfat     1022M  9.1M 1013M   1% /boot/efi
/dev/fuse            fuse      128M   16K  128M   1% /etc/pve
tmpfs                tmpfs     1.0M     0  1.0M   0% /run/credentials/getty@tty1.service
tmpfs                tmpfs     1.6G  4.0K  1.6G   1% /run/user/0

root@dr16:~# lvs
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <816.21g             0.61   0.25
  root          pve -wi-ao----   96.00g
  swap          pve -wi-ao----    8.00g
  vm-100-disk-0 pve Vwi-aotz--  100.00g data        4.95

Proxmox node benchmark result

No VM loaded (the VM was not running)

Bash:

root@dr16:~# fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G --readwrite=randread --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.39
Starting 1 process
randread: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=649MiB/s][r=166k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=53553: Thu May 21 14:53:52 2026
  read: IOPS=165k, BW=644MiB/s (675MB/s)(10.0GiB/15903msec)
    slat (nsec): min=3056, max=70474, avg=4619.04, stdev=1724.72
    clat (usec): min=53, max=8834, avg=379.47, stdev=41.61
     lat (usec): min=56, max=8841, avg=384.09, stdev=41.63
    clat percentiles (usec):
     |  1.00th=[  367],  5.00th=[  371], 10.00th=[  371], 20.00th=[  375],
     | 30.00th=[  375], 40.00th=[  375], 50.00th=[  379], 60.00th=[  379],
     | 70.00th=[  383], 80.00th=[  383], 90.00th=[  388], 95.00th=[  392],
     | 99.00th=[  404], 99.50th=[  408], 99.90th=[  424], 99.95th=[  457],
     | 99.99th=[  775]
   bw (  KiB/s): min=447424, max=669144, per=100.00%, avg=659588.65, stdev=39399.72, samples=31
   iops        : min=111856, max=167286, avg=164897.16, stdev=9849.93, samples=31
  lat (usec)   : 100=0.01%, 250=0.01%, 500=99.96%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%
  cpu          : usr=21.63%, sys=78.29%, ctx=263, majf=0, minf=75
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=644MiB/s (675MB/s), 644MiB/s-644MiB/s (675MB/s-675MB/s), io=10.0GiB (10.7GB), run=15903-15903msec

Disk stats (read/write):
    dm-1: ios=2586172/125, sectors=20689376/1344, merge=0/0, ticks=102695/12, in_queue=102707, util=98.42%, aggrios=2621448/29, aggsectors=20973568/1344, aggrmerge=0/97, aggrticks=103061/18, aggrin_queue=103081, aggrutil=91.48%
  nvme0n1: ios=2621448/29, sectors=20973568/1344, merge=0/97, ticks=103061/18, in_queue=103081, util=91.48%

IOPS : 160k ~ 170k

### in VM benchmark
# VM created with default settings
# Q35
# VirtIO SCSI Single
# Local-lvm 100GB
# Rocky 9
# xfs

Bash:

[root@rocky9 ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G --readwrite=randread --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.35
Starting 1 process
randread: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=428MiB/s][r=110k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4841: Thu May 21 14:56:51 2026
  read: IOPS=110k, BW=429MiB/s (450MB/s)(10.0GiB/23848msec)
    slat (usec): min=3, max=128, avg= 7.66, stdev= 3.30
    clat (usec): min=52, max=9264, avg=573.13, stdev=53.74
     lat (usec): min=59, max=9276, avg=580.79, stdev=53.89
    clat percentiles (usec):
     |  1.00th=[  494],  5.00th=[  519], 10.00th=[  537], 20.00th=[  545],
     | 30.00th=[  562], 40.00th=[  570], 50.00th=[  578], 60.00th=[  586],
     | 70.00th=[  586], 80.00th=[  603], 90.00th=[  611], 95.00th=[  619],
     | 99.00th=[  635], 99.50th=[  644], 99.90th=[  824], 99.95th=[  906],
     | 99.99th=[ 1123]
   bw (  KiB/s): min=400936, max=449088, per=100.00%, avg=439879.15, stdev=6226.87, samples=47
   iops        : min=100234, max=112272, avg=109969.79, stdev=1556.72, samples=47
  lat (usec)   : 100=0.01%, 250=0.01%, 500=1.55%, 750=98.22%, 1000=0.20%
  lat (msec)   : 2=0.03%, 10=0.01%
  cpu          : usr=15.71%, sys=84.15%, ctx=136, majf=0, minf=73
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=429MiB/s (450MB/s), 429MiB/s-429MiB/s (450MB/s-450MB/s), io=10.0GiB (10.7GB), run=23848-23848msec

Disk stats (read/write):
    dm-0: ios=2617860/3, merge=0/0, ticks=157067/5, in_queue=157072, util=99.46%, aggrios=2621440/3, aggrmerge=0/0, aggrticks=164755/5, aggrin_queue=164759, aggrutil=80.79%
  sda: ios=2621440/3, merge=0/0, ticks=164755/5, in_queue=164759, util=80.79%

IOPS : 100k~110k

#### RESULT

Proxmox (Baremetal) : 160k~170k
VM on Proxmox : 100k~110k

RHEL9 (Baremetal, LVM) : 160k~170k
VM on RHEL9 (KVM, disk on host's directory/file QCOW2) : 160k~170k

VM on VMware 9 (ESXi) : 150k~200k (possibly cached?)

All tests were performed on the same machine.
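A quick sanity check on the two fio runs above: at a fixed queue depth, IOPS should be close to the queue depth divided by the mean per-I/O latency (Little's law). The sketch below plugs in the `lat` averages from the two outputs (384.09 µs on the host, 580.79 µs in the VM). The helper name `iops_from_latency` is made up for illustration and is not part of fio.

```shell
#!/bin/sh
# Estimate IOPS from queue depth and mean latency (Little's law):
#   IOPS ~= iodepth / latency_in_seconds
# iops_from_latency is a hypothetical helper name, not an fio feature.
iops_from_latency() {
    # $1 = iodepth, $2 = mean total latency in microseconds
    awk -v qd="$1" -v lat_us="$2" 'BEGIN { printf "%.0f\n", qd / (lat_us / 1e6) }'
}

host=$(iops_from_latency 64 384.09)   # "lat" avg from the host run
vm=$(iops_from_latency 64 580.79)     # "lat" avg from the LVM-Thin VM run

echo "host estimate: $host IOPS"
echo "vm estimate:   $vm IOPS"
```

The estimates land close to the 165k and 110k that fio reported, which suggests the IOPS gap is entirely a per-I/O latency gap of roughly 200 µs rather than a throttle or a queue-depth limit.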

koovis · May 21, 2026

I have identified the cause of the performance discrepancy.

1. Hardware Specifications

CPU: AMD Ryzen Threadripper PRO 5955WX (16 Cores / 32 Threads)

RAM: 16GB

Storage: Samsung PCIe Gen4 NVMe SSD 1TB

Note: All benchmarks were conducted on the exact same physical hardware to ensure consistency.

2. Test Environment & VM Configurations

Guest OS: Rocky Linux 9 (Minimal Installation, fully updated)

VM Specs: 4 vCPUs, 8GB RAM, 100GB Disk (XFS Filesystem)

FIO Workloads:

4K Random Read: fio --name=randread --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G --readwrite=randread --runtime=60 --group_reporting

1M Sequential Write: fio --name=seqwrite --ioengine=libaio --direct=1 --bs=1m --iodepth=16 --size=10G --readwrite=write --runtime=60 --group_reporting
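One detail in these commands: without `--time_based`, fio stops as soon as it has covered `--size`, so `--runtime=60` only acts as an upper bound (the host run posted earlier in the thread finished in about 16 s). A hedged variant that keeps looping over an explicit test file for the full 60 s could look like the sketch below. The file path, the `run` wrapper and the `DRY_RUN` switch are assumptions for illustration; by default the script only prints the command.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Same 4K random-read workload as above, plus:
#   --filename    explicit test file ($1), so the target is unambiguous
#   --time_based  keep issuing I/O until --runtime expires
randread_timed() {
    run fio --name=randread --filename="$1" \
        --ioengine=libaio --direct=1 --bs=4k --iodepth=64 --size=10G \
        --readwrite=randread --runtime=60 --time_based --group_reporting
}

# /root/fio-test.bin is a placeholder path, not taken from the original tests.
randread_timed "${TEST_FILE:-/root/fio-test.bin}"
```

Running every environment for the same wall-clock time makes the averages easier to compare, since short runs are more sensitive to warm-up effects.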

3. Benchmark Results

Table 1: 4K Random Read Performance (IOPS & Latency Focus)

| Test Environment | Storage Backend / Format | IOPS | Bandwidth (BW) | CPU Usage (usr / sys) |
|---|---|---|---|---|
| Proxmox Host | Baremetal (Native) | 166k | 647 MiB/s | 21.30% / 78.61% |
| Proxmox VM | local (Directory / Raw) | 173k | 676 MiB/s | 19.31% / 75.50% |
| Proxmox VM | local-lvm (LVM-Thin) | 109k | 426 MiB/s | 14.93% / 84.82% |
| RHEL 9 Host | Baremetal (LVM / XFS) | 173k | 678 MiB/s | 17.33% / 59.60% |
| RHEL 9 VM | KVM (Host Dir / QCOW2) | 158k | 618 MiB/s | 14.56% / 84.75% |
| VMware 9 VM | ESXi (VMFS6 Datastore) | 197k | 770 MiB/s | 17.67% / 82.01% |

Table 2: 1M Sequential Write Performance (Throughput Focus)

| Test Environment | Storage Backend / Format | IOPS | Bandwidth (BW) | CPU Usage (usr / sys) |
|---|---|---|---|---|
| Proxmox Host | Baremetal (Native) | 5,014 | 5,015 MiB/s | 3.28% / 21.85% |
| Proxmox VM | local (Directory / Raw) | 5,009 | 5,010 MiB/s | 2.89% / 7.49% |
| Proxmox VM | local-lvm (LVM-Thin) | 4,958 | 4,959 MiB/s | 3.78% / 7.36% |
| RHEL 9 Host | Baremetal (LVM / XFS) | 5,004 | 5,005 MiB/s | 2.74% / 9.98% |
| RHEL 9 VM | KVM (Host Dir / QCOW2) | 1,551 | 1,551 MiB/s | 5.71% / 2.51% |
| VMware 9 VM | ESXi (VMFS6 Datastore) | 4,749 | 4,750 MiB/s | 3.48% / 9.61% |

4. Key Insights & Technical Analysis

① Proxmox Backend Discrepancy: local vs local-lvm

The Problem: In 4K random reads, Proxmox's file-based local directory backend performs very well (173k IOPS), even slightly ahead of the bare-metal host (166k IOPS), possibly because of QEMU/host-side page caching. The block-based local-lvm (LVM-Thin) backend, however, drops to 109k IOPS, with average latency rising from 365 µs to 578 µs.

Root Cause: This degradation is caused by the virtualization metadata layer overhead unique to LVM-Thin. Under high queue depths (iodepth=64), managing the thin-pool block mapping allocations induces lookup latency and metadata lock contention.

Sequential Exception: For 1M sequential writes, the metadata lookup overhead becomes negligible because far fewer I/O operations are issued for the same amount of data. As a result, both Proxmox backends match the bare-metal host at roughly 5,000 MiB/s (about 4.9 GiB/s).

② RHEL 9 VM Sequential Write Bottleneck (QCOW2)

Observation: The RHEL 9 VM demonstrates solid 4K Random Read performance (158k IOPS) but encounters a severe bottleneck during Sequential Writes, dropping to 1,551 MiB/s (a ~70% performance loss compared to host).

Root Cause: This is a classic "Virtualization Tax" imposed by the QCOW2 image format on a local directory. When writing large sequential blocks, the dynamic cluster allocation and metadata updates within the QCOW2 file create an intensive write bottleneck. Switching to a pre-allocated QCOW2 or a raw image file would restore the native ~5.0 GiB/s throughput.
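To test the pre-allocation idea on the RHEL 9 host, one option is to create a QCOW2 image with the `preallocation` option set and a raw image of the same size, then attach each to the VM and repeat the sequential write run. The commands below are a sketch only: the image directory and file names are placeholders, `falloc` is just one of the QCOW2 preallocation modes (`off`, `metadata`, `falloc`, `full`), and the script prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Assumed image directory; adjust to wherever the VM disks actually live.
IMG_DIR="${IMG_DIR:-/var/lib/libvirt/images}"

make_test_images() {
    # QCOW2 with clusters reserved up front instead of allocated on first write
    run qemu-img create -f qcow2 -o preallocation=falloc \
        "$IMG_DIR/fio-prealloc.qcow2" 100G
    # Raw image of the same size, with no QCOW2 metadata layer at all
    run qemu-img create -f raw "$IMG_DIR/fio-raw.img" 100G
}

make_test_images
```

If the pre-allocated and raw images both come close to the host's roughly 5,000 MiB/s, that would support the allocation-overhead explanation; if not, the bottleneck lies elsewhere.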

③ VMware 9 (ESXi) Aggressive Read Caching

Observation: VMware ESXi achieves the highest random read result at 197k IOPS with the lowest latency (320 µs).

Root Cause: This near-native (and sometimes hyper-native) result is driven by the VMkernel's aggressive read-ahead caching mechanisms and highly optimized storage stack (VMFS6), which effectively filters and buffers small 4K random I/O packets at the hypervisor level.

news · May 21, 2026

Please don't write in Bold and Black!
It's horrible to read with a black screen config.

Johannes S · May 21, 2026

I wonder about the discrepancy between block- and file-based storage. I remember several reports that block storage actually performs better since it avoids the overhead of a filesystem.

UdoB · May 21, 2026

Johannes S said:
I remember several reports that block storage actually performs better since it avoids the overhead of a filesystem.

Yes, that's my understanding too.

Only when the (additional!) filesystem layer has an additional read-cache or an additional write-cache (lying about "finished writing your datablock") in any way, it may actually look faster.

Disclaimer: with total conviction but w/o proof...

waltar · May 21, 2026

What would be the point of a "filesystem cache"? It caches the filesystem and so hides I/O requests from the underlying disks. Since serving reads and writes from a cache in memory is almost always faster than taking the longer I/O path to the disk, a filesystem should ideally, and mostly in practice, perform better than block storage; it would be a surprise otherwise. Just think of all the advice about adding RAM to a host running ZFS. Why? Because it helps ZFS cache the I/O, and that is exactly what a cache can and should do: hide I/O latencies from the application.

alexskysilk · May 21, 2026

Johannes S said:
I remember several reports that block storage actually performs better since it avoids the overhead of a filesystem.

True generally, untrue in the OP's testing matrix. LVM-thin introduces its own problems. Since it is of almost no benefit in a modern deployment (LVM thick for SAN, ZFS/Btrfs for local), I don't know why OP even bothered testing it.

BobhWasatch · May 21, 2026

Maybe because it is the installer default if you have a single disk.

alexskysilk · May 21, 2026

hmm.

yes, I see your point.

I am guilty of thinking of PVE as an infrastructure. I forget that many (most?) of its users are homelabbers. consider me chastised

Johannes S · May 21, 2026

alexskysilk said:
I forget that many (most?) of its users are homelabbers. consider me chastised

I actually doubt this. Most users on Reddit maybe, but I doubt that the largest part of Proxmox revenue (which actually funds the development) comes from homelabbers rather than companies ditching VMware.

koovis · May 22, 2026

One of the primary reasons I conducted this test was to evaluate the performance gap between the virtualized environment and bare metal ("Bare Metal vs. Virtualization").

Although this baseline benchmark was performed on a single bare-metal machine, a direct comparison is challenging because the customer's actual production environment will be a multi-node clustered architecture. I plan to conduct additional validation tests in the actual deployed environment in the future.

The key objective here is to satisfy one of the customer's strict requirements, which stipulates that "virtual machine performance must be at least 90% of bare-metal performance." To verify this, I have been conducting comprehensive testing across the CPU, network, and storage domains.

While I have successfully confirmed that both CPU and network performance exceed the 90% threshold, storage performance has fallen short, prompting me to test under various configurations and conditions.
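For the storage part, the 90% check can be written down directly against the 4K random read figures in Table 1. The sketch below is only an illustration of that arithmetic: the helper names are made up, the threshold defaults to the customer's 90%, and VMware is left out because no bare-metal ESXi baseline was measured.

```shell
#!/bin/sh
# Percentage of host IOPS reached by a VM (hypothetical helper).
pct_of_host() {
    # $1 = VM IOPS, $2 = host IOPS (same units)
    awk -v vm="$1" -v host="$2" 'BEGIN { printf "%.1f\n", 100 * vm / host }'
}

# Exit status 0 if the percentage meets the target (default 90).
meets_target() {
    awk -v p="$1" -v t="${2:-90}" 'BEGIN { exit !(p + 0 >= t + 0) }'
}

report() {
    # $1 = label, $2 = VM IOPS in thousands, $3 = host IOPS in thousands
    pct=$(pct_of_host "$2" "$3")
    if meets_target "$pct"; then verdict=PASS; else verdict=FAIL; fi
    echo "$1: ${pct}% of host -> $verdict"
}

# IOPS values (in thousands) taken from Table 1, 4K random read.
report "Proxmox VM, local (raw)" 173 166
report "Proxmox VM, local-lvm"   109 166
report "RHEL 9 VM, QCOW2"        158 173
```

On these numbers only the LVM-Thin VM misses the 90% line for random reads, at roughly 66% of the host. The same check applied to Table 2 would give a different picture for the QCOW2 VM.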

Furthermore, when utilizing Ceph storage, I compared the host-level IOPS measured directly via RBD against the in-VM performance using libaio. However, possibly due to differences from a true production environment, the virtual machine yielded only about 50% of the bare-metal performance.

This specific discrepancy in storage performance will require further research and deeper investigation.

Thank you.

alexskysilk · May 22, 2026

This is a great breakdown of your progress, and hitting that 90% threshold on CPU and network is a solid milestone. However, regarding the storage deficit (the 50% delta on Ceph vs. Bare Metal), I want to save you some cycles on the "further research" phase.

You are running into an architectural wall here, not a misconfiguration. Comparing local bare-metal NVMe to a clustered Ceph environment is an inherently uneven comparison for a few reasons:

1. When you write to a local bare-metal disk, the write is acknowledged the microsecond it hits the local controller. When you write to a Ceph VM, that write must be processed by the hypervisor layer (libaio/qemu), sent over the network, written to multiple OSDs on different hosts, and acknowledged back over the network before the VM marks it as complete. You aren't just testing disk speed; you are testing network round-trip time (RTT) and storage replication protocols.
2. A requirement of "90% of bare-metal storage performance" is standard for local virtualization (like Proxmox local LVM), where the virtualization overhead is minimal. However, the moment you move to a clustered, high-availability architecture like Ceph, that 90% target becomes mathematically and physically impossible for random I/O latency. The trade-off for surviving a node failure without data loss is the introduction of network latency.
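To put a rough number on point 2: at a fixed queue depth, reaching 90% of the bare-metal IOPS means the mean per-I/O latency may grow by at most a factor of 1/0.9. Taking the 384.09 µs host average from koovis's fio output as the baseline (an assumption; a production system will have its own figure), the sketch below computes how much extra latency the whole virtualization and network path could add. The helper name is hypothetical.

```shell
#!/bin/sh
# Extra latency (in microseconds) that may be added per I/O while still
# reaching a given percentage of baseline IOPS at the same queue depth.
# Derivation: IOPS ~ QD / latency, so latency_max = base / (pct / 100).
latency_budget_us() {
    # $1 = baseline mean latency in us, $2 = target percentage of baseline IOPS
    awk -v base="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", base * (100 / pct - 1) }'
}

echo "budget at 90%: $(latency_budget_us 384.09 90) us"
echo "budget at 50%: $(latency_budget_us 384.09 50) us"
```

With this baseline the 90% target leaves a little over 40 µs for everything between the guest and the disk, which has to cover the network round trips and replica acknowledgements described above.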

As stated, it is not possible to meet your customer's requirement. As a service provider it's incumbent upon you to explain those limitations, but I would add that the "bare metal performance" isn't of any actual relevance here: the storage (as well as the compute and networking) must achieve a MINIMUM ACCEPTANCE CRITERIA for the application they are serving, not an arbitrary value the server is theoretically capable of.

Bu66as · Jun 10, 2026

For your local LVM-thin results, a couple things to check: is iothread=1 actually set on the SCSI controller? Should be default with virtio-scsi-single on recent PVE but I've seen cases where it wasn't. Also try aio=io_uring instead of the default, can help with latency at high queue depths.
Check lvs -o+chunk_size for your thin pool too. The default chunk size can be rough for small random IO, that might explain part of the ~35% drop.
Re Ceph and the 90% requirement, @alexskysilk nailed it. Replicated storage adds network RTT per write by design, that's the price for redundancy. In my experience 4k random on Ceph typically lands at 40-60% of local NVMe, you can push it higher with dedicated WAL/DB on NVMe and a proper storage network (25G+) but 90% of bare-metal won't happen. What matters is whether the application performs acceptably, not what percentage of theoretical max it reaches.
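The three checks could be run on the PVE host roughly as follows. This is a sketch under assumptions: VM ID 100 and the volume name come from the `lvs` output earlier in the thread, while `scsi0` as the disk slot is a guess that should be confirmed with `qm config` first. The script prints the commands unless `DRY_RUN=0` is set, and a changed disk option may only take effect after a full stop and start of the VM.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

VMID="${VMID:-100}"                        # from vm-100-disk-0 in the lvs output
DISK="${DISK:-local-lvm:vm-100-disk-0}"    # storage:volume of the test disk

check_lvmthin_tuning() {
    # 1. Inspect the current controller and disk options (iothread, aio)
    run qm config "$VMID"
    # 2. Re-specify the disk with iothread and io_uring enabled
    #    (scsi0 is an assumed slot name)
    run qm set "$VMID" --scsi0 "$DISK,iothread=1,aio=io_uring"
    # 3. Show the thin pool's chunk size
    run lvs -o+chunk_size pve/data
}

check_lvmthin_tuning
```

Changing one setting at a time and re-running the same fio job makes it easier to tell which of them, if any, moves the result.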
