How to best set up a disk in a VM for fast access?

CelticWebs

Member
Mar 14, 2023
I can't seem to find a concrete answer to this issue. I see a lot of people commenting on poor disk speed inside a VM, but I'm not seeing any real ways to get it sorted.

I have a storage pool set up using NVMe disks. Benchmarking with fio using the following command:

fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --size=4k --filename=/Pool/file

I get over 1100 MiB/s. Perfect. Mount a disk inside my VM which resides on this storage, and the same test command only gets about 120 MiB/s. The software that's running needs fast access to the disk as it's database driven. How can I get performance close to Proxmox speed within the VM?

It's on a ZFS RAID10 of 4 NVMe drives; processing power and memory on the system are more than capable of supporting it. Being ZFS, it's a raw disk image. I've experimented a little with cache and settled on writeback, but it didn't really make any significant difference to be honest. Async IO is enabled as threads. I can't really see anything else to change.
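For reference, the disk is attached roughly like this (a sketch; the VM ID 100 and the storage/disk names are placeholders, not my exact config):

# show the VM's current controller and disk settings
qm config 100 | grep -E 'scsihw|scsi0|virtio'
# hypothetical example of a disk line with the settings described above:
# scsi0: Pool:vm-100-disk-0,cache=writeback,aio=threads,size=500G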

Any tips on best way to get better performance?
 
"Virtio SCSI single" is already used for the VM?
Keep in mind that there will be additional overhead. you are basically comparing:
"ZFS Pool -> Dataset" vs "ZFS Pool" -> Zvol -> Virtio -> guest filesystem"
Each nested filesystem and abstraction layer multiplies overhead.
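If not, something along these lines should switch it over (a sketch; the VM ID 100 and the storage/disk names are placeholders):

# use the single-controller variant so each disk can get its own IO thread
qm set 100 --scsihw virtio-scsi-single
# re-attach the disk with a dedicated IO thread (iothread requires virtio-scsi-single)
qm set 100 --scsi0 Pool:vm-100-disk-0,iothread=1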
 
Hi Both,

I tried the suggestion of disabling cache as per that thread; it hasn't improved things. While I understand there will be some loss due to overheads, this does appear to be a significant drop in performance from testing the pool directly on Proxmox vs inside the VM.

The ZFS RAID pool is used solely for this one VM. I guess I could possibly pass the disks through directly, but then I'd have to create the RAID inside the VM. Or is there another way to pass it directly?

What additional information would be useful to help debug here?
 
Yes, PCI passthrough of the NVMe disks would be an option to get bare-metal performance, as this skips the entire virtualization layer. But it is only useful if you are not running a cluster, as you cannot migrate that VM anymore. And there are some other caveats when working with PCI passthrough (IOMMU groups have to be isolated, upgrades might break passthrough, ...).
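If you do try it, the rough workflow looks like this (a sketch; the PCI address is a placeholder, and IOMMU has to be enabled in the BIOS and kernel first):

# find the PCI addresses of the NVMe controllers
lspci -nn | grep -i nvme
# pass one controller through to VM 100 (repeat with hostpci1, ... for each disk)
qm set 100 --hostpci0 0000:01:00.0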
 
That's a point I didn't actually think about. The node is part of a small cluster, though migration is unlikely given that this node is the only one with this specific pool. It would be nice to have the flexibility to back up and/or migrate.

It just seems very strange that performance would be so much slower in the VM; it can't be normal. I must have something set wrong?
 
You are doing 4K sync IO. First, zvols on ZFS work with an 8K volblocksize by default, so each 4K IO results in 8K read or written. On top of that you have the overhead of the guest's filesystem: when I do a single 4K write to a guest's ext4, it causes something like 12-16K of IO to the virtual disk because of metadata and journaling. This all adds up, or rather multiplies, so a 4K IO can easily cause 32K of IO.
It's especially bad when doing sync IO.
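To see how much of this applies here, you could check the zvol and re-test with a matching block size (a sketch; the pool/zvol and storage names and the test file path are placeholders):

# check the volblocksize of the VM's zvol
zfs get volblocksize Pool/vm-100-disk-0
# change the blocksize used for newly created zvols on this storage
# (volblocksize is fixed at creation, so existing disks would need to be recreated/moved)
pvesm set Pool --blocksize 16k
# re-run fio inside the guest with a block size matching the volblocksize
fio --ioengine=libaio --direct=1 --rw=read --bs=8k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --size=1G --filename=/path/to/testfile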
 