VMs performing poorly

Archie45D

New Member
Oct 28, 2021
Hello there, I'm currently working on resolving some performance issues on a Proxmox installation.

There is a ZFS array on the network hosting the VMs, and its behavior is strange.

Currently, the server is hosting 3 Windows 10 VMs with the configuration shown in the attached image. The issue we're working to resolve concerns the performance of these machines. VM 1 is the primary VM; the other two VMs were created from a template of that VM.

When VM 1 has I/O activity, VM 2 and VM 3 are brought to a crawl, with Task Manager showing their disk I/O pinned at 100%. The behavior varies: sometimes VM 2/3 are pinned at 100% disk I/O while VM 1 is transferring 8 MB/s; other times they sit at 87% while it transfers 20 MB/s.

Are there any tuning parameters or other settings I should take into consideration to resolve this behavior?

Thanks for any insight.
 

Attachments

  • vm config.jpg (61.4 KB)
  • vm2 with vm1 iops.jpg (37 KB)
This sounds like your ZFS is too slow.
Do you use HDDs for the Pool? Also, how is the pool set up (mirror, raidz)?
 
The ZFS array uses 30 x 18 TB HGST Ultrastars: two vdevs of 15 drives each, RAIDZ2.
That also means your 4K random reads/writes are basically limited to less than 2x the IOPS of a single HGST Ultrastar, since each raidz vdev only delivers roughly the random IOPS of one member disk.
Maybe a 3- or 4-way mirror of NVMe SSDs as a special device would help, so all those IOPS caused by metadata no longer hit your HDDs. I don't know whether you run databases, but in that case a pair of durable SSDs as a mirrored SLOG might also make sense, as sketched below.
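As a rough sketch of what that could look like (the pool name "tank" and the disk paths are placeholders, and a special vdev only benefits metadata written after it is added, so double-check the layout before running anything like this on a production pool):

# add a 3-way mirrored special vdev for metadata
zpool add tank special mirror /dev/disk/by-id/nvme-ssd1 /dev/disk/by-id/nvme-ssd2 /dev/disk/by-id/nvme-ssd3
# add a mirrored SLOG for sync writes
zpool add tank log mirror /dev/disk/by-id/nvme-ssd4 /dev/disk/by-id/nvme-ssd5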
Wow.

How are the 4K IOPS on a zvol? You can test it with fio; see also https://pve.proxmox.com/wiki/Benchmarking_Storage
I would guess random 4K sync writes will be horrible. I wouldn't be surprised if that pool can't reach 1 MB/s.
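Something along these lines, run against a throwaway test zvol (pool and zvol names are placeholders; do not point it at a zvol that holds data, since the write test destroys its contents):

# create a temporary 10G test zvol
zfs create -V 10G tank/fio-test
# random 4K sync writes, queue depth 1
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=rand_write_4k --filename=/dev/zvol/tank/fio-test
# clean up afterwards
zfs destroy tank/fio-test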

By the way, I really don't like that wiki article. People just copy & paste the commands mentioned there onto their running production systems without understanding fio and then wonder why PVE isn't booting anymore. I've read too many threads here where people just run fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=1M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sda, overwriting "/dev/sda" and killing the installation. There should be a big fat warning that this command will destroy all data on that disk, or the example command should use something safer like "--directory=/var/tmp" instead of "--filename=/dev/sda".
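A safer variant of that example, as a sketch (fio lays out a scratch file under /var/tmp and reads from that instead of a raw disk; adjust --size to something your root filesystem can hold):

fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=1M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=seq_read --directory=/var/tmp --size=1G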
 
