High Hard Drive Usage/Low RW Performance in VM Guest Win10

TheFunk

Member
Oct 25, 2016
Hi all!

I recently got GPU passthrough working on one of my VMs and I'm delighted to say that everything appears to be working as it should, with one minor exception. I'm noticing some serious slowness in the IO/hard disk department. Sometimes the guest machine appears to be fine. Other times, task manager reports that the disk is at 100% utilization and the IO plummets.

I'm using FreeNAS as my storage solution, set up as essentially a stripe of mirrors across 12 matching disks. Even though it's just software RAID, I believe I should see some very decent numbers with that kind of working room. I should note that via an SMB share I can transfer into my FreeNAS box from my laptop at roughly 100MBps, rock solid.
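To be clear about the layout, by "stripe of mirrors" I mean six two-way mirror vdevs striped together, i.e. roughly the equivalent of the following (device names here are just placeholders, not my actual disks):
Code:
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    mirror da6 da7 \
    mirror da8 da9 \
    mirror da10 da11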

The NAS is available to the Proxmox node via NFS.

I haven't enabled jumbo frames on my network. Should I?

I've heard of "hugepages" mentioned before on other forums when this topic has arisen. Can someone give me the 411 on what those are and whether or not I should enable them?

The host is running Proxmox 5.0 Beta 2 with the latest updates.

The Win 10 VM specs are as follows with a little commentary:

8 CPU cores, 1 socket
8GB RAM (I've heard some people say to try using less RAM, but that kinda defeats the purpose of the machine in my case)
virtio balloon driver in use (I've heard this sometimes causes problems too.)
virtio driver for disk
raw disk image w/ write through cache
1 passed through GPU
1 passed through USB port

Help y'all! I want this baby to purr!
 
I haven't enabled jumbo frames on my network. Should I?

If your switch supports it, it shouldn't be too hard to set up and could be worth a try.
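As a rough sketch, on the PVE side it mostly comes down to raising the MTU on the physical NIC and the bridge in /etc/network/interfaces (interface names and addresses below are only examples, and the switch ports plus the FreeNAS interface have to be set to the same MTU):
Code:
auto eno1
iface eno1 inet manual
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        gateway 192.168.1.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        mtu 9000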

I've heard of "hugepages" mentioned before on other forums when this topic has arisen. Can someone give me the 411 on what those are and whether or not I should enable them?

Your memory is divided into chunks (pages) when it is assigned to programs, for efficiency. Those chunks are normally 4K in size. As modern systems have far more RAM available, and programs use more of it to be faster, 4K has become a bit too small. Modern hardware and kernels support something called huge pages (or large pages), which use bigger chunks; this makes it faster to look up where an exact memory address lives and saves some overhead, as there are fewer page table entries. x86_64 can use 2MB chunks (check with: `lscpu | grep -ow pse`) or 1GB chunks (`lscpu | grep -ow pdpe1gb`), maybe even more.
This should not directly impact IO, but rather memory usage and memory speed a bit (maybe @spirit can chip in here, he uses it quite a lot AFAIK). It is more of a concern if your VM has quite a bit of RAM assigned.
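If you want to experiment with it, a minimal sketch would look something like the following, assuming 2MB pages and an 8GB guest (the page count is just that example worked out, and the VM needs a full stop/start afterwards):
Code:
# check CPU support for 2MB and 1GB pages
lscpu | grep -ow pse
lscpu | grep -ow pdpe1gb

# reserve enough 2MB pages for an 8GB guest (8192MB / 2MB = 4096)
sysctl vm.nr_hugepages=4096

# tell the VM to use 2MB huge pages
qm set <VMID> --hugepages 2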

The Win 10 VM specs are as follows with a little commentary:

Can you also post your VM config:
Code:
qm config VMID

Also, doing an IO test from the PVE node to the FreeNAS would help to see what performance we could expect from the VM.
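Something like the following fio run on the PVE node, directly against the NFS mount, would give a baseline. I am assuming the default /mnt/pve/<storage> mount point here, so adjust the path to your actual storage name:
Code:
# sequential 1M writes against the NFS-mounted storage
fio --name=nfs-seqwrite --filename=/mnt/pve/<storage>/fio-test.bin \
    --size=4G --bs=1M --rw=write --direct=1 --ioengine=libaio \
    --iodepth=16 --runtime=60 --group_reporting

# random 4K writes, closer to what a Windows guest produces
fio --name=nfs-randwrite --filename=/mnt/pve/<storage>/fio-test.bin \
    --size=4G --bs=4k --rw=randwrite --direct=1 --ioengine=libaio \
    --iodepth=16 --runtime=60 --group_reporting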
 
Hi, small fun question. Are you open to the idea of spinning up another VM, possibly some kind of Linux, which is otherwise similar to the Windows guest (i.e. same resource allocation, same underlying VM storage, etc.)? Reason I ask: in my experience, Win10 as an OS is sometimes "inconsistent as hell" in terms of what it is doing. Speaking of client laptop workstations, for example: when it decides it is time to export patches to a nearby client via the secret Windows Update protocol, or to do a full disk index search or similar 'background' tasks, I can randomly see the NIC getting saturated and/or disk IO pinned at 100%, more or less without any regard to what the human user may (or may not) be doing on the system. So from my perspective, Win10 is a fairly non-ideal platform as a VM guest compared to, say, Win7Pro or even Win8Pro, which had less of this kind of random-bad-house-guest behaviour.

i.e. possibly your issue is to be debugged in the guest acting weird, rather than in the Proxmox layer. Hence having a separate VM you can spin up, and then challenge for CPU/NIC/disk in a controlled manner, might allow you to get a sensible reference baseline. Or maybe throw on a temporary Win7Pro VM and see how it behaves, etc.

Just 2 cents worth ..

Tim
 
Thanks all!

Here's my config file

Code:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 4
cpu: host
efidisk0: local-lvm:vm-102-disk-2,size=128K
hostpci0: 81:00.0,romfile=sapphire.rom,x-vga=on
keyboard: en-us
machine: q35
memory: 12288
name: Aries
net0: virtio=32:03:97:11:69:4A,bridge=vmbr0
numa: 0
ostype: win10
scsi0: local-lvm:vm-102-disk-1,backup=0,cache=writethrough,discard=on,iothread=1,size=65G
scsi1: VMDS:102/vm-102-disk-1.raw,backup=0,cache=writethrough,discard=on,iothread=1,size=500G
scsihw: virtio-scsi-pci
smbios1: uuid=a2bee0e0-3f1b-4266-a968-21bd01ac0840
sockets: 2
usb0: host=3-1

I spun up a second Win10 guest and this time told it to use local storage on the node (Samsung 840 Evo). I still noticed the atrocious read/write speeds. Interestingly, I didn't see this until after I passed through the GPU. Maybe I didn't look for long enough before passing it through, but before the GPU was attached my read/write seemed close to what I would normally expect from this drive, particularly during the OS install.

Anyway, since I tested this fully locally before adding my second 500GB disk today, we can eliminate the NAS as a source of slowness. The issue is now either with the host or with the guest.

I will check the phantom update service, @fortechitsolutions. Thanks for the reminder! I don't know whether I disabled that on install or not. I also have an 8.1 guest on this server that I'll test out shortly to see if I notice any improvement there.

The 12GB of RAM leads me to believe that I should be using hugepages for sure! @spirit any advice?

The oddest part is that the disk usage shows 100% while simultaneously reporting extremely low read/write rates. So it'll say 100% but be sitting at something like 50KBps write speed.
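To put actual numbers on it, I'll probably run something like this inside the guest with Microsoft's diskspd (the test file path, sizes, and mix are just what I intend to try: 4K blocks, 4 threads, 16 outstanding IOs, 50% writes, caching disabled):
Code:
diskspd.exe -c4G -d60 -b4K -o16 -t4 -r -w50 -Sh C:\temp\diskspd-test.dat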

Thank you @t.lamprecht !
 
Just an update,

I can confirm that I only have this issue on VMs with a passed-through GPU. I can also confirm that the brand of GPU (AMD or NVIDIA) is a non-factor. I was also having issues with VMs not shutting down when passing through a GPU. I've since discovered a post over at the Unraid forums that suggested turning on message-signaled interrupts (MSI) in Device Manager on my guest for the GPU. Doing so fixed my issue with being unable to reboot/shut down the VM from within the guest OS. Could something like this also be causing the disk usage issue I'm seeing? It looks like the virtio disks use this form of IRQ by default.

In other forums, I'd read that on Windows 10 you should use regular (line-based) IRQs for disk drives to prevent this issue, and the virtio disks use message-signaled interrupts by default. Could I change this setting and see if that helps? Will it break the virtio disk implementation?
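For reference, the Device Manager change for the GPU boils down to a single registry value, so I assume toggling it for the virtio disk would look the same (the device instance path below is a placeholder for the actual device, taken from its "Device instance path" property in Device Manager; 1 enables MSI, 0 falls back to line-based IRQs, and a reboot is needed afterwards):
Code:
reg add "HKLM\SYSTEM\CurrentControlSet\Enum\PCI\<device-instance-path>\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties" /v MSISupported /t REG_DWORD /d 1 /f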
 
