[SOLVED] IO delay - how to find the cause?

Tobias.F

Dec 12, 2023
Hello,

I am running into issues with corrupted data in a VM. I am quite new to Proxmox and do not know where to look for the cause, as I made a couple of changes to my system.

In my home system, I migrated my VMs from Hyper-V to Proxmox on the same hardware some weeks ago, and everything ran fine. A few days ago I replaced the mainboard, CPU and RAM with a more powerful set-up. Everything still runs fine under "normal" load. But when I ran some intensive data operations (backup via a Windows VM to a local data store), after a while I noticed slow data transfer and the backup program aborting. The backup VM was hanging, reacting slowly, and crashed once. Another VM ended up with some corrupted data. On Proxmox I saw IO delay of up to 90%. Proxmox itself was stable the whole time. Disk status and ZFS status are shown as OK, and I cannot find any hint in the logs.

Simplified setup description:
  • consumer SATA SSD - ZFS: Proxmox 8.1.3
  • consumer NVME - ZFS: VMs
  • SAS HDD - ZFS Mirror: Data (Target of data operations)
My current assumption for the root cause is that the NVME holding the VMs is overheating. I have used the same NVME, housing and cooler for years, but the new system can "stress" it more than the old one. In the Windows VM logs I can see disk access errors. Incompatibility of the NVME with the mainboard is another assumption.

Any idea where to search to find the root cause?
 
You could check with iotop, iostat or zpool iostat which disk is overwhelmed.
My guess would be that those HDDs are slowing everything down.

But consumer SSDs, especially when used with ZFS, can also be terribly slow when you hit them with sync writes, or simply with continuous async writes for more than a few seconds. Especially if those are consumer QLC SSDs.
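As a rough sketch of how the three tools named above could be run on the PVE host while the backup is in progress: the helper name `run_if_present`, the wrapper `sample_io`, and the 5-second interval with 3 samples are illustrative choices, not something from this thread.

```shell
#!/bin/sh
# Run a command only if it is installed; otherwise say so and carry on.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not installed"
    fi
}

# Take a short sample from each tool (interval and count are arbitrary).
sample_io() {
    # Per-vdev bandwidth, IOPS and latency for every pool: 3 samples, 5 s apart.
    run_if_present zpool iostat -v -l 5 3
    # Extended per-device statistics: 3 samples, 5 s apart (sysstat package).
    run_if_present iostat -x 5 3
    # Batch mode, only processes actually doing I/O, 3 iterations.
    run_if_present iotop -o -b -n 3
}
```

Calling `sample_io` as root during the slow phase should show whether the HDD pool or the NVME pool is the device that stays saturated.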
 
Show your VM configuration.
Do you have the virtio drivers enabled in the VMs?
Have you checked your disk temperatures? Show the output of smartctl -x /dev/your_device
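A minimal sketch for pulling just the temperature lines out of the smartctl output for every detected drive. The function names are made up for illustration, and it assumes that the first column of `smartctl --scan` is a device path that `smartctl -x` accepts without an extra `-d` option, which may not hold behind some controllers.

```shell
#!/bin/sh
# Keep only temperature-related lines from smartctl output read on stdin.
temperature_lines() {
    grep -i 'temperature'
}

# Print the temperature lines for every device smartctl can find (needs root).
report_temperatures() {
    smartctl --scan | while read -r dev rest; do
        echo "== $dev"
        smartctl -x "$dev" </dev/null | temperature_lines
    done
}
```

Running `report_temperatures` once at idle and once under backup load gives a simple before/after comparison.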
 
You could check with iotop, iostat or zpool iostat which disk is overwhelmed.
My guess would be that those HDDs are slowing everything down.

But consumer SSDs, especially when used with ZFS, can also be terribly slow when you hit them with sync writes, or simply with continuous async writes for more than a few seconds. Especially if those are consumer QLC SSDs.

For sure, the HDD pool as backup target is the bottleneck in the data transfer. But that has not been a problem in my set-up for years.
PC with NVME --> 10G LAN --> Switch --> 10G LAN --> VM-Host --> Win-VM with Veeam B&R --> HDD Pool on VM-Host

To me, a slow HDD explains a "traffic jam", but not data corruption on a different storage.

ZFS is a new player in this game, yes. But it had been running for weeks before the hardware upgrade.

Show your VM configuration.

VM config
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide0;net0
cores: 6
cpu: host
efidisk0: local-nvme-zfs:vm-108-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: none,media=cdrom
machine: pc-q35-8.1
memory: 6144
meta: creation-qemu=8.1.2,ctime=1701617700
name: veeam-backup
net0: virtio=BC:24:11:BC:AB:73,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
protection: 1
scsi0: local-nvme-zfs:vm-108-disk-1,cache=writeback,discard=on,iothread=1,size=150G
scsi1: local-HDD-pool2-zfs-raidz2:vm-108-disk-0,backup=0,cache=writeback,discard=on,iothread=1,size=1570G
scsi2: local-HDD-pool1-zfs-mirror:vm-108-disk-0,backup=0,cache=writeback,discard=on,iothread=1,size=4000G
scsihw: virtio-scsi-single
smbios1: uuid=4d33e700-9e27-4444-8c07-187a8c2d7785
sockets: 1
tags: net-lan;os-windows
tpmstate0: local-nvme-zfs:vm-108-disk-2,size=4M,version=v2.0
vmgenid: a23f2c76-7968-40f8-8cbb-b38d0e3e3c4f


Do you have the virtio drivers enabled in the VMs?

Yes.
I checked the Windows error log and found a couple of warnings (freely translated to English):

Code:
vioscsi: Reset to device "\Device\RaidPort3" was issued.
vioscsi: Reset to device "\Device\RaidPort1" was issued.

Have you checked your disk temperatures? Show the output of smartctl -x /dev/your_device

I did not check when it first happened. But today I mounted a small fan directly over the NVME holding the VMs and tested again with only one VM enabled: it happened again!

NVME: 35 °C under load (it was between 50 and 60 °C at idle without the fan)
HDD: 35 °C
 
My issue is solved.
I cannot say which of these points is the root cause or how much each one contributes to the solution. But after making some changes, my issue is gone.

I can rule out the hardware issues I originally assumed. PVE and the Linux VMs were running fine for days. Moving large amounts of data from one storage pool to another also worked fine. The issue was only related to the Windows VM and to doing backups with Veeam B&R on the Windows VM.

Change #1:
On the Windows VMs, I disabled "Core Isolation" in the Windows settings.
It works on bare metal and on a Hyper-V host. As my VMs were migrated from Hyper-V, I missed this point.
After this change, Windows was running smoothly without lagging.

Change #2:
I changed the settings of the ZFS file system on PVE and the file system inside the Windows VM.
I had used ReFS because, before the migration, the VMs were running on Storage Spaces. It looks like a CoW FS (ReFS) on top of a CoW FS (ZFS) is not a good idea.
64k for the file system and 128k for the underlying storage is the setting recommended by Veeam.

before:
  • HDD: sector size = 4K / 512e
  • PVE ZFS pool: ashift = 12 (= default)
  • PVE ZFS volblocksize = 16k (default)
  • VM = ReFS / 64k

after:
  • HDD: sector size = 4K / 512e
  • PVE ZFS pool: ashift = 12 (= default)
  • PVE ZFS volblocksize = 128k
  • VM = NTFS / 64k
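To double-check the values in the two lists above on the PVE host, something like the following sketch could be used. `POOL` and `ZVOL` are hypothetical placeholders (the real pool and dataset names are not given in this thread), and the function names are illustrative.

```shell
#!/bin/sh
# Hypothetical names: replace with your own pool and VM disk dataset.
POOL="hddpool1"
ZVOL="$POOL/vm-108-disk-0"

# ashift is a power-of-two exponent: ashift=12 means 2^12 = 4096-byte sectors.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Show the pool's ashift and the zvol's block size and volume size.
show_block_settings() {
    zpool get ashift "$POOL"
    zfs get volblocksize,volsize "$ZVOL"
}
```

Note that volblocksize is fixed when a zvol is created, so an existing VM disk keeps its old value until the disk is recreated, for example by moving it to another storage and back.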

So far, all is running fine.
 
Hello Tobias,
We have similar problems, and I saw that you had iothread=1 set. Did you change that setting afterwards?
Regards
Peter
 
