IO delay issues

autumnwalker

Member
Sep 30, 2019
I suspect the answer here will be "consumer SSDs, read the FAQ", but I'm trying to understand whether there is anything else I'm missing. I'm curious because all four of these SSDs previously ran without issue on Proxmox 6 in a cluster, on different hosts. When I built these two new-to-me servers, the drives were wiped and re-initialized.

I built two similar servers, both HP DL380p G8s, with the following specs:

Server 1:
  • Dual Xeon E5-2643 v2
  • 384 GB RAM
  • Integrated P420i in HBA mode
  • Dual Kingston SA400S3 SSDs (ZFS01 and ZFS02)
Server 2:
  • Dual Xeon E5-2690
  • 384 GB RAM
  • H220 HBA in a PCIe x8 slot
  • One Kingston SA400S3 SSD (ZFS01)
  • One Kingston SUV400S3 SSD (ZFS02)
Server 2 was previously configured with the integrated P420i in HBA mode (before the H220) and experienced the same issue. Because there are reported known issues with the P420i, I disabled it and installed the H220, but that does not seem to have made any difference.

When Server 2 is "idle" (10% CPU usage) it's fine. When it starts doing work, the IO delay ramps up in parallel with the CPU usage and often exceeds it. IO delay regularly gets into the 20% range, and I've seen it go over 50%.
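
One way to cross-check the IO delay figure outside the web GUI is sketched below. It assumes that IO delay corresponds roughly to the iowait share of CPU time reported in /proc/stat, which is an assumption and not something confirmed in this thread. The function names are illustrative only.

```python
import time


def cpu_times(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # The guest counters are already included in user/nice, so only the first
    # eight fields are summed for the total.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the iowait percentage."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return iowait_percent(before, after)
```

Calling `sample_iowait()` on the host while the load ramps up, and again while it is "idle", gives a number to compare against what the GUI graph shows.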

Even when it is "idle", it is running a Windows 10 NVR VM which records a 4K camera continuously to the SUV400S3 drive, attached as VirtIO SCSI with Write Back cache. I would assume this counts as "high", or at least constant, IO, so you'd think it would cause IO delay constantly.

I've tried tuning ZFS based on recommendations on this forum and my own research, but to be honest I'm in over my head at this point. Server 1 seems "fine", but I haven't paid close attention to it because I've been troubleshooting Server 2 before I do anything that would take Server 1 offline.

These are currently running as independent hosts. I previously had them configured in a cluster, and when I tried to migrate a VM from Server 1 to Server 2, the IO delay on Server 2 caused the cluster sync to die and the Server 2 web GUI became borderline unresponsive while the migration was underway.

Thoughts as to what might be causing the IO delay issue on Server 2? Anything I can do to test, troubleshoot, etc.?
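
As one possible test, the sketch below estimates how busy each block device is from two snapshots of /proc/diskstats, which could show whether a single SSD is saturated when the IO delay climbs. It relies on the "time spent doing I/Os" counter (the 13th field, in milliseconds) and is similar in spirit to the %util column of iostat. The function names and the 5 second interval are illustrative assumptions.

```python
import time


def io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O, from /proc/diskstats text."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major minor name, then the per-device counters;
        # index 12 (the 13th field) is the time spent doing I/Os in ms.
        if len(fields) >= 13:
            ticks[fields[2]] = int(fields[12])
    return ticks


def busy_percent(before, after, elapsed_ms):
    """Per-device share of the elapsed time during which the device had I/O in flight."""
    a, b = io_ticks(before), io_ticks(after)
    return {dev: 100.0 * (b[dev] - a[dev]) / elapsed_ms for dev in b if dev in a}


def sample_busy(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and return busy percentages."""
    with open(path) as f:
        before = f.read()
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return busy_percent(before, after, elapsed_ms)
```

A device that sits near 100% while the others stay low would point at that drive rather than at the controller or the ZFS settings.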
 
