Very Slow Proxmox 5.3 Host with High IO Delay and only one VM

Dec 17, 2017

I have 3 Proxmox hosts, one with version 4.4, another one with version 5.2 and the latest with Proxmox 5.3.

Of these 3 hosts, I would expect the 5.3 host to perform best, given its better hardware. All of the hosts use hardware RAID 10, and the newest one (the 5.3 installation) is the only one with an SSD-based configuration.

This host, the 5.3 one, is using a modified firmware to support macOS virtual machines. I don't believe this to be the cause of the high load, as I noticed sporadic slowness in the PVE web interface even before making the macOS support changes and setting up the Mac VM.

Even though this is an SSD-based, hardware RAID 10 host, I see high IO delay on the host from time to time, which I do not notice on the other setups.
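To put a number on the IO delay outside the web interface, something like the sketch below could be used. It assumes a Linux host and that the IO delay graph roughly corresponds to the iowait share of CPU time in /proc/stat. That mapping is my assumption, and the function names are made up for illustration.

```python
import time


def parse_cpu_line(line):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    iowait = values[4]
    # Guest time is already counted in user/nice, so only sum the first 8 fields.
    total = sum(values[:8])
    return iowait, total


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line snapshots."""
    io0, total0 = parse_cpu_line(before)
    io1, total1 = parse_cpu_line(after)
    delta = total1 - total0
    return 100.0 * (io1 - io0) / delta if delta > 0 else 0.0


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart and return the iowait percentage."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return iowait_percent(first, second)
```

Calling `print(sample())` in a loop on each host would give a rough side-by-side view of how often the iowait share spikes.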

Here are 2 consecutive executions of pveperf on this machine. You can spot the alarming difference in the buffered read speed.


And here is a chart of the CPU usage and IO delay:


I am kind of lost here.

I could simply redo the install and configuration from scratch, but I don't know whether this is a configuration issue and, if so, what I should change. Reinstalling without changing anything would just lead me back to the same situation, so I need to understand the underlying cause before moving forward.

Could this be hardware related? Can the RAID controller or the SSD disks be responsible for this? What can I do to find the culprit?

Any help would be greatly appreciated.
Could you provide a bit more information, such as which filesystem you use?
If you use ZFS on top of a hardware RAID, this would explain the behavior.
Which hardware RAID controller do you use?
I have been investigating this with my provider and this seems to be solved.

There were some errors being reported by Proxmox (caught through the logs on the server) which pointed to a hardware fault. The RAID controller was failing every few minutes and resetting itself, which slowed the server to a crawl until it came back to life.
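For anyone chasing something similar, a minimal sketch of filtering a log for suspicious lines is below. The keywords are guesses on my part, not the actual messages from this server, and the function name is made up for illustration.

```python
import re

# Hypothetical keywords; these are NOT the actual messages from this thread.
DEFAULT_PATTERNS = ("reset", "i/o error", "timed out")


def find_suspect_lines(lines, patterns=DEFAULT_PATTERNS):
    """Return the lines that contain any of the given keywords (case-insensitive)."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]
```

It can be fed an open log file, for example `find_suspect_lines(open("/var/log/syslog"))`, where the path is an assumption and depends on how logging is set up on the host. If the matches repeat every few minutes, that would fit a controller that keeps resetting.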

After replacing the RAID controller and the disks, things are now working as I would expect, with pveperf reporting values similar to the first sample I posted above.

The faulty RAID controller was an LSI-9265-8i. Until now, I have had no issues with these controllers.

