High IO wait following PVE 4 upgrade

jonty-comp

New Member
Jan 5, 2016
I recently had some server downtime for my office Proxmox server over Christmas and took the opportunity to upgrade from 3.4 to 4.1, and everything seemed to go alright. However, now that everyone's back in the office, I'm noticing a severe slowdown across the whole system.

I set up a Zabbix server (in a container on this server) months ago, and it has recorded a marked increase in system-wide iowait since the upgrade. The system itself was switched off for about a week while we had some electrical work done, but other than that, the upgrade is the only change that's taken place.

Here is a graph from before the upgrade took place:
[Image: proxmox3.4.png]

The spike in iowait corresponds with the nightly backup; that's no problem. The data resolution is decreased slightly because the data is more than 7 days old.

Here's a graph from after the upgrade:
[Image: proxmox4.1.png]

As you can see, the iowait is not huge, but it is causing a severe slowdown on our MySQL VM that's making some of our business software almost unusable.

The server is a Lenovo TS140 with a Xeon E3-1225 v3, 16GB of RAM, and two 500GB SATA disks in software RAID 1, with LVM on top providing a 333GB data partition and a 100GB root partition (both ext3 with default mount options). I know software RAID isn't officially advised, but we're saving up for a hardware RAID card, and it hasn't caused any problems for the last 18 months. :p
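
(Since software RAID is in the mix: the standard commands for checking the array's health are below, in case that's useful to anyone; /dev/md0 stands in for whatever the actual md device is called.)

Code:
# Overall md status: look for [UU] and no resync/recovery in progress
cat /proc/mdstat
# Per-array detail: state, failed devices, sync progress (device name is an example)
mdadm --detail /dev/md0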

There's nothing particularly strange in any of the logs I know about, but I'm not terribly experienced with these things, and any advice would be greatly appreciated.
 
The key here may be the SATA controller or the RAID 1 setup. I had similar issues after upgrading to Proxmox 4, and here is how I solved it.

In my logs I had a bunch of messages like the ones below; I've had them since the 2.6 kernel without any poor iowait performance or high load:

Code:
ata5.03: failed command: WRITE FPDMA QUEUED
ata5.01: failed command: READ FPDMA QUEUED
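
If you want to see whether your own logs have the same thing, something like this should turn them up (the search patterns are just examples):

Code:
# Look for NCQ command failures in the kernel ring buffer and syslog
dmesg | grep -i 'failed command'
grep -i 'FPDMA' /var/log/syslog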

CPU and memory usage were fine, but the load was high and iowait was clearly behind it, with no obvious smoking gun at first. What I did was check the disk performance on all disks:

Code:
# Test sequential read speed on every disk that actually exists
for i in {a..z}; do [ -b /dev/sd$i ] && hdparm -t /dev/sd$i; done

What this does is check the read speed of the drives. In my case some of them were reporting under 10 MB/s for reads, which is very bad; I would expect at least 100 MB/s as a minimum. If you have this condition, proceed with the following.

Check if NCQ is enabled:

Code:
# Print the NCQ queue depth of every disk that actually exists
for i in {a..z}; do [ -b /dev/sd$i ] && { echo /dev/sd$i; cat /sys/block/sd$i/device/queue_depth; }; done

Any value greater than 1 means it's enabled for that block device (usually it's 31).

So all you need to do to disable NCQ is set that value to 1 for each block device:

Code:
# Set the queue depth to 1 on each disk, disabling NCQ
for i in {a..z}; do [ -b /dev/sd$i ] && { echo /dev/sd$i; echo 1 > /sys/block/sd$i/device/queue_depth; }; done

You will see an IMMEDIATE improvement. Test by running the hdparm command again and the speeds should go way up. If this doesn't fix your issue, you can always roll the change back by writing 31 instead of 1 and look elsewhere. I haven't checked whether this setting is persistent and survives a reboot (probably not), so please keep that in mind.
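
If it does fix things for you and you want the setting to survive a reboot, one option (just a sketch, I haven't tested it myself) is a udev rule along these lines:

Code:
# /etc/udev/rules.d/99-disable-ncq.rules -- filename and rule are a sketch, untested
# Write 1 to queue_depth for every sd* disk as it is detected
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/queue_depth}="1"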

Hope this helps
 
Hi, and thanks for your help.

Unfortunately, it doesn't seem like this is the problem. hdparm reports both disks as reading at around 100MB/s, and disabling NCQ doesn't really make any difference:

Code:
root@proxmox:~# for d in {a..b}; do hdparm -t /dev/sd$d; done

/dev/sda:
Timing buffered disk reads: 272 MB in  3.04 seconds =  89.61 MB/sec

/dev/sdb:
Timing buffered disk reads: 320 MB in  3.01 seconds = 106.31 MB/sec

I've looked into it a little more, and pveperf shows that my fsyncs/sec figure is very low:
Code:
root@proxmox:~# pveperf /var/lib/vz
CPU BOGOMIPS:      25542.60
REGEX/SECOND:      2753992
HD SIZE:           332.83 GB (/dev/mapper/pve-data)
BUFFERED READS:    73.39 MB/sec
AVERAGE SEEK TIME: 13.88 ms
FSYNCS/SECOND:     13.15
DNS EXT:           83.50 ms
DNS INT:           0.37 ms
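
(As a cross-check on that FSYNCS/SECOND number, sync-write throughput can also be measured directly with dd; the test file path is just an example on the same volume:)

Code:
# Rough sync-write test: each 4k block is flushed to disk before the next
dd if=/dev/zero of=/var/lib/vz/ddtest bs=4k count=100 oflag=dsync
rm /var/lib/vz/ddtest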

iotop usually shows jbd2 or kworker at the top of the IO list, so I guess it's probably nothing to do with Proxmox, and perhaps something to do with my chipset on kernel 4.x...very mysterious! Still nothing strange in dmesg or syslog.
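
(For reference, the iotop view I'm describing is the accumulated per-process one, i.e. something like:)

Code:
# Only show processes actually doing IO, with accumulated totals
iotop -o -P -a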
 
