High Wait IO after and Load Average upgrade to Proxmox 4

Clement87

Dec 12, 2015
Hi,
I've been experiencing high Wait IO (and high Load Average) since I upgraded from Proxmox 3 to Proxmox 4 (see attached picture).
I'm only using containers (about 10), no KVM. The upgrade from OpenVZ to LXC went fine following the officially recommended procedure.
I tried reinstalling everything from scratch (Proxmox 4 directly on a fresh Debian 8 install) and restoring the container backups; the result was the same.
I tried stopping the containers one by one to see if one of them was the source of the issue, with no success.
I upgraded the host and all containers to the latest version and kernel.
The server is a Kimsufi (from OVH) with a Core i5, 16 GB of RAM and a 2 TB hard drive.
I spent days trying to solve the issue, but now I'm out of ideas.
Is anyone else experiencing the same problem?
Thanks
Clement
cpu-week.png
 
Re: High Wait IO and Load Average after upgrade to Proxmox 4

Sorry for the title, it should be "High Wait IO and Load Average after upgrade to Proxmox 4"
 
Hello, I'm experiencing the same issue. I tried to find out why, with no luck. I'm thinking of going back to Proxmox 3.
 
I finally found a workaround. On my side, the high iowait mainly comes from jbd2 processes (the ext4 journal), which run very often on all my VMs (10+), because since Proxmox 4 and LXC the default disk of each VM is a raw image formatted as ext4.

Backing up and then restoring the VMs using chroot instead of a raw image did the trick for me: no more iowait.
You can restore a backup in chroot instead of a raw image by using a size of 0 on local storage:

Code:
pct restore id backup.tar.gz --rootfs local:0

Please note that the raw image is the default in Proxmox 4 because chroot mode does not allow setting a quota on disk usage.
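
To check whether the same pattern applies on another host, one option is to count the ext4 journal threads that belong to loop devices. This is only a rough sketch: it assumes the kernel names those threads like `jbd2/loop0-8`, and the function name is made up for illustration.

```shell
#!/bin/sh
# Count ext4 journal threads (jbd2) that belong to loop devices.
# Reads a list of process names, one per line, on stdin.
count_loop_journals() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when the count is zero.
    grep -c '^jbd2/loop' || true
}

# On a live host, feed it the process list:
#   ps -eo comm | count_loop_journals
```

A count close to the number of raw-image containers would be consistent with the explanation above, although it does not prove that the journals are what causes the iowait.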
 
Thanks a lot Drakaz !!!
I restored all my containers from raw to chroot and now WaitIO and Load are back to normal (like before the upgrade to Proxmox 4.0).
 
Does this also affect I/O wait on KVM containers? I have always used OpenVZ containers, but the lack of user quotas in LXC may force me to switch to KVM. My containers are I/O intensive. How will this affect me?
 
There is no change for KVM, so no problem.
The problem seems to come from LXC with raw files mounted through loop devices.
For LXC, using a ZFS subvolume should solve this and keep quota management.
 
I've been studying high IO delay on one of my servers and found that barriers can increase IO delay.
If barriers are turned on (check the mount options in /proc/mounts), they could be one reason for the high IO delay.
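
A quick way to see which filesystems have barriers explicitly disabled is to scan the options column of /proc/mounts. This is a sketch under an assumption: on ext4, barriers are on by default and only `nobarrier` or `barrier=0` appear when they are turned off, so a mount that is not listed is presumably still using barriers. The function name is hypothetical.

```shell
#!/bin/sh
# Print mount points whose options explicitly disable write barriers.
# Input: lines in /proc/mounts format
#   (device mountpoint fstype options dump pass)
list_nobarrier_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "nobarrier" || opts[i] == "barrier=0") {
                print $2
                break
            }
    }'
}

# On a live host:
#   list_nobarrier_mounts < /proc/mounts
```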
 
All I can see is that a lot of users on this forum are affected and Proxmox staff have not reacted to the problem.
 
Thanks. I was asking because I run Proxmox 4.1 (upgraded from 4.0) and am hit by seemingly random high I/O waits. I'm still struggling to figure out where they come from: everything slows down so much that the only thing I can do is reboot, which makes it hard to debug the source of the issue :-(

This thread seemed like my best bet so far.
 
Same here.
I have high iowait stats on my container hosting a MySQL server (ext4 on an lvm-thin group hosted on software RAID1).

What is the definitive answer?
Remove the journal on ext4? Use nobarrier? Avoid LVM (and so have no snapshots)?
 
A filesystem loop-mounted on top of another one always produces IO amplification.
Maybe try noatime and nodiratime on the host (if /var/lib/vz is a separate FS) and in the VM guests (if your VM daemons do not require atime and diratime). Also, try to align the FS cluster sizes between the VM (FS) and the host (VM + disk/RAID).
Do not play with disabling the journal or barriers, because in case of a power outage you will lose data (and the FS will be left in an inconsistent state).
 
Just wanted to share what I ran into. I have LXC containers on a 2x2TB MDRAID1, and I'm also using the raw disk file rather than chroot. I managed to cut my %iowait down from a consistent 18-24% to around 0.8-2%. The changes were adding 'noatime,barrier=0' to the /var/lib/vz mount options and setting my drives to use the deadline scheduler (without tuning). noatime also implies nodiratime, so there's no need to add both.
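
For reference, the active I/O scheduler of a drive is the bracketed entry in /sys/block/<dev>/queue/scheduler. A small sketch for extracting it, assuming the usual `noop [deadline] cfq` layout of that file; `sda` and the function name are placeholders, not taken from the setup described here.

```shell
#!/bin/sh
# Extract the active scheduler (the entry in square brackets) from a line
# such as "noop [deadline] cfq", read on stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# On a live host (sda is a placeholder device name):
#   active_scheduler < /sys/block/sda/queue/scheduler
# Switching at runtime, which does not persist across reboots:
#   echo deadline > /sys/block/sda/queue/scheduler
```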

Before:

Code:
root@saber:~# vmstat -SM 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2      2    275    972  27628    0    0    14    95   30   39  1  1 95  4  0
 0  2      2    275    972  27628    0    0     0   748 2584 3555  1  0 82 17  0
 0  2      2    275    972  27628    0    0     0   792 2145 3497  1  0 73 26  0
 0  2      2    275    972  27628    0    0     0   736 2125 3496  1  1 79 20  0
^C
root@saber:/var/lib/vz/dump# pveperf /var/lib/vz
CPU BOGOMIPS:      35201.12
REGEX/SECOND:      1954170
HD SIZE:           1765.90 GB (/dev/mapper/pve-data)
BUFFERED READS:    154.66 MB/sec
AVERAGE SEEK TIME: 15.48 ms
FSYNCS/SECOND:     6.95
DNS EXT:           16.84 ms
DNS INT:           10.04 ms (nn.biz)
root@saber:/var/lib/vz/dump#

After:

Code:
root@saber:/var/lib/vz/dump# pveperf /var/lib/vz
CPU BOGOMIPS:      35201.12
REGEX/SECOND:      1960617
HD SIZE:           1765.90 GB (/dev/mapper/pve-data)
BUFFERED READS:    108.30 MB/sec
AVERAGE SEEK TIME: 13.48 ms
FSYNCS/SECOND:     369.32
DNS EXT:           11.83 ms
DNS INT:           17.15 ms (nn.biz)
root@saber:/var/lib/vz/dump# vmstat -SM 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      2  29718     44    406    0    0    16    97   36   47  1  1 95  4  0
 1  0      2  29726     44    406    0    0     4   560 3374 4175  1  1 98  0  0
 0  0      2  29725     44    406    0    0     0   260 3529 3968  1  1 98  0  0
 0  2      2  29724     44    406    0    0     8   128 2805 3254  1  0 98  1  0
^C
root@saber:/var/lib/vz/dump#
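
To put a single number on runs like these, the `wa` column of the vmstat output can be averaged. The sketch below assumes the 17-column layout shown above, with `wa` in column 16, and skips the first sample because vmstat reports averages since boot on its first line. The function name is illustrative.

```shell
#!/bin/sh
# Average the "wa" (I/O wait) column of `vmstat 1` output read on stdin.
# Header lines and the trailing ^C are skipped because their first field
# is not numeric. The first data line is skipped as well, since it holds
# averages since boot rather than a live sample.
avg_iowait() {
    awk '$1 ~ /^[0-9]+$/ {
             seen++
             if (seen > 1) { sum += $16; n++ }
         }
         END { if (n > 0) printf "%.1f\n", sum / n }'
}

# On a live host, ten one-second samples after the since-boot line:
#   vmstat 1 11 | avg_iowait
```

Fed the "Before" output above, it prints 21.0; fed the "After" output, it prints 0.3.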
 
Andrei ZeeGiant : your optimisation is effective.

///// DISCLAIMER (my professional responsibility as a Proxmox reseller and system expert)
But you increase the risk of losing data in case of a power outage.
Furthermore, you don't know whether a daemon on the system uses atime/diratime. This can lead to problems, especially in a cluster configuration.
https://en.wikipedia.org/wiki/Write_barrier
http://searchenterpriselinux.techtarget.com/tip/Deciding-when-to-use-Linux-file-system-barriers
https://lonesysadmin.net/2013/12/08/gain-30-linux-disk-performance-noatime-nodiratime-relatime/
https://en.wikipedia.org/wiki/Stat_(system_call)
/////

Andrei, would it be possible for you to:
- try lazytime rather than noatime
- re-enable barriers
- post the results?