I/O separation

massivescale

Renowned Member
May 15, 2012
Hi everyone,

Recently I received a notification from Nagios about high server load. All virtual machines were slow: web pages loaded slowly, and SSH login took almost a minute. Many processes were stuck in D state.
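
For reference, a quick way to spot those processes in uninterruptible sleep (plain procps, nothing OpenVZ-specific):
Code:
# list processes currently in D (uninterruptible sleep) state
ps -eo pid,stat,comm | awk '$2 ~ /^D/'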

Using iotop and my patched vztop, I quickly located the problem: one of the client VMs had many sendmail processes doing lots of random I/O. I wanted to address it using only OpenVZ, without interfering with what the client runs inside his VM. I started with:
Code:
vzctl set 110 --ioprio 0 --save
and waited to see if the other VMs got faster. They didn't. So I tried:
Code:
vzctl set 110 --cpuunits 100 --cpulimit 50 --save
and it still didn't make the other VMs any faster.
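
(For completeness, the saved values can be double-checked in the container config; the path below assumes the stock OpenVZ layout, on Proxmox it may live under /etc/pve/openvz/ instead.)
Code:
# confirm the limits were actually written for CT 110
grep -E 'IOPRIO|CPUUNITS|CPULIMIT' /etc/vz/conf/110.conf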

I ended up stopping Sendmail on his machine as soon as I noticed the problem. However, I originally chose OpenVZ as my VPS technology because it advertised fair I/O scheduling and priorities. Is there something I did wrong, or is OpenVZ just not that good at managing virtual machines' disk priorities?

Code:
# uname -a
Linux le02 2.6.32-11-pve #1 SMP Wed Apr 11 07:17:05 CEST 2012 x86_64 GNU/Linux
 
I will, but I/O priorities are not a new feature, and I don't think going two minor versions up will help. Does anyone have actual advice or experience with I/O issues in OpenVZ?
 
Have you set I/O priorities on your other containers? I think they are relative to each other: http://wiki.openvz.org/I/O_priorities_for_containers. Also, I think it uses the CFQ scheduler; you might want to change to the Deadline scheduler and make sure it is applied to your block device(s). It's been a really long time since I used OpenVZ containers, but I seem to remember there were major issues related to dirty bits not being counted, and this could also cause high IOWAIT.
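
On the scheduler point, checking and switching it looks roughly like this (assuming the device is sda; the echo only lasts until reboot):
Code:
# the active I/O scheduler is shown in brackets
cat /sys/block/sda/queue/scheduler
# switch to deadline for this boot
echo deadline > /sys/block/sda/queue/scheduler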

Sorry... all that I remember.
 
Thanks for your answer, charnov.

The other containers have a priority of 4 by default, so every one of them should receive ~1.5x more time than a container with prio=1. This wasn't the case, though...
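
In case the relative part matters, the priorities can also be made explicit on every container (just a sketch, assuming vzlist -H -o ctid lists the container IDs on this kernel):
Code:
# keep the offender at the lowest priority, raise everyone else to the maximum
vzctl set 110 --ioprio 0 --save
for ct in $(vzlist -H -o ctid); do
    [ "$ct" -ne 110 ] && vzctl set "$ct" --ioprio 7 --save
done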
 
Well, I tried the Deadline scheduler today (on server hardware, not on my laptop), and when I ran fio in one container, the websites in the other containers didn't respond...
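
(For anyone who wants to reproduce it, a job along these lines generates comparable random I/O; the parameters are illustrative, not the exact ones used:)
Code:
# 4k random writes from several jobs for 60 seconds
fio --name=randwrite --rw=randwrite --bs=4k --size=1G --numjobs=4 --direct=1 --runtime=60 --time_based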
 
Is this local disk, iSCSI, or NFS? Maybe you are starving the bus, the controller, or (if you are using HyperThreading) the cache... With iSCSI or NFS, you may be starving the network connection.
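
One quick way to tell whether the device itself is saturated (assuming the sysstat package is installed on the host):
Code:
# watch per-device utilization; %util stuck near 100 means the disk, not the scheduler, is the bottleneck
iostat -x 1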
 
The original advice to upgrade your kernel is sounding better and better. I have been looking at the OpenVZ kernel changelogs and there is quite a bit about deadlocks in there.
 
