I/O separation

massivescale

Renowned Member
May 15, 2012
Hi everyone,

Recently I received a notification from Nagios about high server load. All virtual machines were slow: web pages loaded slowly, and SSH login took almost a minute. Many processes were stuck in D state.
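
For reference, a quick way to spot those processes in uninterruptible sleep (plain procps, nothing OpenVZ-specific):
Code:
# list processes currently in D (uninterruptible sleep) state
ps -eo pid,stat,comm | awk '$2 ~ /^D/'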

Using iotop and my patched vztop, I quickly located the problem: one of the client VMs had many sendmail processes doing lots of random I/O. I wanted to address it using only OpenVZ, without interfering with what the client runs inside his VM. I started with:
Code:
vzctl set 110 --ioprio 0 --save
and waited to see if the other VMs got faster. They didn't. So I tried:
Code:
vzctl set 110 --cpuunits 100 --cpulimit 50 --save
and it still didn't make the other VMs any faster.
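
(For completeness, the saved values can be double-checked in the container config; the path below assumes the stock OpenVZ layout, on Proxmox it may live under /etc/pve/openvz/ instead.)
Code:
# confirm the limits were actually written for CT 110
grep -E 'IOPRIO|CPUUNITS|CPULIMIT' /etc/vz/conf/110.conf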

I ended up stopping Sendmail on his machine as soon as I noticed the problem. However, I originally chose OpenVZ as my VPS technology because it advertised fair I/O scheduling and priorities. Is there something I did wrong, or is OpenVZ just not that good at managing virtual machines' disk priorities?

Code:
# uname -a
Linux le02 2.6.32-11-pve #1 SMP Wed Apr 11 07:17:05 CEST 2012 x86_64 GNU/Linux
 
I will, but I/O priorities are not a new feature, and I don't think going two minor versions up will help. Does anyone have actual advice or experience with I/O issues in OpenVZ?
 
Have you set I/O priorities on your other containers? I think they are relative to each other: http://wiki.openvz.org/I/O_priorities_for_containers. Also, I think it uses the CFQ scheduler; you might want to change to the Deadline scheduler and make sure it is applied to your block device(s). It's been a really long time since I used OpenVZ containers, but I seem to remember there were major issues related to dirty bits not being counted, and this could also cause high IOWAIT.
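
On the scheduler point, checking and switching it looks roughly like this (assuming the device is sda; the echo only lasts until reboot):
Code:
# the active I/O scheduler is shown in brackets
cat /sys/block/sda/queue/scheduler
# switch to deadline for this boot
echo deadline > /sys/block/sda/queue/scheduler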

Sorry... all that I remember.
 
Thanks for your answer, charnov.

The other containers have a priority of 4 by default, so every one of them should receive ~1.5x more time than a container with prio=1. This wasn't the case, though...
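
In case the relative part matters, the priorities can also be made explicit on every container (just a sketch, assuming vzlist -H -o ctid lists the container IDs on this kernel):
Code:
# keep the offender at the lowest priority, raise everyone else to the maximum
vzctl set 110 --ioprio 0 --save
for ct in $(vzlist -H -o ctid); do
    [ "$ct" -ne 110 ] && vzctl set "$ct" --ioprio 7 --save
done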
 
Well, I tried the Deadline scheduler today (on server hardware, not on my laptop), and when I ran fio in one container, the websites in the other containers didn't respond...
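
(For anyone who wants to reproduce it, a job along these lines generates comparable random I/O; the parameters are illustrative, not the exact ones used:)
Code:
# 4k random writes from several jobs for 60 seconds
fio --name=randwrite --rw=randwrite --bs=4k --size=1G --numjobs=4 --direct=1 --runtime=60 --time_based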
 
Is this local disk, iSCSI, or NFS? Maybe you are starving the bus, the controller, or (if you are using HyperThreading) the cache... With iSCSI or NFS, you may be starving the network connection.
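
One quick way to tell whether the device itself is saturated (assuming the sysstat package is installed on the host):
Code:
# watch per-device utilization; %util stuck near 100 means the disk, not the scheduler, is the bottleneck
iostat -x 1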
 
The original advice to upgrade your kernel is sounding better and better. I have been looking at the OpenVZ kernel changelogs and there is quite a bit about deadlocks in there.
 
