Unable to stop a container from blocking the host

rsauvat

New Member
Mar 9, 2010
Hi,

On a production server I have set up 8 virtual machines. They share the 16 GB of RAM on the host but use only 5 GB when running. My problem is that when I start an index creation in MySQL on one particular VM, after 2-3 minutes the whole host stops responding correctly. In top I can see a lot of processes waiting for I/O, and the load on the host goes up to 300. I have set the ioprio parameter to 0 for this VM and 7 for the others, but the problem is still the same. The process takes a really long time to respond to a kill -9: more than 10 minutes.

The MySQL table is really big (29 GB), so the index creation must be I/O intensive. But iostat on the host shows no more than 1 MB/s of I/O.
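
For context on the numbers above: the Linux load average counts tasks in uninterruptible sleep (state D, usually blocked on I/O) as well as runnable ones, so a load of 300 with very low throughput is consistent with hundreds of processes stuck waiting on the disk. A minimal sketch for counting them, assuming a Linux /proc layout; the function names are illustrative and not part of any Proxmox or OpenVZ tool:

```python
import os


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states(proc_dir="/proc"):
    """Count processes per state by scanning the numeric entries in proc_dir."""
    counts = {}
    if not os.path.isdir(proc_dir):
        return counts  # not a Linux-style /proc, nothing to scan
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            continue  # the process exited while we were scanning
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    counts = count_states()
    print("uninterruptible (D):", counts.get("D", 0))
    print("runnable (R):", counts.get("R", 0))
```

Run on the host while the index creation is in progress, a large D count next to a small R count would point at processes queued behind slow writes rather than at CPU pressure.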

I am running Proxmox 1.5 with Debian 5 on the host.


root@HWnode:~# pveversion -v
pve-manager: 1.5-10 (pve-manager/1.5/4822)
running kernel: 2.6.24-11-pve
proxmox-ve-2.6.24: 1.5-23
pve-kernel-2.6.18-2-pve: 2.6.18-5
pve-kernel-2.6.24-11-pve: 2.6.24-23
pve-kernel-2.6.24-10-pve: 2.6.24-21
qemu-server: 1.1-16
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1dso1
pve-qemu-kvm: 0.12.4-1

This problem is very annoying because I can't create any indexes in MySQL, and it defeats the purpose of containers. How can a single container make the others unstable? I can understand a container slowing down the others, but in this case even the postfix daemon in another container can't write mail to the user's mailbox.

If anyone has an idea of what is causing this problem, any help would be appreciated.

Best.
 
This is the output of pveperf, but only while the server is running normally. I can't send you the output from when the server is blocked. Maybe I will be able to do that later.

CPU BOGOMIPS: 32000.41
REGEX/SECOND: 665334
HD SIZE: 9.92 GB (/dev/md1)
BUFFERED READS: 17.62 MB/sec
AVERAGE SEEK TIME: 46.67 ms
FSYNCS/SECOND: 99.75
DNS EXT: 47.65 ms
DNS INT: 18.78 ms (ovh.net)
 
FSYNCS/SECOND: 99.75
DNS INT: 18.78 ms (ovh.net)
Ah - that explains it all. I had a similar situation with them - are you running hardware RAID? If so, you need the battery-backed cache unit (BBU) for the RAID controller. Your FSYNCs really need to be in the thousands, as this is an indicator of your concurrent I/O performance.
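
To put the figure in perspective: 99.75 fsyncs/second works out to roughly 10 ms per fsync. The sketch below measures a comparable number by rewriting one small block and calling fsync in a loop. It is an illustration only, not how pveperf is implemented; the function name, block size, and duration are assumptions.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, duration=1.0, block=b"x" * 4096):
    """Rewrite a small block and fsync it repeatedly; return fsyncs per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        count = 0
        start = time.monotonic()
        while time.monotonic() - start < duration:
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each time
            os.write(fd, block)
            os.fsync(fd)                  # returns once the OS reports the flush done
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.remove(path)
    return count / elapsed


if __name__ == "__main__":
    # Point this at a directory on the disk under test; a RAM-backed
    # /tmp would give a meaninglessly high number.
    rate = fsyncs_per_second(tempfile.gettempdir())
    print(f"{rate:.1f} fsyncs/second")
```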

There are other things you can try (enable DMA in your guests), but to be honest, until you solve those FSYNCs you are polishing a pig....

Col
 
Looks like you did what we NEVER recommend: running software RAID. Go for hardware RAID with a BBU and enable the write cache!
 
Thank you for the quick answers.

Unfortunately my dedicated server reseller set up the server with software RAID and I am not sure how to disable it.

I tested pveperf on an old server without software RAID and I can see the fsyncs/second are ten times higher.

Since I can't change the configuration right now, I will look into MySQL to avoid too many fsyncs.
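
The general idea of avoiding too many fsyncs can be sketched outside MySQL: sync once per batch of writes instead of once per write. This is a generic illustration, not MySQL's mechanism, and whether MySQL can be tuned this way depends on the storage engine, which this thread does not state. The function name and the sync_every parameter are hypothetical.

```python
import os
import tempfile


def write_records(path, records, sync_every=1):
    """Append records to path, calling fsync after every `sync_every` records.

    Returns the number of fsync calls issued.
    """
    syncs = 0
    pending = 0
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            pending += 1
            if pending >= sync_every:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to flush to the device
                syncs += 1
                pending = 0
        if pending:                   # sync whatever is left in the last batch
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs


if __name__ == "__main__":
    records = [b"row\n"] * 1000
    for batch in (1, 100):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            print(f"sync_every={batch}: {write_records(path, records, batch)} fsyncs")
        finally:
            os.remove(path)
```

The trade-off is durability: anything written since the last fsync can be lost if the machine crashes.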

I still have a question. Why doesn't OpenVZ limit the number of fsyncs per VM, or at least order the waiting ones across containers based on ioprio?
 
I guess that's a question for the OpenVZ forum, because it requires deep knowledge of the underlying implementation.
 
Hi

I have a new server with hardware RAID 1, and this improved things a little, but it's not great. Now the host and the other VMs are not totally blocked by a single VM, but the response time is very poor: more than a minute to serve a PHP page that usually takes less than a second. For information, the fsyncs/second were around 130.

I tried the latest kernel from OpenVZ (2.6.32-budarin.1) and it seems to have solved my problem.
I can load a virtual machine with a lot of I/O without impacting the others too much. I get a response time of around 1 second for my PHP web page.

But in pveperf the fsyncs/second are very low, around 28.
So I am not sure this benchmark is really reliable.
 
For information, the fsyncs/second were around 130.

This is very slow! That value should be above 1000.

I tried the latest kernel from OpenVZ (2.6.32-budarin.1) and it seems to have solved my problem.
I can load a virtual machine with a lot of I/O without impacting the others too much. I get a response time of around 1 second for my PHP web page.

But in pveperf the fsyncs/second are very low, around 28.
So I am not sure this benchmark is really reliable.

We already reported that bug to the Debian kernel team:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591778
 
I will follow the bug to see if there are any solutions. Do you have an idea of when you will release a 2.6.32 kernel with OpenVZ?
 
When those bugs are fixed. Currently there are two blockers:

1.) slow fsync rate
2.) KSM does not work at all
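
For the second blocker, one way to see whether KSM is doing anything on a given kernel is to read its counters under /sys/kernel/mm/ksm. A minimal sketch, assuming the kernel exposes that sysfs directory; the helper name is illustrative, and this is not how the bug above was diagnosed:

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"


def read_ksm(ksm_dir=KSM_DIR, names=("run", "pages_shared", "pages_sharing")):
    """Read KSM counters as integers; a value is None if its file is unreadable."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(ksm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None  # file missing, unreadable, or not a number
    return values


if __name__ == "__main__":
    for name, value in read_ksm().items():
        print(f"{name}: {value}")
```

With KSM enabled (run set to 1) and several similar guests running, pages_sharing staying at 0 would suggest that no pages are being merged.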
 
