Slow restart of OpenVZ guests

zzhjkrqlne

Hi,

I'm seeing abnormal restart times on OpenVZ guests after upgrading to Proxmox 1.1 the other day, and was wondering if anyone else is seeing the same.

I followed the steps at http://pve.proxmox.com/wiki/Downloads to upgrade from 1.0, except that I recompiled the kernel from the source available at ftp://pve.proxmox.com/sources/pve-kernel-2.6.24_2009-01-15.tar.gz with a patch from http://openamt.svn.sourceforge.net/viewvc/openamt/amt-rescue-cd/patches/linux-2.6.25.rc8-ider.patch for IDE redirection support on the AMT platform.

On 1.0, a standard OpenVZ CentOS 5 guest takes an average of 10 seconds to restart, whereas on 1.1 the same restart takes 2 minutes.

Running strace on the restart process on the Proxmox 1.1 installation shows the following (trimmed to the relevant parts only):

# strace -tt vzctl restart
...
10:12:54.169016 write(1, "Stopping container ...\n", 23) = 23
10:12:54.169085 gettimeofday({1233184374, 169104}, NULL) = 0
10:12:54.169120 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=785, ...}) = 0
10:12:54.169176 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=785, ...}) = 0
10:12:54.169233 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=785, ...}) = 0
10:12:54.169301 write(3, "2009-01-29T10:12:54+1100 vzctl :"..., 65) = 65
10:12:54.169379 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f19eb11d760) = 8148
10:12:54.169676 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:12:54.169739 rt_sigaction(SIGCHLD, NULL, {SIG_IGN}, 8) = 0
10:12:54.169795 nanosleep({1, 0}, {1, 0}) = 0
10:12:55.169914 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:12:55.169990 ioctl(4, 0x400c2e05, 0x7ffff3127360) = 0
10:12:55.170070 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:12:55.170127 rt_sigaction(SIGCHLD, NULL, {SIG_IGN}, 8) = 0
10:12:55.170182 nanosleep({1, 0}, {1, 0}) = 0

[the last 5 lines above keep repeating]

10:14:53.211823 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:14:53.211877 ioctl(4, 0x400c2e05, 0x7ffff3127360) = 0
10:14:53.211925 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:14:53.211965 rt_sigaction(SIGCHLD, NULL, {SIG_IGN}, 8) = 0
10:14:53.212017 nanosleep({1, 0}, {1, 0}) = 0
10:14:54.212128 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:14:54.212199 ioctl(4, 0x400c2e05, 0x7ffff3127360) = 0
10:14:54.212279 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f19eb11d760) = 8371
10:14:54.212689 nanosleep({0, 500000000}, NULL) = 0
10:14:54.712771 ioctl(4, 0x400c2e05, 0x7ffff3127360) = -1 ESRCH (No such process)
10:14:54.712871 write(1, "Container was stopped\n", 22) = 22
10:14:54.713008 gettimeofday({1233184494, 713026}, NULL) = 0

As you can see from the trace above, it takes almost exactly 2 minutes to shut down the guest: vzctl polls once per second, waiting for something, until the container finally stops.
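For anyone who wants to check the numbers, here is a small sketch that computes the gap between two `strace -tt` timestamps. The two values are copied from the trace above (the "Stopping container ..." write and the "Container was stopped" write); the function name is just something I made up, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two strace -tt timestamps (HH:MM:SS.ffffff).

    Assumes both timestamps are from the same day (no midnight rollover).
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps taken from the trace: start and end of the container stop phase.
stop_began = "10:12:54.169016"
stop_done = "10:14:54.712871"

print(f"{elapsed_seconds(stop_began, stop_done):.3f} s")  # prints "120.544 s"
```

So the stop phase alone accounts for roughly 120.5 seconds of the restart.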

Any suggestions as to what might be causing such a delay?
 
I guess that is related to the new init-logger implementation. Do you observe that on all templates, or only on CentOS?
 
I've only tested out CentOS 4 & CentOS 5 templates, as those are the only ones I'm using at this time. Tried restarting the CentOS 4 guest after reading your post, and it restarts much quicker at 10 seconds or so.

How does the new init-logger implementation have such an impact on Proxmox 1.1 and not on 1.0 (and only on the CentOS 5 template), and is there any way around it that you can think of?

Thank you for your help.
 
What I found out so far is that it happens when one service does not correctly daemonize itself on startup. What services/programs do you run? Is there a way to reproduce that behaviour?
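For reference, this is roughly what "correctly daemonizing" means in the conventional Unix sense: a double fork, a new session, and the standard descriptors pointed at /dev/null. It is only a sketch in Python of the general technique, not the code of any particular service, and whether this is exactly what init-logger expects is an assumption on my part.

```python
import os


def daemonize() -> None:
    """Detach the calling process using the classic double-fork sequence."""
    # First fork: the original process exits, so whatever launched it
    # (for example an init script) gets control back immediately.
    if os.fork() > 0:
        os._exit(0)

    # Start a new session, which drops the controlling terminal.
    os.setsid()

    # Second fork: the session leader exits, so the surviving process
    # is not a session leader and cannot reacquire a terminal.
    if os.fork() > 0:
        os._exit(0)

    os.chdir("/")
    os.umask(0)

    # Point stdin, stdout and stderr at /dev/null so the daemon does not
    # keep an inherited terminal or pipe open.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

A program started under a supervisor such as daemontools deliberately stays in the foreground and skips these steps, which may be relevant here.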
 
It's just a pretty plain CentOS 5 guest, with dnscache running on it via daemontools.

I haven't had a chance to look into this further yet, as I've reverted to the previous version for now. I'll try downloading a fresh copy of the appliance from http://download.proxmox.com/appliances/system/centos-5-standard_5.2-1_i386.tar.gz and see if I can reproduce this with it.
 
I tested a little more and found that daemontools was causing the CentOS 5 guest to restart slowly. Once daemontools was turned off, the CentOS 5 guest restarted in a matter of seconds.

No idea why, but I guess it's easier to find an equivalent program to replace daemontools on my setup.