Unable to stop container: operation timed out

yonatan

Guest
I have one container stuck: I can't power it on or turn it off.
In both cases it just hangs, and I cannot kill the hanging processes.
After some investigation I found:

Code:
# vzctl --skiplock stop 146
Stopping container ...
Unable to stop container: operation timed out

# vzlist | grep 146
       146          2 running   x.x.x.1  server1.domain.info


# lsof /vz/root/146/
COMMAND    PID USER   FD   TYPE DEVICE  SIZE     NODE NAME
shutdown  3437 root  cwd    DIR   0,89  4096 73836224 /var/lib/vz/root/146
shutdown  3437 root  rtd    DIR   0,89  4096 73836224 /var/lib/vz/root/146
shutdown  3437 root  txt    REG   0,89 18328 73836332 /var/lib/vz/root/146/sbin/
shutdown 14692 root  cwd    DIR   0,89  4096 73836224 /var/lib/vz/root/146
shutdown 14692 root  rtd    DIR   0,89  4096 73836224 /var/lib/vz/root/146
shutdown 14692 root  txt    REG   0,89 18328 73836332 /var/lib/vz/root/146/sbin/


# pveversion
pve-manager/1.8/6070

# cat /etc/debian_version
5.0.8

# uname -r
2.6.32-4-pve


Any suggestions for a next course of action besides a power cycle?
(That means 20 minutes of downtime plus 30 minutes of booting KVM machines and OpenVZ containers, and should be avoided at all costs.)
 
Sorry for the thread bump; I got some new information on this and would like to share it.

This happens on any OpenVZ VM.

Grepping the ps output for the shutdown commands on the hardware node shows:

3437 ? Ds 0:00 shutdown -h 0 w TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/root/bin LOGNAME=root USER=root USERHELPER_UID=0 HOME=/root
1628 ? Ds 0:00 shutdown -h 0 w TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/root/bin LOGNAME=root USER=root USERHELPER_UID=0 HOME=/root


Last time I checked, there was no way to kill zombies without a reboot. I really hope someone can shed some light on this issue for me.
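
A side note on the ps output above: the `Ds` in the STAT column denotes uninterruptible sleep (the `s` marks a session leader), whereas a true zombie would show `Z`. In both cases kill -9 has no visible effect, but the distinction matters for diagnosis, because a D-state process is blocked inside the kernel. A minimal sketch for listing such processes together with the kernel symbol they are waiting in; `filter_state` is a hypothetical helper name, not part of any package:

```shell
#!/bin/sh
# Keep only the rows whose STAT column (2nd field) starts with the given
# state letter: D = uninterruptible sleep, Z = zombie.
# The first input line is assumed to be the ps header and is skipped.
filter_state() {
    awk -v s="$1" 'NR > 1 && substr($2, 1, 1) == s'
}

# On the hardware node (wchan = kernel function the task sleeps in):
#   ps -eo pid,stat,wchan:32,args | filter_state D
```

If the wchan column points at a filesystem or quota function, that may help narrow down where the shutdown is stuck.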

I'm left with 2 stuck containers.

vzctl start <id> reports that the container is already running.
vzctl stop <id> does not stop the container.
 
Are you able to enter those containers:

# vzctl enter <vmid>

What processes are running inside?

Any hints in syslog or init.log?
 
Hi,
the syslog and init.log show normal operation, nothing out of the ordinary.
These are the processes inside:

Grepping the ps output for the shutdown commands on the hardware node shows:

3437 ? Ds 0:00 shutdown -h 0 w TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/root/bin LOGNAME=root USER=root USERHELPER_UID=0 HOME=/root
1628 ? Ds 0:00 shutdown -h 0 w TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/root/bin LOGNAME=root USER=root USERHELPER_UID=0 HOME=/root


When I run vzlist, NPROC shows 1 or 2 processes running; when I check which ones they are, it is the "shutdown" operation.

The thing is, this is a production server, and I had to power cycle it because of a simple "reboot" issued from inside the VM.
So I no longer have the case in front of me.

The shutdown zombie from hell!

Is this a bug in OpenVZ?
 
And 'kill -9 PID' does not help?



What is the output of

# pveversion -v

kill -9 won't kill a zombie process; the command is issued, but it has no effect on the actual process.

I tried to chroot into the VM's private area and mount proc in order to kill it from there; it didn't help, I had the same problem.
I even went as far as cd'ing into /proc/ and trying to deal with it directly via the /proc mount point, which led to an "operation not permitted" error whenever I tried to touch any of the entries under the PID.
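
For reference, the state of a stuck process can be read from /proc without signalling or modifying anything. A rough sketch, assuming the host's /proc is mounted as usual; `proc_state` is a made-up helper name, and 3437 is simply the PID from the lsof output earlier in the thread:

```shell
#!/bin/sh
# Print the one-letter state from a /proc/<pid>/status file
# (R running, S sleeping, D uninterruptible sleep, Z zombie, T stopped).
proc_state() {
    awk '/^State:/ { print $2; exit }' "$1"
}

# On the hardware node:
#   proc_state /proc/3437/status
#   cat /proc/3437/wchan    # kernel symbol the task sleeps in, if exposed
```

A result of `D` rather than `Z` would suggest the process is waiting inside the kernel, which is consistent with signals being ignored.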


I think I am updated to the latest version:

# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-33
pve-kernel-2.6.32-4-pve: 2.6.32-33
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1dso1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
 
Well, that is true ;-) So that looks like an OpenVZ bug. Do you have a way to reproduce/trigger that bug?

Besides, the only 'stable' OpenVZ branch is the 2.6.18 kernel. So if you use OpenVZ on such production machines it may be better to use
kernel 2.6.18.

- Dietmar
 
I can't test this on the production machine; when OpenVZ gets stuck, that means a reboot.
I am using it for both KVM and OpenVZ, so the kernel is 2.6.32X ...

I never had this issue before; it appeared after the last apt-get update.
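
To narrow down which package the update changed, the dpkg log can be filtered for upgrade entries. This is only a sketch: it assumes the Debian default log location /var/log/dpkg.log (older entries may sit in rotated files), and `upgrades_matching` is a hypothetical helper name:

```shell
#!/bin/sh
# Print date, package, old version and new version for every "upgrade"
# entry in a dpkg log whose package name matches the given regex.
# Assumed line format: DATE TIME upgrade PACKAGE OLD-VERSION NEW-VERSION
upgrades_matching() {
    awk -v pat="$1" '$3 == "upgrade" && $4 ~ pat { print $1, $4, $5, $6 }' "$2"
}

# On the hardware node:
#   upgrades_matching '^(pve-kernel|vzctl|vzquota)' /var/log/dpkg.log
```

Comparing the dates against the first occurrence of the hang could indicate whether the kernel or the vzctl update is the more likely trigger.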
 
