Unable to stop container: operation timed out

Well, I now have 2.6.18 running on our servers. As we use KVM and OpenVZ, we had to

Code:
aptitude install pve-qemu-kvm-2.6.18 pve-kernel-2.6.18-4-pve
and check /boot/grub/menu.lst to make sure the 2.6.18 kernel is the default
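A minimal sketch of that menu.lst check, assuming the GRUB legacy layout Debian writes ("default N" followed by "title ..." entries, with N zero-based). The helper name default_kernel_title is made up for illustration, and a "default saved" setting is not handled (it prints nothing).

```shell
#!/bin/sh
# default_kernel_title is a hypothetical helper, not part of Proxmox or GRUB.
# It prints the title of the entry that GRUB legacy boots by default.
# The optional argument is the menu file; it defaults to /boot/grub/menu.lst.
default_kernel_title() {
    menu="${1:-/boot/grub/menu.lst}"
    # "default N" is zero-based and counts "title" lines from the top of the file.
    idx=$(awk '$1 == "default" { print $2; exit }' "$menu")
    awk -v want="${idx:-0}" '
        $1 == "title" {
            if (n++ == want) {
                sub(/^[ \t]*title[ \t]+/, "")   # strip the keyword, keep the text
                print
                exit
            }
        }' "$menu"
}
```

Running default_kernel_title with no argument should print a title mentioning 2.6.18 if that kernel is the default; after the reboot, uname -r shows which kernel is actually running.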

So I'll re-enable backups for all containers. I had been backing up all but the LDAP one.

If there are any problems I'll post info.

Also, this is our current config info:
Code:
pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.18-4-pve
pve-kernel-2.6.18-4-pve: 2.6.18-10
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
 
Argh... I just updated my 10 servers to Proxmox 1.7 six days ago, and I can't afford to take them down again :/

But at least we will find out whether it solves the problem.

regards
 
Yes, I know, but I can't afford to cut service for 300 clients; you can imagine how they would scream at support ;)

I'm waiting for either a patch or an update; failing that, I'll downgrade the servers on my side.

I especially want to know whether 2.6.18 is really better, and whether the driver for the Intel® 82576 controller works well on the node and in the VMs (download bug at 5K/20K).
 
Hello,
We just updated to 2.6.32 and pve-manager 1.7, and an OpenVZ container running nothing but httpd froze. We were unable to stop it cleanly, so we had to reboot the hardware node.

These are the ways we tried to stop it:

Method 1 :
- run "pstree -nup | grep init" to find the PIDs of the init processes
- for each PID, run "vzpid pid" to find the one that belongs to your VM
- once you have that PID, use pstree -nup to find all the children of that init and kill them
In our case, the init of the frozen VM had no children.
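As a rough sketch of the last step of Method 1, the function below lists every descendant of a given PID (children, grandchildren and so on) without killing anything, so the list can be reviewed first. The name list_descendants is hypothetical, and it assumes a ps that accepts -e and -o pid=/ppid=, as procps does.

```shell
#!/bin/sh
# list_descendants is a hypothetical helper: print every descendant PID of $1,
# one per line. It only lists processes; it does not send any signal.
list_descendants() {
    ps -e -o pid= -o ppid= | awk -v root="$1" '
        { parent[$1] = $2 }                 # remember each process and its parent
        END {
            for (pid in parent) {
                p = pid
                hops = 0
                # Walk up the parent chain until we hit root or run out of known PIDs.
                while ((p in parent) && p != root && hops++ < 4096) {
                    p = parent[p]
                    if (p == root) { print pid; break }
                }
            }
        }'
}
```

For example, list_descendants 12345 (with 12345 standing in for the container's init PID found via vzpid) prints the PIDs you would then pass to kill. An empty result matches what we saw: the frozen container's init had no children.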

Method 2 :
- remove ctid.lck in /var/lib/vz/lock
- run vzctl chkpnt ctid --kill
This did not work for us.

We will go back to 2.6.18 or try KVM for more stability...
 
I had this problem on 2.6.24 a couple of weeks ago, and now on 2.6.32 as well.

- On 2.6.24 everything was fine until the snapshot backup started; after that none of the VEs could be stopped, and I wasn't even able to log in to the web interface (timeout).
- On 2.6.32 the web interface works, but none of the VEs can be stopped or restarted (timeout). The 2.6.32 system also had a load of at least 10, apparently caused by the kernel, since no task was showing high CPU usage.

Neither of them can be restarted via the normal init process; only shutdown -n is able to reset the host ("do not go through 'init' but go down real fast").

I reckon it's somehow connected to LVM and snapshots, because it only happens after snapshot backups.
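On Linux, tasks in uninterruptible sleep (state D, typically waiting on I/O) count towards the load average without using CPU, which would fit high load with no busy task. A small sketch to list such tasks, assuming the procps version of ps; the name list_uninterruptible is made up for illustration.

```shell
#!/bin/sh
# list_uninterruptible is a hypothetical helper name.
# Print PID, state, kernel wait channel and command name of every task
# whose state starts with "D" (uninterruptible sleep).
list_uninterruptible() {
    ps -e -o pid= -o stat= -o wchan= -o comm= | awk '$2 ~ /^D/'
}
```

If the same PIDs keep showing up here after a snapshot backup, the wait channel column may hint at whether they are stuck in LVM or device-mapper code.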
 
Is that for OpenVZ or KVM? For us, OpenVZ restarts have worked. However, I have not tested all our VMs.

Also, I'm using the 2.6.18 series, as we mainly use OpenVZ. Here is init 6 in a container newly installed from the debian-6.0-standard_6.0-4 template:


root@fbc152 ~ # date
Thu Jun 2 12:31:16 EDT 2011
root@fbc152 ~ # init 6
root@fbc152 ~ # Connection to fbc152 closed by remote host.
Connection to fbc152 closed.
proxmox4: ~ # ssh fbc152
Linux fbc152 2.6.32-4-pve #1 SMP Wed Nov 24 05:32:29 CET 2010 i686
------------------------------------
vm 2152 fbc152 ldap slave server

fresh squeeze install
-----------------------------------
Last login: Thu Jun 2 12:31:08 2011 from 10.0.7.4
root@fbc152 ~ # date
Thu Jun 2 12:31:28 EDT 2011
 
I got the same result using vzctl. I wanted to check in case ssh and vzctl somehow behaved differently.


proxmox4: ~ # vzctl enter 2152
entered into CT 2152
root@fbc152 / # date
Thu Jun 2 12:35:30 EDT 2011
root@fbc152 / # init 6
root@fbc152 / # got signal 15
exited from CT 2152
proxmox4: ~ # ssh fbc152
Linux fbc152 2.6.32-4-pve #1 SMP Wed Nov 24 05:32:29 CET 2010 i686
------------------------------------
vm 2152 fbc152 ldap slave server

fresh squeeze install
-----------------------------------
Last login: Thu Jun 2 12:31:27 2011 from 10.0.7.4
root@fbc152 ~ # date
Thu Jun 2 12:35:40 EDT 2011
 
Here is a nice little script to quickly do method 1


Code:
#!/bin/sh
# Kill every direct child of the given parent process.
echo "Enter the parent process ID"
read ppid
# In "ps -ef" output, column 2 is the PID and column 3 is the parent PID.
for i in $(ps -ef | awk -v ppid="$ppid" '$3 == ppid { print $2 }')
do
    echo "killing $i"
    kill -9 "$i"
done

Sadly, I am unable to kill any child processes in my containers!