Unable to stop container: operation timed out

bread-baker · Dec 22, 2010

Well, I now have 2.6.18 running on our servers. As we use KVM and OpenVZ, we had to run:

Code:

aptitude install pve-qemu-kvm-2.6.18 pve-kernel-2.6.18-4-pve

Then check /boot/grub/menu.lst to make sure the 2.6.18 kernel is the default.
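
One way to check the default entry without reading the file by eye is sketched below. It assumes a GRUB legacy menu.lst with a numeric, 0-based "default N" line and one "title" line per entry; the function name default_kernel_title is made up for this example, and a "default saved" setting is not handled.

```shell
#!/bin/sh
# Print the title of the entry that GRUB legacy would boot by default.
# Assumption: menu.lst has a numeric "default N" line (0-based); if the
# line is missing, entry 0 is used.
default_kernel_title() {
    menu="${1:-/boot/grub/menu.lst}"
    awk '
        $1 == "default" && $2 ~ /^[0-9]+$/ { want = $2 }
        $1 == "title" { titles[n++] = $0 }
        END {
            if (want == "") want = 0
            if (want in titles) {
                sub(/^[ \t]*title[ \t]+/, "", titles[want])
                print titles[want]
            } else {
                exit 1
            }
        }
    ' "$menu"
}
```

Calling default_kernel_title with no argument reads /boot/grub/menu.lst; after the reboot, uname -r shows which kernel is actually running.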

So I'll re-enable backing up all containers. I had been backing up all but the LDAP one.

If there are any problems I'll post info.

Also, this is our current config info:

Code:

pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.18-4-pve
pve-kernel-2.6.18-4-pve: 2.6.18-10
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1

sytry · Dec 22, 2010

ok so you have to downgrade the kernel?

bread-baker · Dec 22, 2010

sytry said:
ok so you have to downgrade the kernel?

Yes.

But make sure to boot 2.6.18, as the 2.6.32 one will probably be the default.

sytry · Dec 22, 2010

Argh... I just updated to Proxmox 1.7 six days ago on my 10 servers; I can't afford to take them down again :/

But it would be good to know whether it solved the problem.

regards

bread-baker · Dec 22, 2010

sytry said:
Argh... I just updated to Proxmox 1.7 six days ago on my 10 servers; I can't afford to take them down again :/

But it would be good to know whether it solved the problem.

regards

You can use Proxmox 1.7 with a 2.6.18 kernel.

sytry · Dec 22, 2010

Yes, I know, but I can't afford to cut service for 300 clients; you can imagine how they would scream at support.

I'm waiting for either a patch or an update; otherwise I'll downgrade the servers on my side.

I especially want to know whether 2.6.18 is really better, and whether the driver for the Intel® 82576 controller works well on the node and in the VMs (there is a bug where downloads run at 5K/20K).

bread-baker · Dec 29, 2010

We have been running for over a week without the problem, so the 2.6.18 kernel seems to be more stable for us.

sytry · Jan 6, 2011

ok, nice

thanks for feedback

bread-baker · Jan 6, 2011

Our system is still running normally; Proxmox is super stable.

Have you switched back to 2.6.18?

ltor · Jan 26, 2011

Hello,
We just updated to 2.6.32 and pve-manager 1.7, and an OpenVZ container running just httpd froze. We were unable to stop it cleanly, so we rebooted the hardware node.

The ways we tried to stop it:

Method 1:
- Run "pstree -nup | grep init" to find the PIDs of the init processes.
- For each PID, run "vzpid pid" to find the one that belongs to your VM.
- Once you have that PID, use pstree -nup to find and kill all children of that init.
For us, the frozen VM's init had no children.
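
As an alternative to walking pstree and vzpid by hand, the container's host PIDs can be listed directly. The sketch below assumes the host runs an OpenVZ kernel that adds an envID field (the container ID) to /proc/<pid>/status; on a stock kernel it simply prints nothing. The function name ct_pids and the optional second argument are made up for this example.

```shell
#!/bin/sh
# List host PIDs that belong to one container by reading the envID field
# in /proc/<pid>/status.
# Assumption: an OpenVZ kernel that exposes "envID:" in that file.
ct_pids() {
    ctid="$1"
    proc_root="${2:-/proc}"   # overridable so the function can be tried on a copy
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        if awk -v id="$ctid" \
            '$1 == "envID:" && $2 == id { found = 1 } END { exit !found }' \
            "$status" 2>/dev/null
        then
            pid=${status%/status}
            echo "${pid##*/}"
        fi
    done
}
```

For example, "ct_pids 101" prints one PID per line for container 101; an empty result matches the "no children" situation described above.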

Method 2:
- Remove ctid.lck in /var/lib/vz/lock
- Run vzctl chkpnt ctid --kill
For us, this did not work.
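
The two steps of method 2 can be wrapped in a small dry-run helper so the exact commands are visible before anything is touched. This is only a sketch: the function name force_kill_ct and the RUN and VZ_LOCK_DIR variables are made up for this example, and the lock path and the vzctl call are the ones quoted in the steps above.

```shell
#!/bin/sh
# Method 2 as a dry-run wrapper: prints the commands unless RUN=1 is set.
# The lock directory and the "vzctl chkpnt CTID --kill" call come from the
# steps above; RUN and VZ_LOCK_DIR are hypothetical knobs for this sketch.
force_kill_ct() {
    ctid="$1"
    case "$ctid" in
        ''|*[!0-9]*) echo "usage: force_kill_ct CTID" >&2; return 2 ;;
    esac
    lock="${VZ_LOCK_DIR:-/var/lib/vz/lock}/$ctid.lck"
    for cmd in "rm -f $lock" "vzctl chkpnt $ctid --kill"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Unquoted on purpose so the string splits into command and arguments.
            $cmd || return 1
        else
            echo "would run: $cmd"
        fi
    done
}
```

Run "force_kill_ct 101" first to review the commands, then "RUN=1 force_kill_ct 101" on the host to execute them.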

We will go back to 2.6.18 or try KVM for more stability...

bread-baker · Jan 26, 2011

OpenVZ is totally stable for us using 2.6.18.

SuSt · Feb 10, 2011

This bug has been fixed in the 042test006.1 OpenVZ kernel version: http://wiki.openvz.org/Download/kernel/rhel6/042test006.1#Kernel_patch
Also, see the related topic on this forum: http://forum.proxmox.com/threads/4997-openvz-server-and-NFS-gt-problem-with-quot-stop-quot
Discussion of this bug in OpenVZ's bugzilla: http://bugzilla.openvz.org/show_bug.cgi?id=1626

sytry · Feb 10, 2011

bread-baker said:
Our system is still running normally; Proxmox is super stable.

Have you switched back to 2.6.18?

Yes, on one node as a test.

But the other nodes show no bug :/

Kernel 2.6.18 doesn't run Arch Linux :/

Will this patch be included in the Proxmox kernel?

gkovacs · Jun 2, 2011

I had this problem on 2.6.24 a couple of weeks ago, and now on 2.6.32 as well.

- On 2.6.24 everything was fine until the snapshot backup started; after that none of the VEs could be stopped, and I wasn't even able to log in to the web interface (timeout).
- On 2.6.32 the web interface works, but none of the VEs can be stopped or restarted (timeout). Also, the 2.6.32 system had a load of at least 10, apparently caused by the kernel, since no task was showing high CPU usage.

Neither of them can be restarted via the normal init process; only shutdown -n is able to reset the host ("do not go through 'init' but go down real fast.")

I reckon it's somehow connected to LVM and snapshots, because it only happens after snapshot backups.
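
On Linux the load average also counts tasks in uninterruptible sleep (state D), which is one possible explanation for a high load with no CPU usage. A minimal sketch for listing such tasks follows; the function name d_state_tasks is made up for this example, and it expects the process state in the second column of its input.

```shell
#!/bin/sh
# Keep the header line plus every task whose state starts with D
# (uninterruptible sleep). Expected input columns: PID STAT WCHAN COMMAND.
d_state_tasks() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live host (procps ps):
#   ps -eo pid,stat,wchan:30,comm | d_state_tasks
```

The WCHAN column shows the kernel function each task is waiting in, which may help tell whether the stuck tasks are waiting on LVM or snapshot I/O.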

bread-baker · Jun 2, 2011

Is that for OpenVZ or KVM? For us, OpenVZ restarts have worked; however, I have not tested all our VMs.

Also, I'm using the 2.6.18 series, as we mainly use OpenVZ. Here is init 6 in a container newly installed from the debian-6.0-standard_6.0-4 template:

root@fbc152 ~ # date
Thu Jun 2 12:31:16 EDT 2011
root@fbc152 ~ # init 6
root@fbc152 ~ # Connection to fbc152 closed by remote host.
Connection to fbc152 closed.
proxmox4: ~ # ssh fbc152
Linux fbc152 2.6.32-4-pve #1 SMP Wed Nov 24 05:32:29 CET 2010 i686
------------------------------------
vm 2152 fbc152 ldap slave server

fresh squeeze install
-----------------------------------
Last login: Thu Jun 2 12:31:08 2011 from 10.0.7.4
root@fbc152 ~ # date
Thu Jun 2 12:31:28 EDT 2011

bread-baker · Jun 2, 2011

I got the same result using vzctl. I wanted to check in case ssh and vzctl somehow behaved differently.

proxmox4: ~ # vzctl enter 2152
entered into CT 2152
root@fbc152 / # date
Thu Jun 2 12:35:30 EDT 2011
root@fbc152 / # init 6
root@fbc152 / # got signal 15
exited from CT 2152
proxmox4: ~ # ssh fbc152
Linux fbc152 2.6.32-4-pve #1 SMP Wed Nov 24 05:32:29 CET 2010 i686
------------------------------------
vm 2152 fbc152 ldap slave server

fresh squeeze install
-----------------------------------
Last login: Thu Jun 2 12:31:27 2011 from 10.0.7.4
root@fbc152 ~ # date
Thu Jun 2 12:35:40 EDT 2011

Petrus4 · Aug 31, 2012

Here is a nice little script to quickly do method 1:

Code:

#!/bin/sh
# Kill every direct child of the given parent process ID.
echo "Enter the parent process ID"
read ppid
# Pass the PID to awk as a variable instead of splicing it into the program text.
for i in $(ps -ef | awk -v ppid="$ppid" '$3 == ppid { print $2 }')
do
    echo "killing $i"
    kill -9 "$i"
done

Sadly, I am unable to kill any child processes in my containers!
