Problems with shutting down container

zzhjkrqlne

Oct 16, 2008
I have an OpenVZ container that sometimes refuses to shut down properly. All other containers on the same server shut down without problems.

Code:
# vzlist -a |grep 101
      101          0 running   -               yyy.xxx.com

Code:
# vzctl stop 101
Stopping container ...
Child xxxxxx exited with status 7
Killing container ...
Child xxxxxx exited with status 7
Unable to stop container

Code:
# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-17-pve: 2.6.32-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-72
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1

Code:
# cat /etc/vz/conf/101.conf
ONBOOT="yes"


# Primary parameters
NUMPROC="1024:1024"
NUMTCPSOCK="9223372036854775807:9223372036854775807"
NUMOTHERSOCK="9223372036854775807:9223372036854775807"
VMGUARPAGES="0:unlimited"


# Secondary parameters
KMEMSIZE="1951399936:2147483648"
OOMGUARPAGES="0:unlimited"
PRIVVMPAGES="unlimited"
TCPSNDBUF="9223372036854775807:9223372036854775807"
TCPRCVBUF="9223372036854775807:9223372036854775807"
OTHERSOCKBUF="9223372036854775807:9223372036854775807"
DGRAMRCVBUF="9223372036854775807:9223372036854775807"


# Auxiliary parameters
NUMFILE="9223372036854775807:9223372036854775807"
NUMFLOCK="9223372036854775807:9223372036854775807"
NUMPTY="255:255"
NUMSIGINFO="1024:1024"
DCACHESIZE="975175680:1073741824"
LOCKEDPAGES="524288"
SHMPAGES="9223372036854775807:9223372036854775807"
NUMIPTENT="9223372036854775807:9223372036854775807"
PHYSPAGES="0:1048576"


# Disk quota parameters
DISKSPACE="52428800:57671680"
DISKINODES="10000000:11000000"
QUOTATIME="0"
QUOTAUGIDLIMIT="0"


# CPU fair sheduler parameter
CPUS="2"
CPUUNITS="1000"


DESCRIPTION="CentOS 6 (64)"
DEVICES="c:108:0:rw "
FEATURES="ppp:on "
HOSTNAME="yyy.xxx.com"
IP_ADDRESS=""
IPTABLES="ip_conntrack ip_nat_ftp ipt_LOG ipt_REDIRECT ipt_REJECT ipt_TCPMSS ipt_TOS ipt_length ipt_limit ipt_multiport ipt_owner ipt_state ipt_tcpmss ipt_tos ipt_ttl iptable_filter iptable_mangle iptable_nat"
IPV6="no"
NAMESERVER="x.x.x.x y.y.y.y"
NETIF="ifname=eth0,mac=xx:xx:xx:xx:xx:xx,host_ifname=veth101.0,host_mac=yy:yy:yy:yy:yy:yy"
NOATIME="yes"
ORIGIN_SAMPLE="pve.auto"
OSTEMPLATE="centos-6-x86_64"
SEARCHDOMAIN="xxx.com"
SWAPPAGES="0:32G"

The only difference between this container and the others running on the same server is the following.

Code:
DEVICES="c:108:0:rw "
FEATURES="ppp:on "

Also, this problem doesn't happen every time, only once in every few shutdowns.

Any ideas?
 
When it's running, it has the following running inside of it.

Code:
# vzctl exec 101 ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init
    2 ?        S      0:00 [kthreadd/101]
    3 ?        S      0:00 [khelper/101]
  125 ?        S<s    0:00 /sbin/udevd -d
  433 ?        Sl     0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
  457 ?        Ss     0:00 /usr/sbin/sshd
  698 ?        Ss     0:02 /usr/libexec/postfix/master
  703 ?        S      0:00 qmgr -l -t fifo -u
  710 ?        S      0:00 /usr/sbin/zabbix_agentd
  711 ?        S      1:01 /usr/sbin/zabbix_agentd
  712 ?        S      0:00 /usr/sbin/zabbix_agentd
  713 ?        S      0:00 /usr/sbin/zabbix_agentd
  714 ?        S      0:00 /usr/sbin/zabbix_agentd
  715 ?        S      0:15 /usr/sbin/zabbix_agentd
  720 ?        Ss     0:01 crond
  788 ?        S<     0:00 /sbin/udevd -d
  793 ?        S<     0:00 /sbin/udevd -d
 7526 ?        Ss     0:08 lfd - sleeping
 8382 ?        Ss     0:01 racoon
11538 ?        Ssl    0:48 /usr/sbin/pdns_recursor --daemon
11555 ?        Ssl    0:00 /usr/sbin/accel-pppd -d -c /etc/accel-ppp.conf
18739 ?        S      0:00 pickup -l -t fifo -u
20597 ?        Rs     0:00 ps ax

All of these processes also run in the other containers, which don't have this problem, except accel-pppd (for providing PPP access) and racoon.
 
Suggestion: run vzctl enter 101 and kill off one process at a time, and after each kill try to stop the CT from a second ssh session as root on the HN. Working through the processes one by one should show which of them is interfering with the kill signal for the container.
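
A rough sketch of that elimination approach, in case it helps. The helper name ct_candidates is made up for illustration; it only filters ps ax output (in the column layout shown earlier in the thread) down to the processes worth killing one at a time, skipping init and the kernel threads shown in brackets. The vzctl calls in the comments are the manual steps, not something the helper runs.

```shell
#!/bin/sh
# Sketch only: ct_candidates is a hypothetical helper, not part of vzctl.

# Read `ps ax` output on stdin and print "PID COMMAND" for every process that
# is neither init (PID 1) nor a kernel thread (command shown in [brackets]).
ct_candidates() {
    awk 'NR > 1 && $1 != 1 && $5 !~ /^\[/ {
        printf "%s", $1
        for (i = 5; i <= NF; i++) printf " %s", $i
        printf "\n"
    }'
}

# Manual steps on the hardware node, using CT 101 from this thread:
#   vzctl exec 101 ps ax | ct_candidates   # list candidates
#   vzctl exec 101 kill <PID>              # kill one candidate
#   vzctl stop 101                         # from a second session; note the result
# Start the CT again and repeat with the next candidate until the stop fails
# or succeeds differently.
```

Since only accel-pppd and racoon differ from the other containers, those two are probably the first candidates to try.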

I have occasionally seen a CT be extremely slow to shut down when either the CT or the HN is under load, and have had to wait upwards of 10 minutes for it to stop. I know I get itchy finger syndrome, but patience has paid off. Now with Windows, different story.
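
If you want to rule out a slow shutdown without watching the clock, something like the following could poll for you. This is a minimal sketch: wait_for_stop is a made-up name, the 600 second default just mirrors the 10 minutes mentioned above, and it assumes that vzctl status prints the word "running" for as long as the CT is up.

```shell
#!/bin/sh
# Sketch only: wait_for_stop is a hypothetical helper.
# Usage: wait_for_stop CTID [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
# Returns 0 once `vzctl status` no longer reports "running", 1 on timeout.
wait_for_stop() {
    ctid=$1
    timeout=${2:-600}    # assumed default: 10 minutes
    interval=${3:-10}    # assumed default: poll every 10 seconds
    waited=0
    while [ "$waited" -le "$timeout" ]; do
        case "$(vzctl status "$ctid")" in
            *running*) ;;          # still up, keep waiting
            *) return 0 ;;         # no longer reported as running
        esac
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}

# Example (on the hardware node):
#   vzctl stop 101; wait_for_stop 101 600 10 || echo "CT 101 still running"
```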
 
Suggestion: run vzctl enter 101 and kill off one process at a time, and after each kill try to stop the CT from a second ssh session as root on the HN. Working through the processes one by one should show which of them is interfering with the kill signal for the container.

Thanks for the reply and suggestions.

I would have thought that the following suggests that the container has no processes left running.

Code:
# vzlist -a |grep 101
      101          0 running   -               yyy.xxx.com

I have occasionally seen a CT be extremely slow to shut down when either the CT or the HN is under load, and have had to wait upwards of 10 minutes for it to stop. I know I get itchy finger syndrome, but patience has paid off. Now with Windows, different story.

In my case, neither the container nor the hardware node are under any kind of load. Also, it was left in that state for a couple of days, with no further changes.
 
In that scenario, what response do you get from vzctl enter 101? You should get no response from the command, and it should hang. If you do get a shell in 101, what does ps reveal?
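
If vzctl enter does hang, you could also look from the HN side. The sketch below is an assumption-heavy illustration: pids_in_ct is a made-up helper, and it relies on the OpenVZ kernel exposing an "envID:" line in /proc/<pid>/status that holds the container ID, so check one status file by hand first. It prints the PID, state, and name of every host-visible process that belongs to the given CT.

```shell
#!/bin/sh
# Sketch only: pids_in_ct is a hypothetical helper.
# Assumption: /proc/<pid>/status on the HN contains an "envID:" line holding
# the container ID of the process.
# Usage: pids_in_ct CTID [PROC_ROOT]
pids_in_ct() {
    ctid=$1
    proc=${2:-/proc}
    for f in "$proc"/[0-9]*/status; do
        [ -r "$f" ] || continue    # process may have exited meanwhile
        awk -v ct="$ctid" '
            $1 == "Name:"  { name = $2 }
            $1 == "State:" { state = $2 }
            $1 == "Pid:"   { pid = $2 }
            $1 == "envID:" && $2 == ct { hit = 1 }
            END { if (hit) print pid, state, name }
        ' "$f" 2>/dev/null
    done
}

# Example (on the hardware node):
#   pids_in_ct 101
```

A process listed in state D (uninterruptible sleep) cannot be killed by a signal until it leaves that state, so anything stuck there would be worth mentioning here.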

I have only seen two occasions where a CT actually refused to shut down. On one, the HN had gone massively into swap; on the other, swap usage was again high and neither ssh nor vzctl worked. That CT has been up and down many times due to HN updates.

Are all of the CTs from the same template?

Is there nothing abnormal in any of the log files on either the HN or the CT?

Do you get the same response when you run the shutdown command from within the CT?

What happens when you run vzctl stop with the --fast option?

Since this seems to be fairly consistent, I would kill the processes one at a time while keeping a shell session open to that CT, and then either stop or restart the container. The only other approach is to enter the CT, kill off all processes, and halt it internally.
This is one of those cases where you eliminate one thing at a time and read the logs.
 