LXC - Process ps hang - 4.1-1/2f9650d4 (running kernel: 4.2.6-1-pve)

is-max

Hi,

We have a PHP script in our LXC container which runs "ps aux | grep something" to check
whether a process is running.

I've noticed that it hangs on the "ps" command, which also cannot be traced with strace.

root 8708 0.0 0.0 4408 768 ? Ss Mar04 0:00 | \_ /bin/sh -c /usr/bin/php /var/www/cron/syncLauncher.php >> /dev/null 2>&1
root 8709 0.0 0.0 225360 29876 ? S Mar04 0:00 | \_ /usr/bin/php /var/www/cron/syncLauncher.php
root 8727 0.0 0.0 4408 684 ? S Mar04 0:00 | \_ sh -c ps aux | grep "syncLauncher.php" | grep -v grep | grep -v /bin/sh | wc -l
root 8729 0.0 0.0 8284 1352 ? S Mar04 0:00 | \_ ps aux
root 8730 0.0 0.0 8112 2124 ? S Mar04 0:00 | \_ grep syncLauncher.php
root 8731 0.0 0.0 8112 2056 ? S Mar04 0:00 | \_ grep -v grep
root 8732 0.0 0.0 8112 2168 ? S Mar04 0:00 | \_ grep -v /bin/sh
root 8733 0.0 0.0 5896 1684 ? S Mar04 0:00 | \_ wc -l
--
root 6508 0.0 0.0 4408 772 ? Ss Mar04 0:00 | \_ /bin/sh -c /usr/bin/php /var/www/cron/syncLauncher.php >> /dev/null 2>&1
root 6509 0.0 0.0 225356 30052 ? S Mar04 0:00 | \_ /usr/bin/php /var/www/cron/syncLauncher.php
root 6889 0.0 0.0 4408 692 ? S Mar04 0:00 | \_ sh -c ps aux | grep -cw "[ S]yncFilesByIP.php 1.2.3.4"
root 6890 0.0 0.0 8284 1356 ? S Mar04 0:00 | \_ ps aux
root 6891 0.0 0.0 8112 2132 ? S Mar04 0:00 | \_ grep -cw [ S]yncFilesByIP.php 1.2.3.4
If I kill the grep command, nothing changes; if I try to kill the "ps" command, the process is marked as "Dead".

With strace it hangs, as you can see here:
root@vmsrv02:~# strace -f -e verbose=all -v -p 6891
Process 6891 attached
read(0, ^CProcess 6891 detached
<detached ...>

root@vmsrv02:~# strace -f -e verbose=all -v -p 6890
Process 6890 attached
^C^C^C^C^C^C <-- It did not detach.

Any suggestions on how to fix it? For now, I've added a "timeout" to the ps call, hoping that it keeps the process from blocking.

Thank you so much
Regards
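
The "timeout" workaround mentioned above could look roughly like the sketch below. It assumes GNU coreutils `timeout` and procps `ps` are available in the container; the function name `count_matching` and the 5-second default are illustrative and not taken from the original script. If ps is stuck in uninterruptible sleep it cannot be killed, and `timeout` may then keep waiting as well, so this is a mitigation rather than a fix.

```shell
#!/bin/sh
# Hypothetical helper: count processes whose command line contains a fixed
# string, giving up if ps does not finish within a time limit.
# Usage: count_matching PATTERN [SECONDS]
# Prints the count and returns 0; returns non-zero (124 or 137 from
# timeout) if ps had to be stopped.
count_matching() {
    pattern=$1
    limit=${2:-5}
    # -k 2: send SIGKILL if ps is still alive 2 seconds after SIGTERM
    output=$(timeout -k 2 "$limit" ps axww -o args=) || return $?
    # ps has already exited here, so no "grep -v grep" is needed.
    # grep -c prints 0 and exits 1 when nothing matches; ignore that status.
    printf '%s\n' "$output" | grep -c -F -- "$pattern" || true
}
```

A cron script could call `count_matching "syncLauncher.php" 5` and treat a non-zero return status as "unknown" instead of blocking forever.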
 
Could you post the output of "pveversion -v"? Also when including ps output (especially of "hanging" processes), please use "ps faxl" and if possible, include the whole output and not just parts of it.
 
Hi,

root@vmsrv02:~# pveversion -v
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
root@vmsrv02:~#

Here is the requested ps faxl output:
1 0 30842 29810 20 0 19120 2268 hrtime Ss ? 0:01 \_ cron
1 0 8706 30842 20 0 33852 2300 wait S ? 0:00 | \_ CRON
4 0 8708 8706 20 0 4408 768 wait Ss ? 0:00 | \_ /bin/sh -c /usr/bin/php /var/www/cron/syncLauncher.php >> /dev/null 2>&1
0 0 8709 8708 20 0 225360 29876 pipe_w S ? 0:00 | \_ /usr/bin/php /var/www/cron/syncLauncher.php
0 0 8727 8709 20 0 4408 684 wait S ? 0:00 | \_ sh -c ps aux | grep "syncLauncher.php" | grep -v grep | grep -v /bin/sh | wc -l
0 0 8729 8727 20 0 8284 1352 reques S ? 0:00 | \_ ps aux
0 0 8730 8727 20 0 8112 2124 pipe_w S ? 0:00 | \_ grep syncLauncher.php
0 0 8731 8727 20 0 8112 2056 pipe_w S ? 0:00 | \_ grep -v grep
0 0 8732 8727 20 0 8112 2168 pipe_w S ? 0:00 | \_ grep -v /bin/sh
0 0 8733 8727 20 0 5896 1684 pipe_w S ? 0:00 | \_ wc -l

--

1 0 5169 3421 20 0 19120 2256 hrtime Ss ? 0:01 \_ cron
1 0 6507 5169 20 0 33852 2348 wait S ? 0:00 | \_ CRON
4 0 6508 6507 20 0 4408 772 wait Ss ? 0:00 | \_ /bin/sh -c /usr/bin/php /var/www/cron/syncLauncher.php >> /dev/null 2>&1
0 0 6509 6508 20 0 225356 30052 pipe_w S ? 0:00 | \_ /usr/bin/php /var/www/cron/syncLauncher.php
0 0 6889 6509 20 0 4408 692 wait S ? 0:00 | \_ sh -c ps aux | grep -cw "[ S]yncFilesByIP.php 1.2.3.4"
0 0 6890 6889 20 0 8284 1356 reques D ? 0:00 | \_ ps aux
0 0 6891 6889 20 0 8112 2132 pipe_w S ? 0:00 | \_ grep -cw [ S]yncFilesByIP.php 1.2.3.4

:~# ps faxl
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 0 6894 0 20 0 1633016 1672 futex_ S ? 0:00 /usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs/
4 0 1 0 20 0 24216 3168 poll_s Ss ? 0:00 /sbin/init
1 0 419 1 20 0 17240 168 poll_s S ? 0:00 upstart-udev-bridge --daemon
5 0 459 1 20 0 21464 2296 ep_pol Ss ? 0:00 /sbin/udevd --daemon
5 0 11743 459 20 0 21464 812 ep_pol S ? 0:00 \_ /sbin/udevd --daemon
5 0 11744 459 20 0 21460 812 ep_pol S ? 0:00 \_ /sbin/udevd --daemon
5 0 552 1 20 0 19208 2276 poll_s Ss ? 0:00 rpcbind -w
1 0 632 1 20 0 15196 204 poll_s S ? 0:00 upstart-socket-bridge --daemon
4 0 853 1 20 0 50044 5064 poll_s Ss ? 0:00 /usr/sbin/sshd -D
4 0 4147 853 20 0 73564 5836 poll_s Ss ? 0:00 \_ sshd: root@pts/2
4 0 4159 4147 20 0 19700 3696 wait Ss pts/2 0:00 \_ -bash
0 0 4258 4159 20 0 8420 1280 - R+ pts/2 0:00 \_ ps faxl
5 103 1103 1 20 0 24652 5712 poll_s Ss ? 0:00 rpc.statd -L
5 101 1143 1 20 0 173852 4132 poll_s Sl ? 0:17 rsyslogd -c5
1 0 1266 1 20 0 25548 224 ep_pol Ss ? 0:00 rpc.idmapd
1 0 1552 1 20 0 19120 2256 hrtime Ss ? 0:01 cron
1 0 6847 1552 20 0 33852 2348 wait S ? 0:00 \_ CRON
4 0 6848 6847 20 0 4408 772 wait Ss ? 0:00 \_ /bin/sh -c /usr/bin/php /var/www/cron/syncLauncher.php >> /dev/null 2>&1
0 0 6849 6848 20 0 225356 30052 pipe_w S ? 0:00 \_ /usr/bin/php /var/www/cron/syncLauncher.php
0 0 6890 6849 20 0 4408 692 wait S ? 0:00 \_ sh -c ps aux | grep -cw "[ S]yncFilesByIP.php 1.2.3.4"
0 0 6891 6890 20 0 8284 1356 reques D ? 0:00 \_ ps aux
0 0 6892 6890 20 0 8112 2132 pipe_w S ? 0:00 \_ grep -cw [ S]yncFilesByIP.php 1.2.3.4
5 0 1607 1 20 0 30208 11448 poll_s Ss ? 2:28 /usr/sbin/openvpn --writepid /var/run/openvpn.openvpn.pid --daemon ovpn-openvpn --cd /etc/openvpn --config /etc/openvpn/
5 0 1626 1 20 0 31452 4352 poll_s Ss ? 0:11 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
4 104 1649 1 20 0 624948 51852 poll_s Ssl ? 1:44 /usr/sbin/mysqld
5 0 1691 1 20 0 259912 20256 poll_s Ss ? 0:07 /usr/sbin/apache2 -k start
5 33 7876 1691 20 0 259936 7696 inet_c S ? 0:00 \_ /usr/sbin/apache2 -k start
5 33 7877 1691 20 0 259936 7696 inet_c S ? 0:00 \_ /usr/sbin/apache2 -k start
5 33 7878 1691 20 0 259936 7696 inet_c S ? 0:00 \_ /usr/sbin/apache2 -k start
5 33 7879 1691 20 0 259936 7696 inet_c S ? 0:00 \_ /usr/sbin/apache2 -k start
5 33 7880 1691 20 0 259936 7696 inet_c S ? 0:00 \_ /usr/sbin/apache2 -k start
4 0 1710 1 20 0 12760 1968 wait_w Ss+ lxc/console 0:00 /sbin/getty -8 38400 console
4 0 1713 1 20 0 12760 1968 wait_w Ss+ lxc/tty2 0:00 /sbin/getty -8 38400 tty2
4 0 1714 1 20 0 12760 1960 wait_w Ss+ lxc/tty1 0:00 /sbin/getty -8 38400 tty1

Thank you so much
Regards
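
For long listings like the one above, a small filter can pick out the stuck processes. The sketch below assumes the `ps faxl` column layout shown in the header line (STAT in column 10, WCHAN in column 9, COMMAND from column 13 on); the function name `list_blocked` is made up for illustration.

```shell
#!/bin/sh
# Read `ps faxl`-style output on stdin and print PID, WCHAN and command
# of every process whose STAT starts with "D" (uninterruptible sleep).
list_blocked() {
    awk 'NR > 1 && $10 ~ /^D/ {
        cmd = ""
        # COMMAND may contain spaces and tree markers; rejoin fields 13..NF
        for (i = 13; i <= NF; i++) cmd = (cmd == "" ? $i : cmd " " $i)
        print $3, $9, cmd
    }'
}
```

Typical use would be `ps faxl | list_blocked`. Note that in the listings above one of the hanging ps processes is in state S rather than D, with the same WCHAN value, so filtering on the WCHAN column instead may catch more cases.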
 
Please upgrade your packages; your versions are outdated. There was an issue with lxcfs that is probably at fault here. It was fixed in lxcfs 2.0.0-pve1.

Also, your container setup looks pretty weird: why do you have udevd and upstart components running inside your container?
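
To check whether a host still runs an lxcfs older than the fixed release named above, something like the following sketch could be used on the Proxmox host. The helper names `version_lt` and `installed_version` are illustrative. The `sort -V` fallback only approximates Debian version ordering and is an assumption for systems without dpkg.

```shell
#!/bin/sh
# Return 0 (true) if version $1 is older than version $2.
version_lt() {
    if command -v dpkg >/dev/null 2>&1; then
        dpkg --compare-versions "$1" lt "$2"
    else
        # Fallback: GNU sort -V, good enough for simple version strings
        [ "$1" != "$2" ] &&
            [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
    fi
}

# Print the installed version of a package (empty if it is not installed).
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}
```

For example, `version_lt "$(installed_version lxcfs)" 2.0.0-pve1 && echo "lxcfs is older than the fixed release"`. With the 0.13-pve1 version reported earlier in this thread, the check would report that an upgrade is needed, typically done with `apt-get update && apt-get dist-upgrade`.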
 
About the container: it's a migration of a Vz container to LXC.
But a container created from the LXC template behaves the same way, so I don't think that's the issue.

As for the upgrade, I'll try it as soon as possible.
Thank you so much
Regards