Proxmox 4.4 not running init inside lxc

M-SK

Member
Oct 11, 2016
Hello,

As of a couple of days ago, LXC containers won't run their designated runlevel (CentOS 6.x container):

Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 19292 2324 ? Ss 10:45 0:00 /sbin/init
root 98 0.0 0.0 11500 2612 lxc/console Ss+ 10:45 0:00 /bin/bash
root 107 0.0 0.0 108376 3112 ? Ss 10:45 0:00 /bin/bash
root 117 0.0 0.0 110252 2296 ? R+ 10:45 0:00 ps aux

If I run, e.g., "init 3", it proceeds through init normally without a hitch. Other hosts running the same version start the same container without issues.
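For reference, the runlevel can also be checked from the host with pct exec (114 here is just an example CT ID, substitute your own):

Code:
# "runlevel" prints "<previous> <current>"; anything other than "N 3" means the CT never reached its default level
pct exec 114 -- runlevel
# what the CT is configured to boot into
pct exec 114 -- grep initdefault /etc/inittab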

Host version:

Code:
proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.49-1-pve: 4.4.49-86
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-97
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
 
M-SK said:
As of a couple of days ago, LXC containers won't run their designated runlevel (CentOS 6.x container). If I run, e.g., "init 3", it proceeds normally. Other hosts running the same version start the same container without issues.

This looks like a problem inside this specific container, imo, as LXC (simplified) just executes /sbin/init, which then has control over the container environment.
LXC does not dictate to the container's init process what to do or which runlevel to run in. If you can get the CT up with `init 3`, chances are that it is just not configured to go to that level automatically.
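If you want to verify that, the generated LXC config should show it; on PVE 4.x it ends up under /var/lib/lxc/<vmid>/config while the CT runs, and we do not set LXC 2.0's `lxc.init_cmd` key, so the default /sbin/init applies:

Code:
# 114 is just an example CT ID; no init_cmd line means the default /sbin/init is used
grep init_cmd /var/lib/lxc/114/config || echo "not set -> /sbin/init"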

Any changes in the CT, e.g. new packages which may have introduced init changes? I'm a bit unfamiliar with CentOS 6's init system, and it doesn't help that it's Upstart, which is by now more or less out of support from the ones who made it.
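Inside the CT you could at least check which Upstart jobs actually ran and whether anything init-related changed recently, roughly like this:

Code:
initctl list | head -20    # which Upstart jobs ran (look for the rc / rcS jobs)
cat /etc/init/rc.conf      # the CentOS 6 job that executes /etc/rc.d/rc $RUNLEVEL
rpm -qa --last | head -20  # recently updated packages (upstart, initscripts?)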
 
Ok, so maybe a bit of background. We have a 4-node cluster. This is happening on one of the nodes since a recent reboot of that host; before the reboot there were no issues with containers.

These containers init properly when migrated to other hosts. Whenever I migrate a CT over to the host in question, it does not init properly until I run "init 3" manually. I just did a quick migration with a test dev server.
First, migrated to a non-affected host:

Code:
ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.2 19292 2244 ? Ss 13:37 0:00 /sbin/init
root 92 0.0 0.1 10720 1144 ? Ss 13:37 0:00 /sbin/udevd -d
root 385 0.0 0.2 177460 2492 ? Sl 13:37 0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root 397 0.0 1.5 175712 16256 ? Ss 13:37 0:00 /var/www/bin/httpd
root 407 0.0 0.2 66172 2476 ? Ss 13:37 0:00 /usr/sbin/sshd
daemon 414 0.0 1.0 530212 10552 ? Sl 13:37 0:00 /var/www/bin/httpd
daemon 416 0.0 1.0 530212 10556 ? Sl 13:37 0:00 /var/www/bin/httpd
daemon 434 0.0 1.0 530212 10552 ? Sl 13:37 0:00 /var/www/bin/httpd
root 498 0.0 0.1 21772 2056 ? Ss 13:37 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root 532 0.0 0.2 108228 3008 ? S 13:37 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/my
mysql 616 0.6 2.2 358620 23592 ? Sl 13:37 0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/lib/mysql/p
root 622 0.0 0.2 108364 3080 ? Ss 13:37 0:00 /bin/bash
root 654 0.0 0.1 66400 1784 ? Ss 13:37 0:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
root 655 0.0 0.0 66400 660 ? S 13:37 0:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
root 670 0.0 0.3 82604 3824 ? Ss 13:37 0:00 sendmail: accepting connections
smmsp 678 0.0 0.3 78196 3696 ? Ss 13:37 0:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root 686 0.0 0.2 116848 2248 ? Ss 13:37 0:00 crond
root 696 0.0 0.1 4124 1416 lxc/tty1 Ss+ 13:37 0:00 /sbin/mingetty /dev/tty1
root 698 0.0 0.1 4124 1448 lxc/tty2 Ss+ 13:37 0:00 /sbin/mingetty /dev/tty2
root 699 0.0 0.2 110244 2356 ? R+ 13:37 0:00 ps aux

Second, migrated over to the affected host:

Code:
ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.2 19292 2304 ? Ss 13:41 0:00 /sbin/init
root 92 0.0 0.1 10720 1144 ? Ss 13:41 0:00 /sbin/udevd -d
root 251 0.0 0.2 108364 3140 ? Ss 13:42 0:00 /bin/bash
root 261 0.0 0.2 110244 2356 ? R+ 13:42 0:00 ps aux
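
A quick way to diff the two states, if anyone wants to reproduce the comparison (the file names are arbitrary):

Code:
# inside the CT, once on each host, then diff the two dumps
ps -eo comm --sort=comm > /tmp/ps.good.txt   # on the working host
ps -eo comm --sort=comm > /tmp/ps.bad.txt    # after migrating to the affected host
diff /tmp/ps.good.txt /tmp/ps.bad.txt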

That's about it. I have no idea what is causing this or how to correct this behaviour...
 
Can you post your container config, and `pveversion -v` from a working and a non-working host?
 
M-SK said:
These containers init properly when migrated to other hosts. Whenever I migrate a CT over to the host in question, it does not init properly until I run "init 3" manually.

Hmm, ok, that is strange. I tried a bit to reproduce this, sadly with no luck.

Do you get any strange log messages in the PVE host node's journal during the container start?
Code:
journalctl -f # for following the journal live

Maybe something with AppArmor.
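To rule that out, checking the host for denials around the CT start should be enough, e.g.:

Code:
# on the PVE host
dmesg | grep -i apparmor
journalctl -k | grep -i denied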

Also, what does `runlevel` show in the CT before you do a manual `init 3`?
 
It is strange, to be sure.

journalctl:

Code:
Jun 08 09:46:50 03 kernel: EXT4-fs (dm-7): mounted filesystem with ordered data mode. Opts: (null)
Jun 08 09:46:50 03 kernel: IPv6: ADDRCONF(NETDEV_UP): veth114i0: link is not ready
Jun 08 09:46:51 03 kernel: device veth114i0 entered promiscuous mode
Jun 08 09:46:51 03 kernel: eth0: renamed from vethSMIJM6
Jun 08 09:46:53 03 systemd[1]: Started LXC Container: 114.
Jun 08 09:46:53 03 pct[37088]: <root@pam> end task UPID:nthl03:000090E1:07AA6C63:593900E8:vzstart:114:root@pam: OK

`runlevel` is reported as "1 S"; inittab is configured with id:3:initdefault:
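
If I understand CentOS 6's Upstart setup correctly, /etc/init/rcS.conf is what reads initdefault and telinits to it after rc.sysinit, and that job also honors "single"/"S" or a bare digit on the kernel command line, which a CT inherits from the host, so that seems worth checking too:

Code:
# inside the CT
grep -n telinit /etc/init/rcS.conf   # the post-stop script that switches to initdefault
cat /proc/cmdline                    # inherited from the host kernel; a stray "S" or digit overrides initdefault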
 
