[SOLVED] Container fails to start after upgrade from 6 to 7

xilni

Member
Nov 12, 2019
Recently performed an upgrade from 5.4 to 6 and everything seemed alright; then did one from 6 to 7, and now my containers fail to start.

➜ ~ lxc-start -n 100 -lDEBUG --logfile 100_fail.log
Code:
lxc-start 100 20211227045635.919 ERROR    apparmor - lsm/apparmor.c:run_apparmor_parser:915 - Failed to run apparmor_parser on "/var/lib/lxc/100/apparmor/lxc-100_<-var-lib-lxc>": apparmor_parser: Unable to replace "lxc-100_</var/lib/lxc>".  Profile doesn't conform to protocol
lxc-start 100 20211227045635.920 ERROR    apparmor - lsm/apparmor.c:apparmor_prepare:1085 - Failed to load generated AppArmor profile
lxc-start 100 20211227045635.920 ERROR    start - start.c:lxc_init:878 - Failed to initialize LSM
lxc-start 100 20211227045635.920 ERROR    start - start.c:__lxc_start:2002 - Failed to initialize container "100"
lxc-start 100 20211227045637.215 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:859 - No such file or directory - Failed to receive the container state
lxc-start 100 20211227045637.216 ERROR    lxc_start - tools/lxc_start.c:main:306 - The container failed to start
lxc-start 100 20211227045637.216 ERROR    lxc_start - tools/lxc_start.c:main:309 - To get more details, run the container in foreground mode
lxc-start 100 20211227045637.216 ERROR    lxc_start - tools/lxc_start.c:main:311 - Additional information can be obtained by setting the --logfile and --logpriority options
lxc-start 100 20211227050117.639 INFO     confile - confile.c:set_config_idmaps:2112 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 100 20211227050117.639 INFO     confile - confile.c:set_config_idmaps:2112 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 100 20211227050117.640 INFO     lxccontainer - lxccontainer.c:do_lxcapi_start:987 - Set process title to [lxc monitor] /var/lib/lxc 100
lxc-start 100 20211227050117.641 DEBUG    lxccontainer - lxccontainer.c:wait_on_daemonized_start:848 - First child 11243 exited
lxc-start 100 20211227050117.641 INFO     lsm - lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
lxc-start 100 20211227050117.641 INFO     conf - conf.c:run_script_argv:337 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "100", config section "lxc"
lxc-start 100 20211227050118.602 DEBUG    seccomp - seccomp.c:parse_config_v2:656 - Host native arch is [3221225534]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "kexec_load errno 1"
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[246:kexec_load] action[327681:errno] arch[0]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741827]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741886]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "open_by_handle_at errno 1"
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[304:open_by_handle_at] action[327681:errno] arch[0]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741827]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741886]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "init_module errno 1"
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[175:init_module] action[327681:errno] arch[0]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741827]
lxc-start 100 20211227050118.602 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741886]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "finit_module errno 1"
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[313:finit_module] action[327681:errno] arch[0]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741827]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741886]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "delete_module errno 1"
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[176:delete_module] action[327681:errno] arch[0]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741827]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741886]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "ioctl errno 1 [1,0x9400,SCMP_CMP_MASKED_EQ,0xff00]"
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:547 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[16:ioctl] action[327681:errno] arch[0]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:547 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741827]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:547 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741886]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "keyctl errno 38"
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[250:keyctl] action[327718:errno] arch[0]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[250:keyctl] action[327718:errno] arch[1073741827]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[250:keyctl] action[327718:errno] arch[1073741886]
lxc-start 100 20211227050118.603 INFO     seccomp - seccomp.c:parse_config_v2:1017 - Merging compat seccomp contexts into main context
lxc-start 100 20211227050119.344 ERROR    apparmor - lsm/apparmor.c:run_apparmor_parser:915 - Failed to run apparmor_parser on "/var/lib/lxc/100/apparmor/lxc-100_<-var-lib-lxc>": apparmor_parser: Unable to replace "lxc-100_</var/lib/lxc>".  Profile doesn't conform to protocol
lxc-start 100 20211227050119.345 ERROR    apparmor - lsm/apparmor.c:apparmor_prepare:1085 - Failed to load generated AppArmor profile
lxc-start 100 20211227050119.345 ERROR    start - start.c:lxc_init:878 - Failed to initialize LSM
lxc-start 100 20211227050119.345 ERROR    start - start.c:__lxc_start:2002 - Failed to initialize container "100"
lxc-start 100 20211227050119.345 WARN     cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:548 - Uninitialized limit cgroup
lxc-start 100 20211227050119.345 WARN     cgfsng - cgroups/cgfsng.c:cgfsng_monitor_destroy:868 - Uninitialized monitor cgroup
lxc-start 100 20211227050119.345 INFO     conf - conf.c:run_script_argv:337 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "100", config section "lxc"
lxc-start 100 20211227050120.354 INFO     conf - conf.c:run_script_argv:337 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "100", config section "lxc"
lxc-start 100 20211227050120.857 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:859 - No such file or directory - Failed to receive the container state
lxc-start 100 20211227050120.857 ERROR    lxc_start - tools/lxc_start.c:main:306 - The container failed to start
lxc-start 100 20211227050120.857 ERROR    lxc_start - tools/lxc_start.c:main:309 - To get more details, run the container in foreground mode
lxc-start 100 20211227050120.857 ERROR    lxc_start - tools/lxc_start.c:main:311 - Additional information can be obtained by setting the --logfile and --logpriority options

The only thing that jumps out at me is that there's no /var/lib/lxc/100/apparmor directory, which the first error line in the debug log is looking for:

Code:
➜  ~ tree /var/lib/lxc/100
/var/lib/lxc/100
├── config
├── rootfs
└── rules.seccomp

This somewhat resembles this old support post, but I'm not running a custom kernel as far as I can tell. Version details below.

Container info:
Code:
➜  ~ pct config 100
arch: amd64
cores: 8
cpulimit: 8
hostname: tychus
memory: 16384
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.1,hwaddr=9E:6F:04:45:39:6A,ip=192.168.1.244/24,type=veth
ostype: ubuntu
rootfs: local-lvm:vm-100-disk-0,size=60G
swap: 4096
unprivileged: 1

Version info:
Code:
➜  ~ pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.15.0-2-rt-amd64)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.4: 6.4-11
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Thanks for any help you can offer with this.
 
This somewhat resembles this old support post, but I'm not running a custom kernel as far as I can tell.
Hmm, but it seems you do:
proxmox-ve: 7.1-1 (running kernel: 5.15.0-2-rt-amd64)

The 5.15.0-2-rt-amd64 isn't one that's coming from us, those have pve (or in the future also proxmox) encoded in the name.

It seems you pulled in a 5.15-based real-time kernel when upgrading from 6 to 7, and since it's newer, it gets used by default. I'd recommend removing that kernel; real-time kernels aren't a great fit for a hypervisor anyway.
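The naming rule above can be sketched as a quick check. This is only an illustrative sketch (the helper name is made up); on a real host you would simply compare `uname -r` against the installed `pve-kernel-*` packages:

```shell
# Illustrative sketch only: Proxmox-built kernels carry "-pve" in the
# release string, so a running kernel without it points at a foreign one.
is_pve_kernel() {
  case "$1" in
    *-pve) echo "yes" ;;
    *)     echo "no"  ;;
  esac
}

is_pve_kernel "5.13.19-2-pve"      # yes
is_pve_kernel "5.15.0-2-rt-amd64"  # no
```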

Can you please post the output of the following two commands:
Bash:
apt list --installed | grep linux
head -n-0 /etc/apt/sources.list /etc/apt/sources.list.d/*.list

The above should let us determine which package you need to remove so that the Proxmox VE kernel gets booted again.
(A guess would be the linux-image-rt-amd64 meta-package, but that one is on 5.10, not 5.15, in Debian Bullseye, which is why I'd like to see the configured repos too.)
 
Hmm, but it seems you do:

proxmox-ve: 7.1-1 (running kernel: 5.15.0-2-rt-amd64)
The 5.15.0-2-rt-amd64 isn't one that's coming from us, those have pve (or in the future also proxmox) encoded in the name.

It seems you pulled in a 5.15-based real-time kernel when upgrading from 6 to 7, and since it's newer, it gets used by default. I'd recommend removing that kernel; real-time kernels aren't a great fit for a hypervisor anyway.

Can you please post the output of the following two commands:
Bash:
apt list --installed | grep linux
head -n-0 /etc/apt/sources.list /etc/apt/sources.list.d/*.list

The above should let us determine which package you need to remove so that the Proxmox VE kernel gets booted again.
(A guess would be the linux-image-rt-amd64 meta-package, but that one is on 5.10, not 5.15, in Debian Bullseye, which is why I'd like to see the configured repos too.)

Code:
➜  ~ apt list --installed | grep linux

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

console-setup-linux/stable,now 1.205 all [installed]
liblinux-inotify2-perl/stable,now 1:2.2-2+b1 amd64 [installed]
libnvpair3linux/stable,now 2.1.1-pve3 amd64 [installed,automatic]
libselinux1/stable,now 3.1-3 amd64 [installed]
libuutil3linux/stable,now 2.1.1-pve3 amd64 [installed,automatic]
libzfs4linux/stable,now 2.1.1-pve3 amd64 [installed,automatic]
libzpool5linux/stable,now 2.1.1-pve3 amd64 [installed,automatic]
linux-base/stable,now 4.6 all [installed]
linux-image-5.15.0-2-rt-amd64/now 5.15.5-2 amd64 [installed,local]
linux-image-rt-amd64/now 5.15.5-2 amd64 [installed,local]
linux-libc-dev/stable,now 5.10.84-1 amd64 [installed]
util-linux/stable,now 2.36.1-8 amd64 [installed]
zfsutils-linux/stable,now 2.1.1-pve3 amd64 [installed]

Code:
➜  ~ head -n-0 /etc/apt/sources.list /etc/apt/sources.list.d/*.list
==> /etc/apt/sources.list <==
deb http://ftp.us.debian.org/debian bullseye main contrib

deb http://ftp.us.debian.org/debian bullseye-updates main contrib

# security updates
deb http://security.debian.org bullseye-security main contrib

deb http://download.proxmox.com/debian bullseye pve-no-subscription

==> /etc/apt/sources.list.d/pve-enterprise.list <==
# deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise
 
Ok, sources look all right, and the real-time (rt) linux image is indeed installed, can you try to remove that one now?

Bash:
apt remove linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64

I'd recommend checking that it doesn't remove anything else that looks important.
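One way to do that check beforehand is to simulate the transaction with `apt -s remove …` (nothing is changed on disk) and read the REMOVED list. As an illustrative sketch with a made-up helper name, the relevant lines can be filtered out of captured apt output like this:

```shell
# Hypothetical filter: print only the indented package lines that follow
# "will be REMOVED:" in apt's output, so the removal set is easy to review.
removed_packages() {
  awk '/will be REMOVED:/ { grab = 1; next }
       grab && /^ /      { print; next }
       grab              { exit }'
}

# Example on a captured fragment of apt output:
printf '%s\n' \
  "The following packages will be REMOVED:" \
  "  linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64 wireguard" \
  "0 upgraded, 40 newly installed, 3 to remove" | removed_packages
```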
 
Ok, sources look all right, and the real-time (rt) linux image is indeed installed, can you try to remove that one now?

Bash:
apt remove linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64

I'd recommend checking that it doesn't remove anything else that looks important.
It's a little busier than I expected:

Code:
➜  ~ apt remove linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  dctrl-tools dkms sudo wireguard-dkms wireguard-tools
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp cpp-10 dctrl-tools dkms dpkg-dev fakeroot g++ g++-10 gcc gcc-10
  libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan6 libatomic1 libbinutils libcc1-0 libctf-nobfd0 libctf0
  libdpkg-perl libfakeroot libfile-fcntllock-perl libgcc-10-dev libgomp1 libisl23 libitm1 liblsan0 libmpc3 libmpfr6 libstdc++-10-dev
  libtsan0 libubsan1 lsb-release make sudo wireguard-dkms
Suggested packages:
  binutils-doc cpp-doc gcc-10-locales debtags menu debian-keyring g++-multilib g++-10-multilib gcc-10-doc gcc-multilib autoconf automake
  libtool flex bison gdb gcc-doc gcc-10-multilib bzr libstdc++-10-doc make-doc
Recommended packages:
  wireguard
The following packages will be REMOVED:
  linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64 wireguard
The following NEW packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp cpp-10 dctrl-tools dkms dpkg-dev fakeroot g++ g++-10 gcc gcc-10
  libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan6 libatomic1 libbinutils libcc1-0 libctf-nobfd0 libctf0
  libdpkg-perl libfakeroot libfile-fcntllock-perl libgcc-10-dev libgomp1 libisl23 libitm1 liblsan0 libmpc3 libmpfr6 libstdc++-10-dev
  libtsan0 libubsan1 lsb-release make sudo wireguard-dkms
0 upgraded, 40 newly installed, 3 to remove and 0 not upgraded.
Need to get 58.3 MB of archives.
After this operation, 167 MB disk space will be freed.
Do you want to continue? [Y/n]

Safe to proceed?
 
I'd actually also remove wireguard-dkms, as it's not required with kernels newer than 5.6, when WireGuard was included in the mainline kernel, and installing it pulls in quite a few kernel module build dependencies.

Bash:
apt remove linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64 wireguard-dkms

and then maybe mark the wireguard tools as explicitly installed to avoid auto-removal: apt install wireguard-tools
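The 5.6 cutoff mentioned above can be written as a small version check. This is only an illustrative sketch (made-up helper name, assumes a dotted `major.minor[.patch]` release string):

```shell
# WireGuard was merged into mainline Linux in 5.6, so wireguard-dkms is
# only needed on older kernels. Hypothetical helper, not a real tool.
needs_wireguard_dkms() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 6 ]; }; then
    echo "no"   # module is built into this kernel
  else
    echo "yes"  # pre-5.6 kernel still needs the DKMS module
  fi
}

needs_wireguard_dkms "5.13.19"  # no
needs_wireguard_dkms "5.4.157"  # yes
```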
 
I'd actually also remove wireguard-dkms, as it's not required with kernels newer than 5.6, when WireGuard was included in the mainline kernel, and installing it pulls in quite a few kernel module build dependencies.

Bash:
apt remove linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64 wireguard-dkms

and then maybe mark the wireguard tools as explicitly installed to avoid auto-removal: apt install wireguard-tools

Got the following warning message; which kernel should I install and switch to before uninstalling the rt one?
 

Attachments

  • 2021-12-27_15-39-50_WindowsTerminal_screenshot.png
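For the question above, the safe pattern is to confirm that at least one Proxmox kernel is still installed before removing the running one. A hypothetical sketch (the helper name is made up; in practice you would pipe in package names from `dpkg --list`):

```shell
# Count installed kernels whose package name ends in "-pve"; if the count
# is zero, install one (e.g. a pve-kernel-5.13 package) before removing
# the rt kernel. Fed here from a captured list for illustration.
count_pve_kernels() {
  grep -c -- '-pve$'
}

printf '%s\n' \
  "pve-kernel-5.13.19-2-pve" \
  "pve-kernel-5.4.157-1-pve" \
  "linux-image-5.15.0-2-rt-amd64" | count_pve_kernels   # prints 2
```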
Turns out the 5.13 pve kernel was still installed, and uninstalling the rt one switched the default to 5.13.

Code:
~ apt remove linux-image-5.15.0-2-rt-amd64 linux-image-rt-amd64 wireguard-dkms
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package 'wireguard-dkms' is not installed, so not removed
Package 'linux-image-rt-amd64' is not installed, so not removed
The following packages will be REMOVED:
  linux-image-5.15.0-2-rt-amd64
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 379 MB disk space will be freed.
Do you want to continue? [Y/n]
(Reading database ... 95283 files and directories currently installed.)
Removing linux-image-5.15.0-2-rt-amd64 (5.15.5-2) ...
W: Removing the running kernel
I: /vmlinuz.old is now a symlink to boot/vmlinuz-5.4.157-1-pve
I: /initrd.img.old is now a symlink to boot/initrd.img-5.4.157-1-pve
I: /vmlinuz is now a symlink to boot/vmlinuz-5.13.19-2-pve
I: /initrd.img is now a symlink to boot/initrd.img-5.13.19-2-pve
/etc/kernel/postrm.d/initramfs-tools:
update-initramfs: Deleting /boot/initrd.img-5.15.0-2-rt-amd64
/etc/kernel/postrm.d/zz-proxmox-boot:
Re-executing '/etc/kernel/postrm.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
/etc/kernel/postrm.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.13.19-2-pve
Found initrd image: /boot/initrd.img-5.13.19-2-pve
Found linux image: /boot/vmlinuz-5.4.157-1-pve
Found initrd image: /boot/initrd.img-5.4.157-1-pve
Found linux image: /boot/vmlinuz-5.4.73-1-pve
Found initrd image: /boot/initrd.img-5.4.73-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done

After a reboot, all my containers work again.

Code:
➜  ~ uname -r
5.13.19-2-pve

Thank you!
 