[SOLVED] Proxmox containers not running after apt upgrade

isaacntk

Member
Jul 13, 2020
I recently performed an apt upgrade and my LXC containers stopped working. When starting a container, no error message appears and the web UI responds with "Task OK", but the container doesn't actually start. I should have used apt dist-upgrade instead, but I'm not sure how to roll back an upgrade and I don't have snapshots in place.

None of my containers came back after a system reboot either, so I also tried pct start 100. No error message was displayed, but trying to pct enter 100 returns "Error: container '100' not running!"

Not entirely sure which package caused it, but this is the apt/history.log:
Code:
# tail /var/log/apt/history.log
Start-Date: 2020-07-11  10:24:37
Commandline: apt upgrade
Install: pve-headers-5.4.44-2-pve:amd64 (5.4.44-2, automatic), proxmox-backup-client:amd64 (0.8.6-1, automatic), pve-kernel-5.4.44-2-pve:amd64 (5.4.44-2, automatic)
Upgrade: proxmox-widget-toolkit:amd64 (2.2-8, 2.2-9), pve-kernel-5.4:amd64 (6.2-3, 6.2-4), corosync:amd64 (3.0.3-pve1, 3.0.4-pve1), libavformat58:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), libcmap4:amd64 (3.0.3-pve1, 3.0.4-pve1), libavfilter7:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), libpve-access-control:amd64 (6.1-1, 6.1-2), libpve-storage-perl:amd64 (6.1-8, 6.2-3), libswresample3:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), libquorum5:amd64 (3.0.3-pve1, 3.0.4-pve1), pve-qemu-kvm:amd64 (5.0.0-4, 5.0.0-9), libmagickwand-6.q16-6:amd64 (8:6.9.10.23+dfsg-2.1, 8:6.9.10.23+dfsg-2.1+deb10u1), pve-container:amd64 (3.1-8, 3.1-10), libpostproc55:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), pve-manager:amd64 (6.2-6, 6.2-9), libvotequorum8:amd64 (3.0.3-pve1, 3.0.4-pve1), libpve-guest-common-perl:amd64 (3.0-10, 3.0-11), libavcodec58:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), libpve-common-perl:amd64 (6.1-3, 6.1-5), libavutil56:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), qemu-server:amd64 (6.2-3, 6.2-8), libcfg7:amd64 (3.0.3-pve1, 3.0.4-pve1), libproxmox-backup-qemu0:amd64 (0.1.6-1, 0.6.1-1), libswscale5:amd64 (7:4.1.4-1~deb10u1, 7:4.1.6-1~deb10u1), libknet1:amd64 (1.15-pve1, 1.16-pve1), libmagickcore-6.q16-6:amd64 (8:6.9.10.23+dfsg-2.1, 8:6.9.10.23+dfsg-2.1+deb10u1), pve-headers-5.4:amd64 (6.2-3, 6.2-4), pve-kernel-helper:amd64 (6.2-3, 6.2-4), libpve-http-server-perl:amd64 (3.0-5, 3.0-6), libcpg4:amd64 (3.0.3-pve1, 3.0.4-pve1), libcorosync-common4:amd64 (3.0.3-pve1, 3.0.4-pve1), imagemagick-6-common:amd64 (8:6.9.10.23+dfsg-2.1, 8:6.9.10.23+dfsg-2.1+deb10u1)
End-Date: 2020-07-11  10:26:03

I tried lxc-start with debug logging instead and got these messages:
Code:
# lxc-start -n 100 -F -l DEBUG -o /tmp/lxc-100.log
lxc-start: 100: lsm/apparmor.c: run_apparmor_parser: 892 Failed to run apparmor_parser on "/var/lib/lxc/100/apparmor/lxc-100_<-var-lib-lxc>": apparmor_parser: Unable to replace "lxc-100_</var/lib/lxc>".  Profile doesn't conform to protocol
lxc-start: 100: lsm/apparmor.c: apparmor_prepare: 1064 Failed to load generated AppArmor profile
lxc-start: 100: start.c: lxc_init: 845 Failed to initialize LSM
lxc-start: 100: start.c: __lxc_start: 1903 Failed to initialize container "100"
lxc-start: 100: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 100: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options
# tail /tmp/lxc-100.log
lxc-start 100 20200712012140.203 ERROR    start - start.c:lxc_init:845 - Failed to initialize LSM
lxc-start 100 20200712012140.203 ERROR    start - start.c:__lxc_start:1903 - Failed to initialize container "100"
lxc-start 100 20200712012140.203 DEBUG    conf - conf.c:idmaptool_on_path_and_privileged:2642 - The binary "/usr/bin/newuidmap" does have the setuid bit set
lxc-start 100 20200712012140.203 DEBUG    conf - conf.c:idmaptool_on_path_and_privileged:2642 - The binary "/usr/bin/newgidmap" does have the setuid bit set
lxc-start 100 20200712012140.203 DEBUG    conf - conf.c:lxc_map_ids:2710 - Functional newuidmap and newgidmap binary found
lxc-start 100 20200712012140.208 NOTICE   utils - utils.c:lxc_setgroups:1366 - Dropped additional groups
lxc-start 100 20200712012140.208 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "100", config section "lxc"
lxc-start 100 20200712012140.893 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "100", config section "lxc"
lxc-start 100 20200712012141.395 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 100 20200712012141.395 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options

Trying to access the apparmor directory shows that it doesn't exist. Could the upgrade have deleted the directory?
Code:
# ls /var/lib/lxc/100/apparmor
ls: cannot access '/var/lib/lxc/100/apparmor': No such file or directory
# ls -l /var/lib/lxc/100/
total 8
-rw-r--r-- 1 root root  977 Jul 12 09:21 config
drwxr-xr-x 2 root root 4096 Jun 15  2019 rootfs

My filesystem is ext4; many of the upgrade-failure issues I found involve ZFS, but I don't use ZFS.
I'm not familiar enough with AppArmor to dig any deeper, and I'm also not sure how to use the --logfile/--logpriority options that lxc_start.c mentions. Not sure which other logs or config files would help pin down the issue, but here are a few more:
Code:
# pct config 100
arch: amd64
cores: 2
hostname: apache
memory: 512
nameserver: 1.1.1.1
net0: name=eth0,bridge=vmbr0,gw=192.168.0.1,hwaddr=82:B1:0D:3C:47:68,ip=192.168.0.42/16,ip6=dhcp,type=veth
onboot: 1
ostype: ubuntu
parent: upgrade
rootfs: local-lvm:vm-100-disk-0,size=20G
startup: order=1,up=30
swap: 1024
unprivileged: 1

# systemctl status pve-container@100.service
● pve-container@100.service - PVE LXC Container: 100
   Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2020-07-12 09:27:47 +08; 16min ago
     Docs: man:lxc-start
           man:lxc
           man:pct
  Process: 30827 ExecStart=/usr/bin/lxc-start -F -n 100 (code=exited, status=1/FAILURE)
 Main PID: 30827 (code=exited, status=1/FAILURE)
Jul 12 09:27:44 alpha systemd[1]: Started PVE LXC Container: 100.
Jul 12 09:27:47 alpha systemd[1]: pve-container@100.service: Main process exited, code=exited, status=1/FAILURE
Jul 12 09:27:47 alpha systemd[1]: pve-container@100.service: Failed with result 'exit-code'.

# journalctl -xe
-- The job identifier is 100128.
Jul 12 09:50:16 alpha systemd[1]: Started PVE LXC Container: 100.
-- Subject: A start job for unit pve-container@100.service has finished successfully
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-container@100.service has finished successfully.
--
-- The job identifier is 100210.
Jul 12 09:50:16 alpha kernel: EXT4-fs (dm-13): mounted filesystem with ordered data mode. Opts: (null)
Jul 12 09:50:17 alpha audit[1534]: AVC apparmor="STATUS" info="failed to unpack end of profile" error=-71 profile="unconfined" name="lxc-100_</var/lib/lxc>" pid=1534 comm="apparmor_parser" name="lxc-100_</var/lib/lxc>" offset=151
Jul 12 09:50:17 alpha kernel: audit: type=1400 audit(1594518617.147:54): apparmor="STATUS" info="failed to unpack end of profile" error=-71 profile="unconfined" name="lxc-100_</var/lib/lxc>" pid=1534 comm="apparmor_parser" name="lxc-100_</var/lib/lxc>" offset=151
Jul 12 09:50:18 alpha systemd[1]: pve-container@100.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pve-container@100.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jul 12 09:50:18 alpha systemd[1]: pve-container@100.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pve-container@100.service has entered the 'failed' state with result 'exit-code'.
 
hi,

please post your container configuration (pct config CTID) and debug logs[0] from container start

[0]: https://pve.proxmox.com/pve-docs/chapter-pct.html#_obtaining_debugging_logs
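
for reference, the steps from that chapter boil down to roughly this (replace CTID with your container's ID):

Code:
pct config CTID
lxc-start -n CTID -lDEBUG --logfile /tmp/lxc-CTID.log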

They are both in the original post, but here they are again. pct config 100:
Code:
# pct config 100
arch: amd64
cores: 2
hostname: apache
memory: 512
nameserver: 1.1.1.1
net0: name=eth0,bridge=vmbr0,gw=192.168.0.1,hwaddr=82:B1:0D:3C:47:68,ip=192.168.0.42/16,ip6=dhcp,type=veth
onboot: 1
ostype: ubuntu
parent: upgrade
rootfs: local-lvm:vm-100-disk-0,size=20G
startup: order=1,up=30
swap: 1024
unprivileged: 1

Debug log /tmp/lxc-100.log:
Code:
lxc-start: 100: lsm/apparmor.c: run_apparmor_parser: 892 Failed to run apparmor_parser on "/var/lib/lxc/100/apparmor/lxc-100_<-var-lib-lxc>": apparmor_parser: Unable to replace "lxc-100_</var/lib/lxc>".  Profile doesn't conform to protocol
lxc-start: 100: lsm/apparmor.c: apparmor_prepare: 1064 Failed to load generated AppArmor profile
lxc-start: 100: start.c: lxc_init: 845 Failed to initialize LSM
lxc-start: 100: start.c: __lxc_start: 1903 Failed to initialize container "100"
lxc-start: 100: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 100: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options
# tail /tmp/lxc-100.log
lxc-start 100 20200712012140.203 ERROR    start - start.c:lxc_init:845 - Failed to initialize LSM
lxc-start 100 20200712012140.203 ERROR    start - start.c:__lxc_start:1903 - Failed to initialize container "100"
lxc-start 100 20200712012140.203 DEBUG    conf - conf.c:idmaptool_on_path_and_privileged:2642 - The binary "/usr/bin/newuidmap" does have the setuid bit set
lxc-start 100 20200712012140.203 DEBUG    conf - conf.c:idmaptool_on_path_and_privileged:2642 - The binary "/usr/bin/newgidmap" does have the setuid bit set
lxc-start 100 20200712012140.203 DEBUG    conf - conf.c:lxc_map_ids:2710 - Functional newuidmap and newgidmap binary found
lxc-start 100 20200712012140.208 NOTICE   utils - utils.c:lxc_setgroups:1366 - Dropped additional groups
lxc-start 100 20200712012140.208 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "100", config section "lxc"
lxc-start 100 20200712012140.893 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "100", config section "lxc"
lxc-start 100 20200712012141.395 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 100 20200712012141.395 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options
 
this isn't the full debug log, please see the link i've sent you.

you need to run: lxc-start -n CTID -lDEBUG --logfile /tmp/lxc-CTID.log
 
this isn't the full debug log, please see the link i've sent you.

you need to run: lxc-start -n CTID -lDEBUG --logfile /tmp/lxc-CTID.log

My bad, here is the full log from the --logfile command
 

Attachments

  • lxc-100.log
    9 KB
Trying to access the apparmor directory shows that it doesn't exist. Could the upgrade have deleted the directory?

no, this directory is generated when the container is starting.

could you also post pveversion -v ?

it looks like the apparmor profiles might not be generated correctly. is it possible you changed something in the default apparmor profile?
 
I'm quite sure I didn't change anything related to AppArmor. pveversion -v:

Code:
proxmox-ve: 6.2-1 (running kernel: 5.6.0-2-rt-amd64)
pve-manager: 6.2-9 (running version: 6.2-9/4d363c5b)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.0-11
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-10
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-9
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-8
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
 
proxmox-ve: 6.2-1 (running kernel: 5.6.0-2-rt-amd64)

this isn't our kernel... have you installed it manually? i suggest you switch to the pve kernel.

if you have to use a custom kernel (which usually isn't necessary) then you need to rebuild apparmor with the kernel headers.
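
just as an illustration, a quick way to compare what's booted vs. what's installed:

Code:
# kernel currently running
uname -r
# pve kernels on the system
dpkg -l | grep pve-kernel
# stock debian/unstable kernels that may have slipped in
dpkg -l | grep linux-image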
 
this isn't our kernel... have you installed it manually? i suggest you switch to the pve kernel.

if you have to use a custom kernel (which usually isn't necessary) then you need to rebuild apparmor with the kernel headers.

Any resources on how I'd go about doing either of those?

This system was installed from a USB stick back on Proxmox 4.2 and has just been perpetually apt upgraded since; this is the first time I've encountered containers not starting. The only changes I've actually made on the host are network-related, plus installing pve-headers or something because WireGuard needed it to run in a container.
 
My /etc/default/grub has a line changed from when I tried to do PCIe passthrough some time ago, but I don't think that would have changed my kernel, would it?

Code:
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on video=efifb:off quiet irqpoll"
GRUB_CMDLINE_LINUX=""

# Disable os-prober, it might add menu entries for each guest
GRUB_DISABLE_OS_PROBER=true

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
 
which repositories did you configure? is it possible you added unstable debian repositories?

can you check: find /etc/apt -name '*.list' -exec cat {} +
 
Oh, you are right, that repo is likely there for some WireGuard packages.
/facepalm Would just rolling back the packages do anything at this point?

Code:
# find /etc/apt -name '*.list' -exec cat {} +
deb http://deb.debian.org/debian/ unstable main
# deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise
deb http://download.proxmox.com/debian/pve buster pve-no-subscription
deb http://ftp.debian.org/debian buster main contrib

deb http://ftp.debian.org/debian buster-updates main contrib

# security updates
deb http://security.debian.org buster/updates main contrib
 
you'll need to uninstall the kernel which came from unstable, and make sure you're using the pve-kernel instead.

check the output of dpkg -l | grep linux-image to see the exact name of the kernel package. remove these with apt remove <pkgname>.

if you have the pve-kernel package installed (unlikely that you don't), then after a reboot it should use the correct kernel. you can verify this with uname -ar or again pveversion -v (running kernel)
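
roughly the sequence, assuming dpkg lists something like linux-image-5.6.0-2-rt-amd64 (that name is only a guess based on your running kernel string, use whatever dpkg actually shows):

Code:
dpkg -l | grep linux-image
apt remove linux-image-5.6.0-2-rt-amd64   # substitute the real package name from dpkg
reboot
# after the reboot:
uname -r        # should show a ...-pve kernel again
pveversion -v   # 'running kernel' should match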
 
you'll need to uninstall the kernel which came from unstable, and make sure you're using the pve-kernel instead.

check the output of dpkg -l | grep linux-image to see the exact name of the kernel package. remove these with apt remove <pkgname>.

if you have the pve-kernel package installed (unlikely that you don't), then after a reboot it should use the correct kernel. you can verify this with uname -ar or again pveversion -v (running kernel)

Thanks for your support! After removing the kernel and rebooting, the containers started up without issue. Any idea how I can prevent non-pve kernels from being installed accidentally again?
 
you're welcome!

Any idea how I can prevent non-pve kernels from being installed accidentally again?
yes, try not to add any extra repositories on your pve host. if you need to for some reason, you can set them to a lower priority so they don't override the updates from the pve repositories. take a look at apt pinning[0] for more info on this. basically you need to create/edit the /etc/apt/preferences file.

[0]: https://wiki.debian.org/AptConfiguration#apt_preferences_.28APT_pinning.29
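
for illustration, a pin like this in /etc/apt/preferences (or a file under /etc/apt/preferences.d/) keeps unstable below the default priority so apt never picks it automatically:

Code:
Package: *
Pin: release a=unstable
Pin-Priority: 90

keep in mind a pin below 100 only prevents apt from choosing unstable versions on its own; explicitly installing a package from unstable (e.g. apt install <pkg>/unstable) will still pull it in.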
 
Oddly enough, I already have a priority set for the unstable packages. I must have manually installed the unstable kernel, though I have no idea why I would have done that. Thanks again

Code:
# cat /etc/apt/preferences.d/limit-unstable
Package: *
Pin: release a=unstable
Pin-Priority: 90       

# apt-cache policy
Package files:
 100 /var/lib/dpkg/status
     release a=now
  90 http://deb.debian.org/debian unstable/main amd64 Packages
     release o=Debian,a=unstable,n=sid,l=Debian,c=main,b=amd64
     origin deb.debian.org
 500 http://download.proxmox.com/debian/pve buster/pve-no-subscription amd64 Packages
     release o=Proxmox,a=stable,n=buster,l=Proxmox Debian Repository,c=pve-no-subscription,b=amd64
     origin download.proxmox.com
 500 http://security.debian.org buster/updates/main amd64 Packages
     release v=10,o=Debian,a=stable,n=buster,l=Debian-Security,c=main,b=amd64
     origin security.debian.org
 500 http://ftp.debian.org/debian buster-updates/main amd64 Packages
     release o=Debian,a=stable-updates,n=buster-updates,l=Debian,c=main,b=amd64
     origin ftp.debian.org
 500 http://ftp.debian.org/debian buster/contrib amd64 Packages
     release v=10.4,o=Debian,a=stable,n=buster,l=Debian,c=contrib,b=amd64
     origin ftp.debian.org
 500 http://ftp.debian.org/debian buster/main amd64 Packages
     release v=10.4,o=Debian,a=stable,n=buster,l=Debian,c=main,b=amd64
     origin ftp.debian.org
Pinned packages:
 
