Containers won't start after upgrade to PVE 6.0

StefanC

I upgraded from 5.4 to 6.0 this afternoon, and everything seemed to go well during the process. After a reboot the KVM machines start fine, but the existing containers refuse to start. I can create new containers and they work fine, but none of the old ones start.

proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-4.15: 5.4-7
pve-kernel-4.13: 5.2-2
pve-kernel-5.0.18-1-pve: 5.0.18-1
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.79-1-pve: 4.4.79-95
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.62-1-pve: 4.4.62-88
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.19-1-pve: 4.4.19-66
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-6
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-6
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

Focusing on one of the containers to begin with:

arch: amd64
cores: 1
description: www
hostname: www
memory: 1024
nameserver: 192.168.0.1
net0: name=eth0,bridge=vmbr0,gw=192.168.0.1,hwaddr=DA:6B:36:73:BA:E2,ip=192.168.0.32/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-zfs:subvol-101-disk-2,size=10G
searchdomain: somedomain.se
swap: 512

pve-container@101.service - PVE LXC Container: 101
Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-08-01 20:38:30 CEST; 1h 4min ago
Docs: man:lxc-start
man:lxc
man:pct
Process: 5037 ExecStart=/usr/bin/lxc-start -n 101 (code=exited, status=1/FAILURE)

Aug 01 20:38:29 pve systemd[1]: Starting PVE LXC Container: 101...
Aug 01 20:38:30 pve lxc-start[5037]: lxc-start: 101: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
Aug 01 20:38:30 pve lxc-start[5037]: lxc-start: 101: tools/lxc_start.c: main: 330 The container failed to start
Aug 01 20:38:30 pve lxc-start[5037]: lxc-start: 101: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Aug 01 20:38:30 pve lxc-start[5037]: lxc-start: 101: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
Aug 01 20:38:30 pve systemd[1]: pve-container@101.service: Control process exited, code=exited, status=1/FAILURE
Aug 01 20:38:30 pve systemd[1]: pve-container@101.service: Failed with result 'exit-code'.
Aug 01 20:38:30 pve systemd[1]: Failed to start PVE LXC Container: 101.

Aug 01 20:38:30 pve systemd[1]: pve-container@101.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pve-container@101.service has entered the 'failed' state with result 'exit-code'.
Aug 01 20:38:30 pve systemd[1]: Failed to start PVE LXC Container: 101.
-- Subject: A start job for unit pve-container@101.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-container@101.service has finished with a failure.
--
-- The job identifier is 249 and the job result is failed.
Aug 01 20:38:30 pve pvestatd[4637]: unable to get PID for CT 101 (not running?)
Aug 01 20:38:30 pve pve-guests[5035]: command 'systemctl start pve-container@101' failed: exit code 1
Aug 01 20:38:30 pve pvesh[4707]: Starting CT 101 failed: command 'systemctl start pve-container@101' failed: exit code 1
Aug 01 20:38:30 pve pvesh[4707]: Starting CT 102
Aug 01 20:38:30 pve pve-guests[4848]: <root@pam> starting task UPID:pve:00001402:000010F0:5D4331A6:vzstart:102:root@pam:
Aug 01 20:38:30 pve pve-guests[5122]: starting CT 102: UPID:pve:00001402:000010F0:5D4331A6:vzstart:102:root@pam:
Aug 01 20:38:30 pve systemd[1]: Starting PVE LXC Container: 102...
-- Subject: A start job for unit pve-container@102.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-container@102.service has begun execution.
--
-- The job identifier is 253.
Aug 01 20:38:30 pve systemd-timesyncd[3127]: Synchronized to time server for the first time 194.58.204.148:123 (2.debian.pool.ntp.org).
Aug 01 20:38:31 pve lxc-start[5124]: lxc-start: 102: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
Aug 01 20:38:31 pve lxc-start[5124]: lxc-start: 102: tools/lxc_start.c: main: 330 The container failed to start
Aug 01 20:38:31 pve lxc-start[5124]: lxc-start: 102: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Aug 01 20:38:31 pve lxc-start[5124]: lxc-start: 102: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options

lxc-start: 101: conf.c: run_buffer: 335 Script exited with status 2
lxc-start: 101: start.c: lxc_init: 861 Failed to run lxc.hook.pre-start for container "101"
lxc-start: 101: start.c: __lxc_start: 1944 Failed to initialize container "101"
lxc-start: 101: tools/lxc_start.c: main: 330 The container failed to start
lxc-start: 101: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options

lxc-start 101 20190801194828.451 INFO lsm - lsm/lsm.c:lsm_init:50 - LSM security driver AppArmor
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "reject_force_umount # comment this to allow umount -f; not recommended"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:do_resolve_add_rule:505 - Set seccomp rule to reject force umounts
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:937 - Added native rule for arch 0 for reject_force_umount action 0(kill)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:do_resolve_add_rule:505 - Set seccomp rule to reject force umounts
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:946 - Added compat rule for arch 1073741827 for reject_force_umount action 0(kill)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:do_resolve_add_rule:505 - Set seccomp rule to reject force umounts
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:956 - Added compat rule for arch 1073741886 for reject_force_umount action 0(kill)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:do_resolve_add_rule:505 - Set seccomp rule to reject force umounts
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:966 - Added native rule for arch -1073741762 for reject_force_umount action 0(kill)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "[all]"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "kexec_load errno 1"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:937 - Added native rule for arch 0 for kexec_load action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:946 - Added compat rule for arch 1073741827 for kexec_load action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:956 - Added compat rule for arch 1073741886 for kexec_load action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:966 - Added native rule for arch -1073741762 for kexec_load action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "open_by_handle_at errno 1"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:937 - Added native rule for arch 0 for open_by_handle_at action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:946 - Added compat rule for arch 1073741827 for open_by_handle_at action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:956 - Added compat rule for arch 1073741886 for open_by_handle_at action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:966 - Added native rule for arch -1073741762 for open_by_handle_at action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "init_module errno 1"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:937 - Added native rule for arch 0 for init_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:946 - Added compat rule for arch 1073741827 for init_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:956 - Added compat rule for arch 1073741886 for init_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:966 - Added native rule for arch -1073741762 for init_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "finit_module errno 1"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:937 - Added native rule for arch 0 for finit_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:946 - Added compat rule for arch 1073741827 for finit_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:956 - Added compat rule for arch 1073741886 for finit_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:966 - Added native rule for arch -1073741762 for finit_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:759 - Processing "delete_module errno 1"
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:937 - Added native rule for arch 0 for delete_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:946 - Added compat rule for arch 1073741827 for delete_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:956 - Added compat rule for arch 1073741886 for delete_module action 327681(errno)
lxc-start 101 20190801194828.451 INFO seccomp - seccomp.c:parse_config_v2:966 - Added native rule for arch -1073741762 for delete_module action 327681(errno)
lxc-start 101 20190801194828.452 INFO seccomp - seccomp.c:parse_config_v2:970 - Merging compat seccomp contexts into main context
lxc-start 101 20190801194828.452 INFO conf - conf.c:run_script_argv:356 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc"
lxc-start 101 20190801194829.240 DEBUG conf - conf.c:run_buffer:326 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start with output: unable to detect OS distribution

lxc-start 101 20190801194829.254 ERROR conf - conf.c:run_buffer:335 - Script exited with status 2
lxc-start 101 20190801194829.254 ERROR start - start.c:lxc_init:861 - Failed to run lxc.hook.pre-start for container "101"
lxc-start 101 20190801194829.254 ERROR start - start.c:__lxc_start:1944 - Failed to initialize container "101"
lxc-start 101 20190801194829.254 ERROR lxc_start - tools/lxc_start.c:main:330 - The container failed to start
lxc-start 101 20190801194829.255 ERROR lxc_start - tools/lxc_start.c:main:336 - Additional information can be obtained by setting the --logfile and --logpriority options
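
Most of the debug log above is seccomp setup noise; the lines that matter are the ERROR entries and the output captured from the pre-start hook. A minimal sketch of a filter for that, assuming the log format shown above (the function name `lxc_log_highlights` and the log path in the usage comment are made up for illustration):

```shell
# Keep only ERROR lines and captured hook output from an LXC debug log
# read on stdin. Hypothetical helper, not part of LXC or Proxmox VE.
lxc_log_highlights() {
    grep -E ' ERROR |with output:'
}

# Usage on the host (the log path is an arbitrary example):
#   lxc-start -n 101 -F -l DEBUG -o /tmp/lxc-101.log
#   lxc_log_highlights < /tmp/lxc-101.log
```

On the log above this would leave the "unable to detect OS distribution" line plus the five ERROR lines, which points at the pre-start hook not finding the container's root filesystem contents.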

The container storage is on a ZFS pool. The storage can be mounted, but it is empty except for a lone /dev folder:
Code:
root@pve:/# pct mount 101
mounted CT 101 in '/var/lib/lxc/101/rootfs'
root@pve:/# ls -al /var/lib/lxc/101/rootfs
total 2
drwxr----- 3 root root 3 Aug  1 19:37 .
drwxr-xr-x 3 root root 4 Aug  1 20:38 ..
drwxr-xr-x 2 root root 2 Aug  1 19:37 dev
root@pve:/#

However, the ZFS filesystem still reports the correct amount of used data:
Code:
root@pve:/# zfs list rpool/data/subvol-101-disk-2
NAME                           USED  AVAIL     REFER  MOUNTPOINT
rpool/data/subvol-101-disk-2  1.07G  8.93G     1.07G  /rpool/data/subvol-101-disk-2

Any ideas?
 
Update: There is something fishy going on with ZFS:

Code:
root@pve:/rpool# zfs list -r -o name,mountpoint,mounted
NAME                            MOUNTPOINT                     MOUNTED
rpool                           /rpool                              no
rpool/ROOT                      /rpool/ROOT                         no
rpool/ROOT/pve-1                /                                  yes
rpool/data                      /rpool/data                         no
rpool/data/subvol-101-disk-2    /rpool/data/subvol-101-disk-2       no
rpool/data/subvol-102-disk-2    /rpool/data/subvol-102-disk-2       no
rpool/data/subvol-103-disk-2    /rpool/data/subvol-103-disk-2       no
rpool/data/subvol-104-disk-1    /rpool/data/subvol-104-disk-1       no
rpool/data/subvol-105-disk-2    /rpool/data/subvol-105-disk-2       no
rpool/data/subvol-106-disk-1    /rpool/data/subvol-106-disk-1       no
rpool/data/subvol-110-disk-1    /rpool/data/subvol-110-disk-1       no
rpool/data/subvol-111-disk-0    /rpool/data/subvol-111-disk-0       no
rpool/data/subvol-114-disk-2    /rpool/data/subvol-114-disk-2       no
rpool/data/subvol-115-disk-1    /rpool/data/subvol-115-disk-1       no
rpool/data/subvol-116-disk-1    /rpool/data/subvol-116-disk-1       no
rpool/data/subvol-118-disk-0    /rpool/data/subvol-118-disk-0       no
rpool/data/subvol-120-disk-1    /rpool/data/subvol-120-disk-1       no
rpool/data/vm-100-disk-1        -                                    -
rpool/data/vm-100-disk-3        -                                    -
rpool/data/vm-107-disk-0        -                                    -
rpool/data/vm-107-state-before  -                                    -
rpool/data/vm-108-disk-1        -                                    -
rpool/data/vm-109-disk-1        -                                    -
rpool/data/vm-112-disk-1        -                                    -
rpool/data/vm-117-disk-1        -                                    -
rpool/data/vm-121-disk-1        -                                    -
rpool/swap                      -                                    -
tank                            /tank                              yes
tank/media                      /tank/media                        yes
tank/vmdata                     /tank/vmdata                       yes
tank/vmdata/vm-113-disk-1       -                                    -
tank/vmstorage                  /tank/vmstorage                    yes

The root pool "rpool" has been working perfectly since the base install of Proxmox VE 4.1 and every version since then. How do I debug this?
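
One way to narrow this down is to list only the datasets that have a mountpoint but report mounted=no. This is a rough sketch, assuming the tab-separated output of `zfs list -H`; the function name `unmounted_datasets` is made up for illustration, and it does not account for datasets that are deliberately kept unmounted (for example via canmount).

```shell
# Print datasets that report mounted=no and have a regular mountpoint.
# Reads tab-separated name, mountpoint, mounted columns on stdin, which is
# the layout produced by `zfs list -H -o name,mountpoint,mounted`.
# Zvols show "-" in the mounted column and are skipped by the "no" check.
unmounted_datasets() {
    awk -F '\t' '$3 == "no" && $2 != "none" && $2 != "legacy" { print $1 }'
}

# Usage on the host:
#   zfs list -H -r -o name,mountpoint,mounted rpool | unmounted_datasets
```

With the listing above, this would print rpool, rpool/ROOT, rpool/data and every subvol dataset, while skipping rpool/ROOT/pve-1 and the vm-* zvols.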
 
Update:

"zfs mount -a" failed because the mountpoint directory /rpool was not empty: it contained the directories "data" and "ROOT". After deleting these directories, "zfs mount -a" succeeded, and "zfs list -r -o name,mountpoint,mounted" now lists everything as mounted.
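
For anyone hitting the same thing, a small check like the following can show whether a mountpoint directory already has content before running "zfs mount -a". It is only a sketch: the function names are hypothetical, and it reports the state without deleting or moving anything.

```shell
# Succeed if the directory contains any entries, including dotfiles.
# A missing directory counts as empty.
dir_has_entries() {
    [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Report whether a mountpoint directory is safe to mount over.
check_mountpoint() {
    if dir_has_entries "$1"; then
        echo "not empty: $1"
    else
        echo "ok: $1"
    fi
}

# Usage on the host, with the mountpoint from the listing above:
#   check_mountpoint /rpool
```

Inspecting whatever it reports before removing it is a good idea, since the stray directories could hold files written while the dataset was not mounted.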
 
