Can't start some containers after upgrade 5.14 -> 6.1

MikeC
Hello.

After upgrading my Proxmox host from jessie -> buster, several of my containers won't start. I had added a second disk through the UI and stored several containers on it. The volume shows up in the UI as a ZFS volume, and the containers show up under Content, but every attempt to start them fails. Here is the error:

Code:
lxc-start 104 20200509013351.704 INFO     lsm - lsm/lsm.c:lsm_init:29 - LSM security driver AppArmor
lxc-start 104 20200509013351.704 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "104", config section "lxc"
lxc-start 104 20200509013353.116 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 104 lxc pre-start produced output: cannot open directory //MEDIA/subvol-104-disk-0: No such file or directory

lxc-start 104 20200509013353.138 ERROR    conf - conf.c:run_buffer:323 - Script exited with status 2
lxc-start 104 20200509013353.138 ERROR    start - start.c:lxc_init:804 - Failed to run lxc.hook.pre-start for container "104"
lxc-start 104 20200509013353.138 ERROR    start - start.c:__lxc_start:1903 - Failed to initialize container "104"
lxc-start 104 20200509013353.138 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "104", config section "lxc"
lxc-start 104 20200509013353.643 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "104", config section "lxc"
lxc-start 104 20200509013354.962 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 104 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp0: not mounted.

lxc-start 104 20200509013354.963 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 104 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp0' failed: exit code 32

lxc-start 104 20200509013355.385 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 104 20200509013355.386 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options

The config for container 104 is:

Code:
#Plex media server on Debian8
arch: amd64
cpulimit: 2
cpuunits: 1024
hostname: plex.aviate.org
memory: 2000
mp0: MEDIA:subvol-104-disk-0,mp=/usr/local/media2,size=500G
net0: bridge=vmbr0,gw=192.168.1.1,hwaddr=9A:63:CF:3E:92:1C,ip=192.168.1.100/24,ip6=auto,name=eth0,type=veth
onboot: 1
ostype: ubuntu
rootfs: zfs-local:subvol-104-disk-1,size=1250G
swap: 2000
lxc.cgroup.devices.allow: c 189:* rwm
lxc.mount.entry: /dev/bus/usb/009 dev/bus/usb/009 none bind,optional,create=file
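
For reference, pvesm (the PVE storage CLI) can list what each storage actually holds — a quick sanity check, using the storage names from the config above:

Code:
root@proxmox:~# pvesm list zfs-local
root@proxmox:~# pvesm list MEDIA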

I'm not sure why rootfs is on zfs-local when it's showing up in Content for the MEDIA volume. Here's the log from another container (105) that is not starting:

Code:
root@proxmox:~# cat lxc-105.log 
lxc-start 105 20200509015743.651 INFO     lsm - lsm/lsm.c:lsm_init:29 - LSM security driver AppArmor
lxc-start 105 20200509015743.651 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "105", config section "lxc"
lxc-start 105 20200509015745.181 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: cannot open directory //MEDIA/subvol-105-disk-0: No such file or directory

lxc-start 105 20200509015745.402 ERROR    conf - conf.c:run_buffer:323 - Script exited with status 2
lxc-start 105 20200509015745.404 ERROR    start - start.c:lxc_init:804 - Failed to run lxc.hook.pre-start for container "105"
lxc-start 105 20200509015745.404 ERROR    start - start.c:__lxc_start:1903 - Failed to initialize container "105"
lxc-start 105 20200509015745.407 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "105", config section "lxc"
lxc-start 105 20200509015745.545 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "105", config section "lxc"
lxc-start 105 20200509015746.858 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 105 lxc post-stop produced output: umount: /var/lib/lxc/105/rootfs: not mounted

lxc-start 105 20200509015746.858 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 105 lxc post-stop produced output: command 'umount --recursive -- /var/lib/lxc/105/rootfs' failed: exit code 1

lxc-start 105 20200509015746.879 ERROR    conf - conf.c:run_buffer:323 - Script exited with status 1
lxc-start 105 20200509015746.880 ERROR    start - start.c:lxc_end:971 - Failed to run lxc.hook.post-stop for container "105"
lxc-start 105 20200509015746.880 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 105 20200509015746.880 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options

So the common issue seems to be:
lxc-pve-prestart-hook 105 lxc pre-start produced output: cannot open directory //MEDIA/subvol-105-disk-0: No such file or directory
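That path is the dataset's mountpoint on the host; if the pool or the dataset isn't actually mounted, the directory may not exist at all. A quick way to check (dataset name taken from the error above):

Code:
root@proxmox:~# zfs list -r MEDIA
root@proxmox:~# zfs get mounted,mountpoint MEDIA/subvol-105-disk-0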

Here is the screenshot of the summary page in the UI:

[Screenshot: container summary page in the UI (Screen Shot 2020-05-08 at 6.54.18 PM.png)]

There are two disks in a mirror that make up my zfs-local pool, and I added a single 2TB drive named MEDIA, which was working fine under 5.x. All three disks show up under fdisk. Any idea why this disk is somehow not available when I go to start the containers?
Thanks.
 
It looks like I may have mounted the 2TB disk at /MEDIA on the Proxmox host itself, since there is an empty /MEDIA directory, and the config shows that a volume from /MEDIA is then mounted into container 104. Trying to mount the device returns this error:

Code:
root@proxmox:/media# mount /dev/sda1 /MEDIA
mount: /MEDIA: unknown filesystem type 'zfs_member'.
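That error is expected: a partition belonging to a ZFS pool can't be mounted with mount(8); the pool has to be imported and mounted through the ZFS tools instead. A sketch of the usual sequence (nothing destructive — an already-imported pool just produces an error message):

Code:
root@proxmox:~# zpool import        # lists pools that are available for import
root@proxmox:~# zpool import MEDIA  # imports the pool if it isn't already
root@proxmox:~# zfs mount -a        # mounts all datasets of imported pools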

Zpool status:
Code:
root@proxmox:~# zpool status
  pool: MEDIA
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 00:44:06 with 0 errors on Sun Apr 12 01:08:08 2020
config:

    NAME                      STATE     READ WRITE CKSUM
    MEDIA                     ONLINE       0     0     0
      wwn-0x5000cca223dab25b  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 07:26:48 with 0 errors on Sun Apr 12 07:50:51 2020
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb2    ONLINE       0     0     0
        sdc2    ONLINE       0     0     0

errors: No known data errors
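The "some supported features are not enabled" status is normal after the ZFS version jump that comes with the PVE upgrade, and almost certainly unrelated to the mount problem. Upgrading the pools is optional and one-way — older ZFS versions can no longer import an upgraded pool:

Code:
root@proxmox:~# zpool upgrade -v      # shows the features that would be enabled
root@proxmox:~# zpool upgrade MEDIA   # one-way; only do this once everything else works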
 
Evidently, the ZFS pool did not auto-mount. I'm not sure if this is due to the upgrade from jessie -> buster or to the PVE upgrade. Issuing a 'zfs mount MEDIA' attached the pool at /MEDIA, and *one* container was able to start (104). However, none of the files were there.
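If the pool should come back automatically at boot, it's worth checking the systemd units that handle ZFS import and mounting on Debian buster, and that the pool is recorded in the cachefile — a diagnostic sketch, not a confirmed fix for this case:

Code:
root@proxmox:~# systemctl status zfs-import-cache.service zfs-mount.service
root@proxmox:~# zpool set cachefile=/etc/zfs/zpool.cache MEDIA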

Other containers that had not started still won't, but now they fail with "unable to detect OS distribution" errors. These were created recently, based on a Debian minimal image.

Code:
root@proxmox:~# lxc-start -n 114 -F -lDEBUG -o lxc-114.log
lxc-start: 114: conf.c: run_buffer: 323 Script exited with status 2
lxc-start: 114: start.c: lxc_init: 804 Failed to run lxc.hook.pre-start for container "114"
lxc-start: 114: start.c: __lxc_start: 1903 Failed to initialize container "114"
lxc-start: 114: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 114: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options
root@proxmox:~# cat lxc-114.log
lxc-start 114 20200509185956.256 INFO     lsm - lsm/lsm.c:lsm_init:29 - LSM security driver AppArmor
lxc-start 114 20200509185956.256 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "114", config section "lxc"
lxc-start 114 20200509185957.649 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 114 lxc pre-start produced output: unable to detect OS distribution

lxc-start 114 20200509185957.671 ERROR    conf - conf.c:run_buffer:323 - Script exited with status 2
lxc-start 114 20200509185957.671 ERROR    start - start.c:lxc_init:804 - Failed to run lxc.hook.pre-start for container "114"
lxc-start 114 20200509185957.671 ERROR    start - start.c:__lxc_start:1903 - Failed to initialize container "114"
lxc-start 114 20200509185957.672 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "114", config section "lxc"
lxc-start 114 20200509185958.176 INFO     conf - conf.c:run_script_argv:340 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "114", config section "lxc"
lxc-start 114 20200509185959.532 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 114 lxc post-stop produced output: umount: /var/lib/lxc/.pve-staged-mounts/mp0: not mounted.

lxc-start 114 20200509185959.532 DEBUG    conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-poststop-hook 114 lxc post-stop produced output: command 'umount -- /var/lib/lxc/.pve-staged-mounts/mp0' failed: exit code 32

lxc-start 114 20200509185959.631 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 114 20200509185959.632 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options

I've seen similar posts. Does anyone have input on how to resolve this?

Container 114 has "OS Type: debian". However, other containers that ARE starting also have debian as the OS type.
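The ostype can be double-checked from the CLI, and, as far as I can tell, the pre-start hook detects the distribution by reading files like /etc/os-release inside the container's root filesystem — so "unable to detect OS distribution" suggests an empty or unmounted rootfs rather than a wrong ostype:

Code:
root@proxmox:~# pct config 114 | grep ostype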
The system itself has:

Code:
root@proxmox:~# cat /etc/debian_version
10.3
root@proxmox:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Note: I found out that the system could not mount the subvolumes because a 'dev' directory had been created inside their mountpoints. Since the subvolumes weren't mounted, there was no filesystem underneath, and therefore no /etc/debian_version or /etc/os-release for the hook to find.
 
Trying a "zfs mount -a" displayed an error that subvol-105-disk-0 and subvol-114-disk-0 were not empty and therefore couldn't be mounted. Both of those subdirectories contained an empty "dev" directory. Once I removed those, "zfs mount -a" worked, and I could start 105 and 114. The mount on 104 is also working now. It looks like ZFS couldn't mount those datasets from the MEDIA pool because their mountpoints weren't empty.
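For anyone hitting the same thing, the whole recovery boils down to a few commands (paths taken from the errors above; rmdir is used deliberately, since it refuses to remove anything non-empty):

Code:
root@proxmox:~# zfs get -r mounted MEDIA             # find datasets with mounted=no
root@proxmox:~# ls -la /MEDIA/subvol-105-disk-0      # whatever is here blocks the mount
root@proxmox:~# rmdir /MEDIA/subvol-105-disk-0/dev   # safe: fails unless the dir is empty
root@proxmox:~# zfs mount -a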
 
Everything is working again. I'm not sure why a dev directory was created in two of the container mount points, but that was most likely the root cause. Issue solved.
 
