lxc container on zfs won't start after reboot of pve host

I figured it out:

I looped this and it worked after 3 tries:
Code:
systemctl restart zfs-mount
journalctl -xb   # check whether anything got created under /rpool/data/*

# if not mounted yet -- take care, you could delete container data!
rm -rf /rpool/data/*

Somehow Proxmox created /rpool/data/ct-id-something-blabla/rootfs/data in the empty data directory.
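
For anyone following the same route, a slightly safer variant is to confirm the dataset really is unmounted and only contains leftover stubs before deleting anything (just a sketch; rpool/data is the dataset from above, adjust to your setup):

Code:
# verify the dataset is NOT mounted before touching its directory
zfs get -H -o value mounted rpool/data       # should print "no"
ls -la /rpool/data                           # should only show leftover stub directories

# only then clear the stubs and retry the mount
rm -rf /rpool/data/*
systemctl restart zfs-mount.service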
 
If it's still true that the CT does not start with the new package versions because ZFS cannot mount the dataset, as the mountpoint directory isn't empty, then yes, this seems like another bug - albeit I find that weird.

Can you actually check what is in the directory that's hindering the mount? Something like:

Bash:
ls -lart "/$(zfs get -H -o name mountpoint  datapool-01/vm-crypt/subvol-108-disk-0)"
 
Hi,

I just did an apt-get upgrade and a reboot, but still no luck: the LXC won't start after the reboot.

Here is the output of the command:

Code:
root@pve01:~# ls -lart "/$(zfs get -H -o name mountpoint  datapool-01/vm-crypt/subvol-108-disk-0)"
total 1
drwxr-xr-x 3 root root 3 Sep 12 09:57 ..
drwxr----- 2 root root 2 Sep 12 09:57 .

I executed it ...
- once after the reboot
- again after zfs load-key
- again after the first attempt to start the lxc

The result was the same every time.
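
For reference, the key and mount state of the dataset can be checked in one go like this (a sketch, using the dataset name from above):

Code:
# keystatus should be "available" after zfs load-key, mounted should be "yes"
zfs get -H -o property,value keystatus,mounted datapool-01/vm-crypt/subvol-108-disk-0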

cheers
Michael
 
Hi,
Any news on this?
Can I do anything to help get this bug fixed?
Provide additional logs / tests?

cheers
Michael
 
I am also experiencing this issue, using pve-container 3.2-2. In my case, I do not even see a "/dev" directory in the mounting path. I am able to use the workaround described above (delete mounting path and use "zfs mount XXX"), but this is not a good long-term solution.

Here are the relevant logs for me (attached debug log, since it was too large to post):
Code:
Web console error:
TASK ERROR: zfs error: '/tank/vm-dev/subvol-100-disk-1': not a ZFS filesystem

Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 0.9.4-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-6
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-4
pve-xtermjs: 4.7.0-2
pve-zsync: 2.0-3
qemu-server: 6.2-18
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2

Code:
root@pve1:~# lxc-start -d -n 100 -F -l DEBUG -o /tmp/lxc-100.log
lxc-start: 100: conf.c: mount_autodev: 1074 Permission denied - Failed to create "/dev" directory
lxc-start: 100: conf.c: lxc_setup: 3238 Failed to mount "/dev"
lxc-start: 100: start.c: do_start: 1224 Failed to setup container "100"
lxc-start: 100: sync.c: __sync_wait: 41 An error occurred in another process (expected sequence number 5)
lxc-start: 100: start.c: __lxc_start: 1950 Failed to spawn container "100"
lxc-start: 100: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 100: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options
 

Attachments

  • lxc-100.log
After trying yet another workaround for this, I found simply mounting the ZFS subvolume allowed me to start the container:
Code:
zfs mount tank/vm-dev/subvol-100-disk-1
pct start 100
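
If more than one container is affected, a small loop along these lines should mount whatever is still unmounted (just a sketch; it assumes all container subvols live under tank/vm-dev):

Code:
# mount every ZFS filesystem under tank/vm-dev that is not mounted yet
zfs list -H -r -o name,mounted tank/vm-dev | awk '$2 == "no" {print $1}' | xargs -r -n1 zfs mount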
 
Figured it out - my zpool mountpoint path existed on the root file system, preventing the pool from mounting at all. To solve this once and for all, I issued this command (after stopping all containers/VMs):
Code:
zpool export tank && rm -rf /tank

This cleared out any existing directory structure on the pool's mount point and allowed the zpool to not only import but also to mount all necessary ZFS filesystems under /tank.
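
Spelled out, that recovery looks roughly like this (just a sketch; tank is the pool from above, and the import/mount step may also happen automatically on the next boot):

Code:
# stop all containers/VMs using the pool first
zpool export tank
rm -rf /tank            # removes only the stale directory structure left on the root fs
zpool import tank       # or reboot; the pool should now import cleanly
zfs mount -a            # mounts all datasets back under /tank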

You can check whether this issue affects your system by looking at the output of df on the pool's mount path. For example:

Example of "masked" zpool mount path ("Mounted on" = system root '/'):
Code:
root@pve1:~# df /tank
Filesystem           1K-blocks     Used Available Use% Mounted on
/dev/mapper/pve-root  59600812 37869940  18673620  67% /

Example of proper zpool mount path ("Filesystem" = pool name):
Code:
root@pve1:~# df /tank
Filesystem      1K-blocks   Used  Available Use% Mounted on
tank           6802108032 504064 6801603968   1% /tank
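
To run the same check across every imported pool in one go, a small loop like this should work (a sketch; it assumes the pools use regular mountpoints rather than "legacy" or "none"):

Code:
# flag any pool whose mountpoint is still backed by another filesystem (e.g. the root fs)
for p in $(zpool list -H -o name); do
    mp=$(zfs get -H -o value mountpoint "$p")
    src=$(df --output=source "$mp" | tail -n 1)
    [ "$src" = "$p" ] || echo "pool $p looks masked: $mp is backed by $src"
done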
 
Thinking back to how this might have happened in the first place, I vaguely recall creating a Directory storage on top of a ZFS file system, so that I could store backups, snippets and ISOs on a ZFS-backed file system. However, I think this can cause those paths to be created on the root file system if for any reason the pool doesn't import on boot, leaving a directory structure on the pool's mount point. Don't do that!
 
Yes, this is very likely the cause of the issue.

Having a directory storage on the zpool is not really recommended for exactly this reason: if the directory storage gets activated first during boot, the zpool won't be able to mount (directory not empty).
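
If a directory storage on a ZFS dataset is really needed, it should help to tell PVE that the path is a mountpoint so the directory isn't pre-created before the pool is mounted - if I remember the option names correctly (please double-check the storage.cfg/pvesm documentation); <dir-storage-id> is a placeholder for your storage ID:

Code:
# mark the directory storage as a mountpoint and don't auto-create its path
pvesm set <dir-storage-id> --is_mountpoint yes --mkdir 0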
 
Thanks for the confirmation. I ended up hosting ISOs, templates and snippets inside a container instead.
 
Hi Elliott,

just to make sure: do you have the same issue because you are also using an encrypted ZFS dataset as storage for the LXC?

Because I haven't been able to get the container running after a reboot.
My workaround is still to delete and restore the container from backup :-/

cheers
Michael
 
No, I'm not using an encrypted ZFS dataset. I just have the same issue of the container not starting after a reboot, with the same error message, "Permission denied - Failed to create "/dev" directory". I think the issue I had could also present itself with encrypted datasets.
 
OK, so the next thing I will try is to move my container to a storage that does not need to be unlocked after reboot (a network share or a local unencrypted ZFS pool).

Then I will see whether the storage being unavailable during the boot process of the PVE host is the reason...

Where exactly is your LXC hosted? On local storage or a network share?
 
My LXC is hosted on a local ZFS dataset.
 
Hm. I moved the container's disk to a new storage (CIFS).
After that, the container wouldn't start:

Code:
()
run_buffer: 323 Script exited with status 255
lxc_init: 797 Failed to run lxc.hook.pre-start for container "108"
__lxc_start: 1896 Failed to initialize container "108"
TASK ERROR: startup for container '108' failed

A regular VM on the same storage works fine


EDIT:

Deleted it and restored it to the same storage: no luck.
Deleted it and restored it to a local unencrypted ZFS pool -> working again.
Aren't containers supposed to work on shared CIFS storage?

Next step: reboot the host and see if it starts again after that.
 
