I can't start LXC

Going into each mounted directory, removing the "dev" directory in it and remounting with "zfs mount" did temporarily solve my issue.

But this is really just a workaround, not something one would like to do at every reboot…
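
For reference, this is roughly what that cleanup looks like after each reboot (a rough sketch; "monster/subvol-XXX-disk-0" is only a placeholder, adapt it to your own datasets):

Code:
zfs unmount monster/subvol-XXX-disk-0     # unmount the dataset if ZFS thinks it is mounted
rm -rf /monster/subvol-XXX-disk-0/dev     # remove the stray "dev" directory that blocks the mount
zfs mount monster/subvol-XXX-disk-0       # remount, then the CT can be started again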

It looks like what we all have in common is having more than one zpool.

This obviously deserves some debugging work and a patch.

Where can we open a bug report?
 
These issues are usually resolved by recreating the zpool.cache file and running update-initramfs -k all -u (as described in a few posts here in the forum).
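
In practice that boils down to something like this (a sketch; run the first command once per pool, rpool is just the usual root pool name):

Code:
zpool set cachefile=/etc/zfs/zpool.cache rpool   # repeat for every pool, including the second data pool
update-initramfs -k all -u                       # rebuild the initramfs for all installed kernels
# reboot afterwards so the new cache file and initramfs are actually used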

This is something that I consider specific to the individual setups and not a problem with the software itself.

Did you do anything special when adding the second pool?

I hope this helps!
 
Hello, and thanks for the feedback.
This pool was set up with Proxmox 4, when we started using Proxmox.

We have been living with it for a long time.

The problem seems to be tied to the existence of the /dev/ directory as described by someone on this thread.

I have "recreated the zpool.cache" and "update-initramfs" but after the reboot of the system all CT were down !
It took me a while to realize that the mounting was somehow locked by these "dev" folders.

I had to remove the "dev" folders, unmount each directory using zfs unmount, and mount them again to get full access to the CTs and be able to start them again.


But it is definitely a bug.
There are at least five people with the same issue (not counting those who have faced it and decided to just "reinstall").
This seems to happen when you have two zpools and the main data pool is not the one the system is installed on.
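
If it helps, this is how one could compare which pools are currently imported with what is recorded in the cache file (a sketch; as far as I know zdb reads /etc/zfs/zpool.cache when no pool argument is given):

Code:
zpool list -H -o name   # pools that are currently imported
zdb -C                  # dump of the pool configs stored in /etc/zfs/zpool.cache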


The ZFS setup is quite standard:

  pool: monster
 state: ONLINE
  scan: scrub repaired 0B in 11h7m with 0 errors on Sun Feb 9 11:31:25 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        monster                     ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000039488301e58  ONLINE       0     0     0
            wwn-0x5000039488300c20  ONLINE       0     0     0
            wwn-0x50000396083857a0  ONLINE       0     0     0
            wwn-0x500003960833bb3c  ONLINE       0     0     0
            wwn-0x5000039608389fec  ONLINE       0     0     0
            wwn-0x50000c0f02078ad0  ONLINE       0     0     0
        spares
          wwn-0x50000c0f02ba8db0    AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0h1m with 0 errors on Sun Feb 9 00:25:19 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdi2    ONLINE       0     0     0
            sdj2    ONLINE       0     0     0

errors: No known data errors

Also, since the last update the disks in my data storage pool are no longer listed with device names like "sdi2" but with the more cryptic "wwn-0x5000039488301e58".
 
One more thing: since the last reboot I have noticed that the disks in the main pool are now displayed using their "ID":

  pool: monster
 state: ONLINE
  scan: scrub repaired 0B in 11h7m with 0 errors on Sun Feb 9 11:31:25 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        monster                     ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000039488301e58  ONLINE       0     0     0
            wwn-0x5000039488300c20  ONLINE       0     0     0
            wwn-0x50000396083857a0  ONLINE       0     0     0
            wwn-0x500003960833bb3c  ONLINE       0     0     0
            wwn-0x5000039608389fec  ONLINE       0     0     0
            wwn-0x50000c0f02078ad0  ONLINE       0     0     0
        spares
          wwn-0x50000c0f02ba8db0    AVAIL

Whereas they used to be shown with their device path, as on the root pool:

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0h1m with 0 errors on Sun Feb 9 00:25:19 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdi2    ONLINE       0     0     0
            sdj2    ONLINE       0     0     0

errors: No known data errors

This is quite strange.
 
The problem seems to be tied to the existence of the /dev/ directory as described by someone on this thread.
The '/dev' directory exists because the container gets started before all ZFS filesystems are mounted (especially the container's own dataset).

-> check your journal since the last boot for all messages related to zfs/zpool imports and mounts (`journalctl -b`)

-> check which ZFS related units are enabled (zfs-import-cache.service, zfs-import-scan.service, ... - the output of `systemctl -a | grep -Ei 'zfs|zpool'`)
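
For example (the dataset name in the last command is just a placeholder):

Code:
journalctl -b | grep -iE 'zfs|zpool'                      # boot messages about pool imports and mounts
systemctl -a | grep -Ei 'zfs|zpool'                       # state of all ZFS related units
zfs get mounted,mountpoint rpool/data/subvol-XXX-disk-0   # is the container's dataset actually mounted?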

I hope this helps.
 
Same here :(

After a dist-upgrade inside a container (721 - Ubuntu) it can't start again, with this error:

Code:
may 08 17:10:46 server7 lxc-start[2479523]: lxc-start: 721: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
may 08 17:10:46 server7 lxc-start[2479523]: lxc-start: 721: tools/lxc_start.c: main: 330 The container failed to start
may 08 17:10:46 server7 lxc-start[2479523]: lxc-start: 721: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
may 08 17:10:46 server7 lxc-start[2479523]: lxc-start: 721: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
may 08 17:10:46 server7 systemd[1]: pve-container@721.service: Control process exited, code=exited, status=1/FAILURE
may 08 17:10:46 server7 systemd[1]: pve-container@721.service: Failed with result 'exit-code'.
may 08 17:10:46 server7 systemd[1]: Failed to start PVE LXC Container: 721.

I can do:

Code:
root@server7:~# pct unmount 721
no lock found trying to remove 'mounted'  lock
root@server7:~# zfs umount rpool/vmdata/subvol-721-disk-0
root@server7:~# rm -rf /rpool/vmdata/subvol-721-disk-0/dev
root@server7:~# zfs mount rpool/vmdata/subvol-721-disk-0
root@server7:~# pct start 721

But the error persists.

Here is my `systemctl -a | grep -Ei 'zfs|zpool'` output:

Code:
root@server7:~# systemctl -a |grep -Ei 'zfs|zpool'
  zfs-import-cache.service                                                                                                             loaded    active     exited    Import ZFS pools by cache file
● zfs-mount.service                                                                                                                    loaded    failed     failed    Mount ZFS filesystems
  zfs-share.service                                                                                                                    loaded    active     exited    ZFS file system shares
  zfs-import.target                                                                                                                    loaded    active     active    ZFS pool import target
  zfs.target                                                                                                                           loaded    active     active    ZFS startup target

Code:
systemctl status zfs-mount.service
● zfs-mount.service - Mount ZFS filesystems
   Loaded: loaded (/lib/systemd/system/zfs-mount.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2019-09-13 12:56:42 CEST; 7 months 25 days ago
     Docs: man:zfs(8)
Main PID: 1316 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

The other containers are working as expected (but I don't want to restart them!)

I saw this post: https://forum.proxmox.com/threads/upgrade-pve-5-1-41-to-5-2-1-failed-to-start-mount-zfs.44014/ but I'm not sure whether I can do zfs set mountpoint=none rpool now, because I have too many containers currently running on my rpool.

I see all my containers' subvolumes mounted, including 721:

Code:
NAME                             USED  AVAIL     REFER  MOUNTPOINT
rpool                           41,8G   834G       96K  /rpool
rpool/backup                     756M   834G       96K  /backups
rpool/tmp_backup                 124K   834G      124K  /var/lib/vz/tmp_backup
rpool/vmdata                    40,7G   834G     5,94G  /rpool/vmdata
rpool/vmdata/subvol-606-disk-0  2,25G  17,8G     2,17G  /rpool/vmdata/subvol-606-disk-0
rpool/vmdata/subvol-702-disk-0  2,41G  48,1G     1,91G  /rpool/vmdata/subvol-702-disk-0
rpool/vmdata/subvol-712-disk-0  2,43G  48,0G     2,02G  /rpool/vmdata/subvol-712-disk-0
rpool/vmdata/subvol-716-disk-0  2,22G  48,2G     1,81G  /rpool/vmdata/subvol-716-disk-0
rpool/vmdata/subvol-717-disk-0  2,10G  48,3G     1,69G  /rpool/vmdata/subvol-717-disk-0
rpool/vmdata/subvol-719-disk-0  3,14G  47,3G     2,72G  /rpool/vmdata/subvol-719-disk-0
rpool/vmdata/subvol-720-disk-0  2,45G  48,0G     2,03G  /rpool/vmdata/subvol-720-disk-0
rpool/vmdata/subvol-721-disk-0  2,94G  48,1G     1,89G  /rpool/vmdata/subvol-721-disk-0
rpool/vmdata/subvol-722-disk-0  2,27G  48,2G     1,84G  /rpool/vmdata/subvol-722-disk-0
rpool/vmdata/subvol-801-disk-0  3,78G  46,7G     3,35G  /rpool/vmdata/subvol-801-disk-0
rpool/vmdata/subvol-802-disk-0  4,40G  46,0G     3,95G  /rpool/vmdata/subvol-802-disk-0
rpool/vmdata/subvol-803-disk-0  4,40G  46,0G     3,95G  /rpool/vmdata/subvol-803-disk-0

I can stop/start other containers, like 606, without problems.

And finally ...
Code:
lxc-start -n 721 -F -l DEBUG -o /tmp/lxc-721.log
...
Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 721 lxc pre-start with output: unsupported Ubuntu version '19.10'
 
I needed to change
  • /usr/share/perl5/PVE/LXC/Setup/Ubuntu.pm
and add '19.10' => 1, # eoan

Then systemctl restart pveproxy pvedaemon

Now I can start my container.
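
Roughly like this (a sketch; the grep is only there to find the hash of supported releases, and the exact layout of that hash may differ between pve-container versions):

Code:
grep -n "'18.04'" /usr/share/perl5/PVE/LXC/Setup/Ubuntu.pm   # locate the existing version entries
# add the following line next to them:
#     '19.10' => 1, # eoan
systemctl restart pveproxy pvedaemon                         # restart so the change is picked up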
 
Use df -h.
If the subvolumes are not in the list, then:

remove the leftover dirs in the subvolume mountpoints and
zfs mount nvme1/subvol....

Then everything is OK again.
 
