Upgrade from 5.x to 6.x: LXC containers will not start

Cronus89

I tried a few things; the results are below. It seems like the ZFS setup might be bugged?

Code:
root@prox2:~# pct mount 102
mounting container failed
cannot open directory //rpool/data/subvol-102-disk-1: No such file or directory
Code:
root@prox2:~# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rootpool                      13.5G  94.0G    96K  /rootpool
rootpool/ROOT                 4.73G  94.0G    96K  /rootpool/ROOT
rootpool/ROOT/pve-1           4.73G  94.0G  4.73G  /
rootpool/data                   96K  94.0G    96K  /rootpool/data
rootpool/swap                 8.50G  95.9G  6.61G  -
rpool                         1.72T  1.79T    96K  /rpool
rpool/data                    1.72T  1.79T   112K  /rpool/data
rpool/data/subvol-102-disk-1   291G  72.4G   278G  /rpool/data/subvol-102-disk-1
rpool/data/subvol-104-disk-1   421M  1.59G   421M  /rpool/data/subvol-104-disk-1
rpool/data/subvol-105-disk-1  5.36G  14.6G  5.36G  /rpool/data/subvol-105-disk-1
rpool/data/subvol-107-disk-1  2.46G  2.55G  2.45G  /rpool/data/subvol-107-disk-1
rpool/data/subvol-108-disk-1  1.06G   970M  1.05G  /rpool/data/subvol-108-disk-1
rpool/data/subvol-109-disk-1  1.12G   898M  1.12G  /rpool/data/subvol-109-disk-1
rpool/data/subvol-110-disk-1  4.22G  1.78G  4.22G  /rpool/data/subvol-110-disk-1
rpool/data/subvol-112-disk-0   537M  1.48G   537M  /rpool/data/subvol-112-disk-0
rpool/data/subvol-112-disk-2   544G  56.3G   544G  /rpool/data/subvol-112-disk-2
rpool/data/vm-100-disk-1      25.1G  1.79T  25.1G  -
rpool/data/vm-101-disk-1      14.0G  1.79T  14.0G  -
rpool/data/vm-103-disk-1       132M  1.79T   132M  -
rpool/data/vm-106-disk-1       375G  1.79T   375G  -
rpool/data/vm-111-disk-0      6.82G  1.79T  6.82G  -
rpool/data/vm-113-disk-0      13.6G  1.79T  13.6G  -
rpool/data/vm-113-disk-1       212G  1.79T   210G  -
rpool/data/vm-115-disk-0       261G  1.79T   261G  -
Code:
root@prox2:~# pct fsck 102
unable to run fsck for 'local-zfs:subvol-102-disk-1' (format == subvol)


Code:
root@prox2:~# lxc-start -n 102 -F -l DEBUG -o /tmp/lxc-102.log
lxc-start: 102: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
lxc-start: 102: tools/lxc_start.c: main: 330 The container failed to start
lxc-start: 102: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
lxc-start: 102: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options

And from /tmp/lxc-102.log:

Code:
lxc-start 102 20190718170159.256 ERROR conf - conf.c:run_buffer:335 - Script exited with status 2
lxc-start 102 20190718170159.256 ERROR start - start.c:lxc_init:861 - Failed to run lxc.hook.pre-start for container "102"
lxc-start 102 20190718170159.256 ERROR start - start.c:__lxc_start:1944 - Failed to initialize container "102"
lxc-start 102 20190718170159.256 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:856 - No such file or directory - Failed to receive the container state
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:330 - The container failed to start
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:333 - To get more details, run the container in foreground mode
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:336 - Additional information can be obtained by setting the --logfile and --logpriority options
 
Same issue on my side. VMs on ZFS work fine, but containers fail with:

Code:
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 330 The container failed to start
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
Jul 18 19:17:54 x systemd[1]: pve-container@212.service: Control process exited, code=exited, status=1/FAILURE
Jul 18 19:17:54 x systemd[1]: pve-container@212.service: Failed with result 'exit-code'.

No issues on another system running with LVM only.
 
A little update: all ZFS volumes for containers were unmounted. I mounted them all by hand and was able to start the containers afterwards. One of the containers has a lock entry and tells me it is mounted?

Code:
# pct list
VMID  Status   Lock     Name
203   running  mounted  minio01
205   running           minio02
211   running           elastic02
212   running           grafana
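
If the leftover 'mounted' lock keeps you from managing a container, it can usually be cleared by hand. A minimal sketch, assuming CT 203 is the locked one and no `pct mount` or backup job is actually still running against it:

Code:
# clear the stale lock on CT 203 (only safe if nothing really holds it)
pct unlock 203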
 
I don't know how to enter the container's directory. I just saw them mounted in df -h.
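
For reference, a sketch of two ways in, assuming the container is running and uses the usual subvol naming (the disk name below is a hypothetical example; check `pct config 203` for the real one):

Code:
# get a shell inside the running container
pct enter 203
# or look at its root filesystem directly on the host
ls /rpool/data/subvol-203-disk-1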
 
This is still a bug and not fixed. I have an entire node offline since it cannot start any LXC containers. Can someone look into this?

It seems to be because I use ZFS and it's not mounted properly, I think?
 
It seems to be because I use ZFS and it's not mounted properly, I think?

hmm - please check the status of zfs-import-cache.service and zfs-import-scan.service:
Code:
systemctl status -l zfs-import-cache.service
systemctl status -l zfs-import-scan.service

does:
Code:
zfs mount -a

work without error and can you start your containers afterwards?
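
A quick way to see which datasets are actually mounted, as a complement to the service checks above (a sketch using standard zfs properties):

Code:
# name, mount state and mountpoint for every dataset under rpool/data
zfs list -r -o name,mounted,mountpoint rpool/data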
 
cache is running, scan is not; the mount did not succeed, see below.

Code:
root@prox2:~# systemctl status -l zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
   Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor prese
   Active: active (exited) since Thu 2019-09-19 14:37:15 CDT; 3 days ago
     Docs: man:zpool(8)
  Process: 1441 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited,
 Main PID: 1441 (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or 

root@prox2:~# systemctl status -l zfs-import-scan.service 
● zfs-import-scan.service - Import ZFS pools by device scanning
   Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; disabled; vendor prese
   Active: inactive (dead)
     Docs: man:zpool(8)
root@prox2:~# zfs mount -a
cannot mount '/rpool': directory is not empty
root@prox2:~# 
root@prox2:~# ls /rpool
data  ROOT
root@prox2:~#
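
(For reference, one way to see exactly what is sitting in /rpool on the root filesystem itself and blocking the mount - a sketch; `-xdev` keeps find from descending into datasets that are already mounted:)

Code:
# list only files/dirs that live on the parent filesystem under /rpool
find /rpool -xdev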
 
* check which files are in /rpool/data (`find /rpool/data`) - if it's only the containers' root-dirs and 'dev/' directories inside - remove them (if there are other things inside - please post the output)

* else - set the cache-file property on both your pools and update the initramfs:
Code:
zpool set cachefile=/etc/zfs/zpool.cache rpool
zpool set cachefile=/etc/zfs/zpool.cache rootpool
update-initramfs -k all -u

Afterwards, reboot.
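
To double-check that the property took effect before rebooting (a quick sanity check):

Code:
zpool get cachefile rpool rootpool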

I hope this helps!
 
It seems it's all the container files. Is this normal? If I rm -rf them, what happens?

Code:
root@prox2:~# ls /rpool/data/
subvol-107-disk-1  subvol-109-disk-1  subvol-112-disk-0
subvol-108-disk-1  subvol-110-disk-1  subvol-112-disk-2
root@prox2:~#
 
Please run `find /rpool/data` - this shows you the complete tree - and we can see if it's just the directories (and optionally a dev dir inside) or if the datasets are actually mounted.

I would also suggest not `rm -rf`-ing them, but rather `mv`-ing them out of the way.
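
A minimal sketch of that, assuming subvol-107-disk-1 turns out to be an empty placeholder directory rather than a mounted dataset (the .bak suffix is just an example):

Code:
# move the placeholder aside instead of deleting it, then retry the mounts
mv /rpool/data/subvol-107-disk-1 /rpool/data/subvol-107-disk-1.bak
zfs mount -a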
 
hm - which filesystem is mounted on e.g. /rpool/data/subvol-107-disk-1 ? (`df /rpool/data/subvol-107-disk-1`)
* `zfs get all rpool/data | grep mount`
* `zfs get all rpool/data/subvol-107-disk-1 | grep mount`
would also be interesting
 
Code:
root@prox2:~# df /rpool/data/subvol-107-disk-1
Filesystem                   1K-blocks    Used Available Use% Mounted on
rpool/data/subvol-107-disk-1   5242880 2578176   2664704  50% /rpool/data/subvol-107-disk-1

root@prox2:~# zfs get all rpool/data | grep mount
rpool/data  mounted               no                     -
rpool/data  mountpoint            /rpool/data            default
rpool/data  canmount              on                     default

root@prox2:~# zfs get all rpool/data/subvol-107-disk-1 | grep mount
rpool/data/subvol-107-disk-1  mounted               yes                            -
rpool/data/subvol-107-disk-1  mountpoint            /rpool/data/subvol-107-disk-1  default
rpool/data/subvol-107-disk-1  canmount              on                             default
 
Aha! I have a cluster, and 107 is just a replication from another node. LXC 105 is actually on this node, and it is not mounted! None of the ones on this node seem to be. Below is what's in /rpool/data/ and also which IDs are on this node:

Code:
root@prox2:~# ls //rpool/data/
subvol-107-disk-1  subvol-108-disk-1  subvol-109-disk-1  subvol-110-disk-1  subvol-112-disk-0  subvol-112-disk-2

LXC: 104, 105
VM: 101, 103, 111

Code:
lxc-start 105 20190924152644.412 DEBUG    conf - conf.c:run_buffer:326 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start with output: cannot open directory
//rpool/data/subvol-105-disk-1: No such file or directory
 
Ok - it seems that subvol-105-disk-1 is not mounted - `zfs get all rpool/data/subvol-105-disk-1 | grep -i mount` should say so.
Can you manually mount it (`zfs mount rpool/data/subvol-105-disk-1`) and start the container afterwards?
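
If several subvols are affected, something along these lines mounts every dataset under rpool/data that reports mounted=no (a sketch, not official Proxmox tooling):

Code:
# mount every unmounted dataset below rpool/data; zvols report "-" and are skipped
zfs list -H -r -o name,mounted rpool/data | awk '$2 == "no" {print $1}' | while read ds; do
    zfs mount "$ds"
done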
 
Code:
root@prox2:~# zfs get all rpool/data/subvol-105-disk-1 |grep -i mount
rpool/data/subvol-105-disk-1  mounted               no                             -
rpool/data/subvol-105-disk-1  mountpoint            /rpool/data/subvol-105-disk-1  default
rpool/data/subvol-105-disk-1  canmount              on                             default

It does start now after manually mounting.

What could be the reason it's not auto-mounting?
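
A hedged guess worth checking: besides the two import services above, OpenZFS ships a separate zfs-mount.service that performs the actual `zfs mount -a` at boot. If it is disabled or fails early (for example because of the non-empty /rpool seen above), the pools import fine but the datasets stay unmounted:

Code:
systemctl status -l zfs-mount.service
systemctl is-enabled zfs-mount.service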