Upgrade from 5.x to 6.x: LXC containers will not start

Cronus89

I tried a few things; the results are below. It seems like the ZFS setup might be bugged?

root@prox2:~# pct mount 102
mounting container failed
cannot open directory //rpool/data/subvol-102-disk-1: No such file or directory
root@prox2:~# pct mount 102^C
root@prox2:~# ^C
root@prox2:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rootpool 13.5G 94.0G 96K /rootpool
rootpool/ROOT 4.73G 94.0G 96K /rootpool/ROOT
rootpool/ROOT/pve-1 4.73G 94.0G 4.73G /
rootpool/data 96K 94.0G 96K /rootpool/data
rootpool/swap 8.50G 95.9G 6.61G -
rpool 1.72T 1.79T 96K /rpool
rpool/data 1.72T 1.79T 112K /rpool/data
rpool/data/subvol-102-disk-1 291G 72.4G 278G /rpool/data/subvol-102-disk-1
rpool/data/subvol-104-disk-1 421M 1.59G 421M /rpool/data/subvol-104-disk-1
rpool/data/subvol-105-disk-1 5.36G 14.6G 5.36G /rpool/data/subvol-105-disk-1
rpool/data/subvol-107-disk-1 2.46G 2.55G 2.45G /rpool/data/subvol-107-disk-1
rpool/data/subvol-108-disk-1 1.06G 970M 1.05G /rpool/data/subvol-108-disk-1
rpool/data/subvol-109-disk-1 1.12G 898M 1.12G /rpool/data/subvol-109-disk-1
rpool/data/subvol-110-disk-1 4.22G 1.78G 4.22G /rpool/data/subvol-110-disk-1
rpool/data/subvol-112-disk-0 537M 1.48G 537M /rpool/data/subvol-112-disk-0
rpool/data/subvol-112-disk-2 544G 56.3G 544G /rpool/data/subvol-112-disk-2
rpool/data/vm-100-disk-1 25.1G 1.79T 25.1G -
rpool/data/vm-101-disk-1 14.0G 1.79T 14.0G -
rpool/data/vm-103-disk-1 132M 1.79T 132M -
rpool/data/vm-106-disk-1 375G 1.79T 375G -
rpool/data/vm-111-disk-0 6.82G 1.79T 6.82G -
rpool/data/vm-113-disk-0 13.6G 1.79T 13.6G -
rpool/data/vm-113-disk-1 212G 1.79T 210G -
rpool/data/vm-115-disk-0 261G 1.79T 261G -
root@prox2:~# pct fsck 102
unable to run fsck for 'local-zfs:subvol-102-disk-1' (format == subvol)


root@prox2:~# lxc-start -n 102 -F -l DEBUG -o /tmp/lxc-102.log
lxc-start: 102: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
lxc-start: 102: tools/lxc_start.c: main: 330 The container failed to start
lxc-start: 102: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
lxc-start: 102: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options

lxc-start 102 20190718170159.256 ERROR conf - conf.c:run_buffer:335 - Script exited with status 2
lxc-start 102 20190718170159.256 ERROR start - start.c:lxc_init:861 - Failed to run lxc.hook.pre-start for container "102"
lxc-start 102 20190718170159.256 ERROR start - start.c:__lxc_start:1944 - Failed to initialize container "102"
lxc-start 102 20190718170159.256 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:856 - No such file or directory - Failed to receive the container state
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:330 - The container failed to start
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:333 - To get more details, run the container in foreground mode
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:336 - Additional information can be obtained by setting the --logfile and --logpriority options
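
For reference: the pre-start hook fails for the same reason pct mount does - the subvol's mountpoint directory is missing. Since a subvol is a plain ZFS dataset there is nothing for fsck to check; whether the dataset is mounted at all is the more telling question (a minimal check, assuming local-zfs maps to rpool/data as in the listing above):
Code:
zfs get mounted,mountpoint rpool/data/subvol-102-disk-1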
 
Same issue on my side. VMs on ZFS work fine, but containers fail with:

Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 330 The container failed to start
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
Jul 18 19:17:54 x systemd[1]: pve-container@212.service: Control process exited, code=exited, status=1/FAILURE
Jul 18 19:17:54 x systemd[1]: pve-container@212.service: Failed with result 'exit-code'.

No issues on another system running with lvm only.
 
A little update: all ZFS volumes for containers were unmounted. I mounted them all by hand and I am able to start the containers afterwards. One of the containers has a lock entry and tells me it is mounted?

# pct list
VMID       Status     Lock         Name
203        running    mounted      minio01
205        running                 minio02
211        running                 elastic02
212        running                 grafana
 
I don't know how to enter the container's directory. I just saw them mounted in df -h.
 
This is still a bug and not fixed. I have an entire node offline since it cannot start any LXC containers. Can someone look into this?

It seems to be because I use ZFS and it's not mounted properly, I think?
 
It seems to be because I use ZFS and it's not mounted properly, I think?

hmm - please check the status of zfs-import-cache.service and zfs-import-scan.service:
Code:
systemctl status -l zfs-import-cache.service
systemctl status -l zfs-import-scan.service

does:
Code:
zfs mount -a

work without error and can you start your containers afterwards?
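
If it does not work without error, it can also help to list which datasets should mount automatically but currently are not mounted (a small sketch, not specific to your setup):
Code:
zfs list -H -o name,canmount,mounted -r rpool rootpool | awk '$2 == "on" && $3 == "no"'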
 
The cache service is running, the scan service is not, and the mount did not succeed; see below.

Code:
root@prox2:~# systemctl status -l zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
   Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor prese
   Active: active (exited) since Thu 2019-09-19 14:37:15 CDT; 3 days ago
     Docs: man:zpool(8)
  Process: 1441 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited,
 Main PID: 1441 (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or 

root@prox2:~# systemctl status -l zfs-import-scan.service 
● zfs-import-scan.service - Import ZFS pools by device scanning
   Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; disabled; vendor prese
   Active: inactive (dead)
     Docs: man:zpool(8)
root@prox2:~# zfs mount -a
cannot mount '/rpool': directory is not empty
root@prox2:~# 
root@prox2:~# ls /rpool
data  ROOT
root@prox2:~#
 
* check which files are in /rpool/data (`find /rpool/data`) - if it's only the containers' root dirs and 'dev/' directories inside - remove them (if there are other things inside - please post the output)

* else - set the cachefile property on both your pools and update the initramfs (a short verification sketch follows below):
Code:
zpool set cachefile=/etc/zfs/zpool.cache rpool
zpool set cachefile=/etc/zfs/zpool.cache rootpool
update-initramfs -k all -u

afterwards reboot.
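
To double-check that the cachefile property actually took effect on both pools before rebooting, something like this should do (a quick sketch, not Proxmox-specific):
Code:
zpool get cachefile rpool rootpool
ls -l /etc/zfs/zpool.cache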

I hope this helps!
 
It seems it's all the containers' files. Is this normal? If I rm -rf them, what happens?

Code:
root@prox2:~# ls /rpool/data/
subvol-107-disk-1  subvol-109-disk-1  subvol-112-disk-0
subvol-108-disk-1  subvol-110-disk-1  subvol-112-disk-2
root@prox2:~#
 
please run `find /rpool/data` - this shows you the complete tree - and we can see if it's just the directories (and optionally a dev dir inside) or if the datasets are actually mounted

I would also suggest not to `rm -rf` them, but rather to `mv` them out of the way.
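
For example (a sketch, assuming the leftovers are plain empty directories; mountpoint(1) is used to skip anything that is actually a mounted dataset):
Code:
mkdir -p /root/stale-subvol-dirs
for d in /rpool/data/subvol-*; do
    mountpoint -q "$d" || mv "$d" /root/stale-subvol-dirs/
done
zfs mount -a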
 
hm - which filesystem is mounted on e.g. /rpool/data/subvol-107-disk-1 ? (`df /rpool/data/subvol-107-disk-1`)
* `zfs get all rpool/data | grep mount`
* `zfs get all rpool/data/subvol-107-disk-1 | grep mount`
would also be interesting
 
Code:
root@prox2:~# df /rpool/data/subvol-107-disk-1
Filesystem                   1K-blocks    Used Available Use% Mounted on
rpool/data/subvol-107-disk-1   5242880 2578176   2664704  50% /rpool/data/subvol-107-disk-1

root@prox2:~# zfs get all rpool/data | grep mount
rpool/data  mounted               no                     -
rpool/data  mountpoint            /rpool/data            default
rpool/data  canmount              on                     default

root@prox2:~# zfs get all rpool/data/subvol-107-disk-1 | grep mount
rpool/data/subvol-107-disk-1  mounted               yes                            -
rpool/data/subvol-107-disk-1  mountpoint            /rpool/data/subvol-107-disk-1  default
rpool/data/subvol-107-disk-1  canmount              on                             default
 
Aha! I have a cluster and 107 is just a replication from another node. LXC 105 is actually on this node, and it is not mounted! None of the ones on this node seem to be. Below is what's in /rpool/data/ and also which IDs are on this node.

Code:
root@prox2:~# ls //rpool/data/
subvol-107-disk-1  subvol-108-disk-1  subvol-109-disk-1  subvol-110-disk-1  subvol-112-disk-0  subvol-112-disk-2

lxc
104
105

vm
101
103
111

Code:
lxc-start 105 20190924152644.412 DEBUG    conf - conf.c:run_buffer:326 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start with output: cannot open directory //rpool/data/subvol-105-disk-1: No such file or directory
 
Ok - it seems that subvol-105-disk-1 is not mounted - `zfs get all rpool/data/subvol-105-disk-1 |grep -i mount ` should say so.
can you manually mount it? `zfs mount rpool/data/subvol-105-disk-1` and start the container afterwards?
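
In full, that would be something like (dataset and CT ID taken from your output above):
Code:
zfs mount rpool/data/subvol-105-disk-1
pct start 105
pct status 105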
 
Code:
root@prox2:~# zfs get all rpool/data/subvol-105-disk-1 |grep -i mount
rpool/data/subvol-105-disk-1  mounted               no                             -
rpool/data/subvol-105-disk-1  mountpoint            /rpool/data/subvol-105-disk-1  default
rpool/data/subvol-105-disk-1  canmount              on                             default

It does start now after manually mounting it.

What could be the reason it's not auto-mounting?
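
One plausible explanation, given the earlier "cannot mount '/rpool': directory is not empty" error, is that directories were created under the pool's mountpoint before the pool was imported at boot, which then blocks the automatic mount of the parent dataset and its children. A quick way to spot the affected datasets and inspect their mountpoints (a sketch; the path in the second command is just an example):
Code:
# datasets that should auto-mount but currently are not mounted
zfs list -H -o name,canmount,mounted,mountpoint -r rpool | awk '$2 == "on" && $3 == "no"'
# check whether a mountpoint directory is already populated
ls -la /rpool/data/subvol-105-disk-1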
 
