Upgrade from 5.x to 6.x: LXC containers will not start

Cronus89

I tried a few things; the results are below. It seems like the ZFS setup might be bugged?

Code:
root@prox2:~# pct mount 102
mounting container failed
cannot open directory //rpool/data/subvol-102-disk-1: No such file or directory
Code:
root@prox2:~# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rootpool                      13.5G  94.0G    96K  /rootpool
rootpool/ROOT                 4.73G  94.0G    96K  /rootpool/ROOT
rootpool/ROOT/pve-1           4.73G  94.0G  4.73G  /
rootpool/data                   96K  94.0G    96K  /rootpool/data
rootpool/swap                 8.50G  95.9G  6.61G  -
rpool                         1.72T  1.79T    96K  /rpool
rpool/data                    1.72T  1.79T   112K  /rpool/data
rpool/data/subvol-102-disk-1   291G  72.4G   278G  /rpool/data/subvol-102-disk-1
rpool/data/subvol-104-disk-1   421M  1.59G   421M  /rpool/data/subvol-104-disk-1
rpool/data/subvol-105-disk-1  5.36G  14.6G  5.36G  /rpool/data/subvol-105-disk-1
rpool/data/subvol-107-disk-1  2.46G  2.55G  2.45G  /rpool/data/subvol-107-disk-1
rpool/data/subvol-108-disk-1  1.06G   970M  1.05G  /rpool/data/subvol-108-disk-1
rpool/data/subvol-109-disk-1  1.12G   898M  1.12G  /rpool/data/subvol-109-disk-1
rpool/data/subvol-110-disk-1  4.22G  1.78G  4.22G  /rpool/data/subvol-110-disk-1
rpool/data/subvol-112-disk-0   537M  1.48G   537M  /rpool/data/subvol-112-disk-0
rpool/data/subvol-112-disk-2   544G  56.3G   544G  /rpool/data/subvol-112-disk-2
rpool/data/vm-100-disk-1      25.1G  1.79T  25.1G  -
rpool/data/vm-101-disk-1      14.0G  1.79T  14.0G  -
rpool/data/vm-103-disk-1       132M  1.79T   132M  -
rpool/data/vm-106-disk-1       375G  1.79T   375G  -
rpool/data/vm-111-disk-0      6.82G  1.79T  6.82G  -
rpool/data/vm-113-disk-0      13.6G  1.79T  13.6G  -
rpool/data/vm-113-disk-1       212G  1.79T   210G  -
rpool/data/vm-115-disk-0       261G  1.79T   261G  -
Code:
root@prox2:~# pct fsck 102
unable to run fsck for 'local-zfs:subvol-102-disk-1' (format == subvol)


Code:
root@prox2:~# lxc-start -n 102 -F -l DEBUG -o /tmp/lxc-102.log
lxc-start: 102: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
lxc-start: 102: tools/lxc_start.c: main: 330 The container failed to start
lxc-start: 102: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
lxc-start: 102: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options

And from /tmp/lxc-102.log:

Code:
lxc-start 102 20190718170159.256 ERROR conf - conf.c:run_buffer:335 - Script exited with status 2
lxc-start 102 20190718170159.256 ERROR start - start.c:lxc_init:861 - Failed to run lxc.hook.pre-start for container "102"
lxc-start 102 20190718170159.256 ERROR start - start.c:__lxc_start:1944 - Failed to initialize container "102"
lxc-start 102 20190718170159.256 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:856 - No such file or directory - Failed to receive the container state
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:330 - The container failed to start
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:333 - To get more details, run the container in foreground mode
lxc-start 102 20190718170159.256 ERROR lxc_start - tools/lxc_start.c:main:336 - Additional information can be obtained by setting the --logfile and --logpriority options
 
Same issue on my side. VMs on ZFS work fine, but containers fail with:

Code:
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 330 The container failed to start
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Jul 18 19:17:54 x lxc-start[5374]: lxc-start: 212: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
Jul 18 19:17:54 x systemd[1]: pve-container@212.service: Control process exited, code=exited, status=1/FAILURE
Jul 18 19:17:54 x systemd[1]: pve-container@212.service: Failed with result 'exit-code'.

No issues on another system running with LVM only.
 
A little update: all ZFS volumes for containers were unmounted. I mounted them all by hand and was able to start the containers afterwards. One of the containers has a lock entry and tells me it is mounted?

Code:
# pct list
VMID  Status   Lock     Name
203   running  mounted  minio01
205   running           minio02
211   running           elastic02
212   running           grafana
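
If the leftover 'mounted' lock keeps you from managing a container, it can usually be cleared by hand. A minimal sketch, assuming CT 203 is the locked one and no `pct mount` or backup job is actually still running against it:

Code:
# clear the stale lock on CT 203 (only safe if nothing really holds it)
pct unlock 203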
 
I don't know how to enter the container's directory. I just saw them mounted in df -h.
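
For reference, a sketch of two ways in, assuming the container is running and uses the usual subvol naming (the disk name below is a hypothetical example; check `pct config 203` for the real one):

Code:
# get a shell inside the running container
pct enter 203
# or look at its root filesystem directly on the host
ls /rpool/data/subvol-203-disk-1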
 
This is still a bug and not fixed. I have an entire node offline since it cannot start any LXC containers. Can someone look into this?

It seems to be because I use ZFS and it's not mounted properly, I think?
 
It seems to be because I use ZFS and it's not mounted properly, I think?

hmm - please check the status of zfs-import-cache.service and zfs-import-scan.service:
Code:
systemctl status -l zfs-import-cache.service
systemctl status -l zfs-import-scan.service

does:
Code:
zfs mount -a

work without error and can you start your containers afterwards?
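
A quick way to see which datasets are actually mounted, as a complement to the service checks above (a sketch using standard zfs properties):

Code:
# name, mount state and mountpoint for every dataset under rpool/data
zfs list -r -o name,mounted,mountpoint rpool/data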
 
cache is running, scan is not; the mount did not succeed, see below.

Code:
root@prox2:~# systemctl status -l zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
   Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor prese
   Active: active (exited) since Thu 2019-09-19 14:37:15 CDT; 3 days ago
     Docs: man:zpool(8)
  Process: 1441 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited,
 Main PID: 1441 (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or 

root@prox2:~# systemctl status -l zfs-import-scan.service 
● zfs-import-scan.service - Import ZFS pools by device scanning
   Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; disabled; vendor prese
   Active: inactive (dead)
     Docs: man:zpool(8)
root@prox2:~# zfs mount -a
cannot mount '/rpool': directory is not empty
root@prox2:~# 
root@prox2:~# ls /rpool
data  ROOT
root@prox2:~#
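
(For reference, one way to see exactly what is sitting in /rpool on the root filesystem itself and blocking the mount - a sketch; `-xdev` keeps find from descending into datasets that are already mounted:)

Code:
# list only files/dirs that live on the parent filesystem under /rpool
find /rpool -xdev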
 
* check which files are in /rpool/data (`find /rpool/data`) - if it's only the containers' root-dirs and 'dev/' directories inside - remove them (if there are other things inside - please post the output)

* else - set the cache-file property on both your pools and update the initramfs:
Code:
zpool set cachefile=/etc/zfs/zpool.cache rpool
zpool set cachefile=/etc/zfs/zpool.cache rootpool
update-initramfs -k all -u

Afterwards, reboot.
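
To double-check that the property took effect before rebooting (a quick sanity check):

Code:
zpool get cachefile rpool rootpool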

I hope this helps!
 
It seems it's all the container files. Is this normal? If I rm -rf them, what happens?

Code:
root@prox2:~# ls /rpool/data/
subvol-107-disk-1  subvol-109-disk-1  subvol-112-disk-0
subvol-108-disk-1  subvol-110-disk-1  subvol-112-disk-2
root@prox2:~#
 
Please run `find /rpool/data` - this shows you the complete tree - and we can see if it's just the directories (and optionally a dev dir inside) or if the datasets are actually mounted.

I would also suggest not `rm -rf`-ing them, but rather `mv`-ing them out of the way.
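
A minimal sketch of that, assuming subvol-107-disk-1 turns out to be an empty placeholder directory rather than a mounted dataset (the .bak suffix is just an example):

Code:
# move the placeholder aside instead of deleting it, then retry the mounts
mv /rpool/data/subvol-107-disk-1 /rpool/data/subvol-107-disk-1.bak
zfs mount -a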
 
hm - which filesystem is mounted on e.g. /rpool/data/subvol-107-disk-1 ? (`df /rpool/data/subvol-107-disk-1`)
* `zfs get all rpool/data | grep mount`
* `zfs get all rpool/data/subvol-107-disk-1 | grep mount`
would also be interesting
 
Code:
root@prox2:~# df /rpool/data/subvol-107-disk-1
Filesystem                   1K-blocks    Used Available Use% Mounted on
rpool/data/subvol-107-disk-1   5242880 2578176   2664704  50% /rpool/data/subvol-107-disk-1

root@prox2:~# zfs get all rpool/data | grep mount
rpool/data  mounted               no                     -
rpool/data  mountpoint            /rpool/data            default
rpool/data  canmount              on                     default

root@prox2:~# zfs get all rpool/data/subvol-107-disk-1 | grep mount
rpool/data/subvol-107-disk-1  mounted               yes                            -
rpool/data/subvol-107-disk-1  mountpoint            /rpool/data/subvol-107-disk-1  default
rpool/data/subvol-107-disk-1  canmount              on                             default
 
Aha! I have a cluster, and 107 is just a replication from another node. LXC 105 is actually on this node, and it is not mounted! None of the ones on this node seem to be. Below is what's in /rpool/data/ and also which IDs are on this node:

Code:
root@prox2:~# ls //rpool/data/
subvol-107-disk-1  subvol-108-disk-1  subvol-109-disk-1  subvol-110-disk-1  subvol-112-disk-0  subvol-112-disk-2

LXC: 104, 105
VM: 101, 103, 111

Code:
lxc-start 105 20190924152644.412 DEBUG    conf - conf.c:run_buffer:326 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start with output: cannot open directory
//rpool/data/subvol-105-disk-1: No such file or directory
 
Ok - it seems that subvol-105-disk-1 is not mounted - `zfs get all rpool/data/subvol-105-disk-1 | grep -i mount` should say so.
Can you manually mount it (`zfs mount rpool/data/subvol-105-disk-1`) and start the container afterwards?
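
If several subvols are affected, something along these lines mounts every dataset under rpool/data that reports mounted=no (a sketch, not official Proxmox tooling):

Code:
# mount every unmounted dataset below rpool/data; zvols report "-" and are skipped
zfs list -H -r -o name,mounted rpool/data | awk '$2 == "no" {print $1}' | while read ds; do
    zfs mount "$ds"
done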
 
Code:
root@prox2:~# zfs get all rpool/data/subvol-105-disk-1 |grep -i mount
rpool/data/subvol-105-disk-1  mounted               no                             -
rpool/data/subvol-105-disk-1  mountpoint            /rpool/data/subvol-105-disk-1  default
rpool/data/subvol-105-disk-1  canmount              on                             default

It does start now after manually mounting.

What could be the reason it's not auto-mounting?
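
A hedged guess worth checking: besides the two import services above, OpenZFS ships a separate zfs-mount.service that performs the actual `zfs mount -a` at boot. If it is disabled or fails early (for example because of the non-empty /rpool seen above), the pools import fine but the datasets stay unmounted:

Code:
systemctl status -l zfs-mount.service
systemctl is-enabled zfs-mount.service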