Update broke LXC

OK, thanks. I've added a few logfiles.
  1. dmesg
  2. /var/log/lxc/104.log
  3. zpool history rpool
  4. lxc-start -n 104 -F -l DEBUG -o ~/lxc-104.log
/var/log/messages doesn't show more than dmesg. Is it helpful?
 

Attachments

  • lxc-104.log (6 KB)
  • zpool history rpool.txt (106 KB)
  • var_log_lxc_104.log (9.5 KB)
  • dmesg.txt (170.4 KB)
Do you need some more logs? Or could you find some helpful information in these files?
I am at my wit's end... :-(
 
From the debug log of pct start 104:
Code:
lxc pre-start produced output: unable to detect OS distribution
this could indicate that it's a problem with importing the ZFS filesystems

as said before: check your journal since last boot for messages from ZFS/Zpool:
Code:
journalctl -b
(there you can search (case-sensitive) by pressing '/')

I hope this helps!
 
OK, I've got an example extracted from the log:

Code:
Mär 14 09:47:38 vhost pvesh[3236]: Starting CT 108
Mär 14 09:47:38 vhost pve-guests[9043]: starting CT 108: UPID:vhost:00002353:00001CE4:5E6C9A2A:vzstart:108:root@pam:
Mär 14 09:47:38 vhost pve-guests[3237]: <root@pam> starting task UPID:vhost:00002353:00001CE4:5E6C9A2A:vzstart:108:root@pam:
Mär 14 09:47:38 vhost systemd[1]: Starting PVE LXC Container: 108...
Mär 14 09:47:39 vhost lxc-start[9048]: lxc-start: 108: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Mär 14 09:47:39 vhost lxc-start[9048]: lxc-start: 108: tools/lxc_start.c: main: 329 The container failed to start
Mär 14 09:47:39 vhost lxc-start[9048]: lxc-start: 108: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Mär 14 09:47:39 vhost lxc-start[9048]: lxc-start: 108: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Mär 14 09:47:39 vhost systemd[1]: pve-container@108.service: Control process exited, code=exited, status=1/FAILURE
Mär 14 09:47:39 vhost systemd[1]: pve-container@108.service: Failed with result 'exit-code'.
Mär 14 09:47:39 vhost systemd[1]: Failed to start PVE LXC Container: 108.
Mär 14 09:47:39 vhost pve-guests[9043]: command 'systemctl start pve-container@108' failed: exit code 1
Mär 14 09:47:39 vhost kernel: lxc-start[9053]: segfault at 50 ip 00007fb900c70f8b sp 00007ffc22bfc550 error 4 in liblxc.so.1.6.0[7fb900c17000+8a000]
Mär 14 09:47:39 vhost kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
Mär 14 09:47:39 vhost pvestatd[3103]: unable to get PID for CT 108 (not running?)
 
as said above - please check the journal from the complete boot for messages from _ZFS_ - the lxc-log is what points to a potential problem with the import of your zpool and the mounting of the datasets
 
OK, you've said "complete"? Maybe I've been unteachable :( Sorry.
I've checked the log again - this time from the beginning and line by line. And there was an entry that the root mount point itself is not empty. I don't know why, but I'll find out. Perhaps it's a result of an unclean shutdown.
After asking google for help, the solution is pretty simple:
Code:
zfs set overlay=on rpool

Thank you very much for your help!!!
nigi
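
For reference, a quick way to double-check that the overlay workaround took effect (a sketch; rpool is the pool from the post above):
Code:
# confirm the overlay property is now set on the pool's root dataset
zfs get overlay rpool
# confirm the dataset is actually mounted again
zfs get mounted,mountpoint rpool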
 
hmm - glad you found the workaround! - however the issue you're having is most likely due to a corrupt cache file (which is quite easily fixed as described by @oguz in a thread on the pve-devel mailing list: https://pve.proxmox.com/pipermail/pve-devel/2020-March/042054.html)

maybe try to follow the steps described there
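
a sketch of those steps as they also appear further down in this thread (replace rpool with each of your pool names):
Code:
zpool set cachefile=/etc/zfs/zpool.cache rpool   # repeat for every pool
update-initramfs -k all -u                       # rebuild the initramfs so it ships the fresh cachefile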

OK, you've said "complete"? Maybe I've been unteachable :( Sorry.
Sorry - I did not make myself clear - I basically wanted the output of:
`journalctl -b | grep -Ei 'zfs|zpool'`

but in any case - glad your issue is mitigated!
 
I've been seeing something similar for about the past 2 weeks.

It appears that there might either be a race condition, or something not mounting *before* the LXC's root directory and /dev get created.
I didn't do the overlay mount; I found that `rm -rf ${LXC_root_dir}` and then a `pct start ${LXC_id}` works.

I've found this to be problematic especially when the LXC is set not to start at boot, and specifically when the host got a reset/power failure (as happened today)... this actually happened on two different PVEs.
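
A rough sketch of that workaround for a container with ID 104 whose rootfs is the dataset rpool/data/subvol-104-disk-0 (the ID and dataset path are just assumptions, adjust to your setup):
Code:
pct stop 104 2>/dev/null                              # make sure the container is not running
zfs unmount rpool/data/subvol-104-disk-0 2>/dev/null  # in case the dataset is (partially) mounted
rm -rf /rpool/data/subvol-104-disk-0                  # remove the stale directory blocking the mount
pct start 104                                         # re-activating the volume should mount the dataset again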
 
as written in this thread - please check the logs of the last boot `journalctl -b` for messages from zfs pool import and zfs mount - that might indicate where the issue is at - also make sure you have a cachefile set and included in your initramfs

I hope this helps!
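
a sketch of how to verify both points (assuming the usual Debian/PVE initramfs-tools setup):
Code:
# show the cachefile property for every imported pool ('none' disables it)
zpool get cachefile
# every pool name should show up in the cachefile itself
strings /etc/zfs/zpool.cache
# and the cachefile should be shipped inside the initramfs
lsinitramfs /boot/initrd.img-$(uname -r) | grep zpool.cache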
 
as written in this thread - please check the logs of the last boot `journalctl -b` for messages from zfs pool import and zfs mount - that might indicate where the issue is at - also make sure you have a cachefile set and included in your initramfs

I hope this helps!
Yes, it seems pvestatd is way too quick out of the blocks, before all the ZFS pools are imported and mounted:

i.e.:
Code:
root@blacktest:~# journalctl -b|grep -i zfs
Mar 23 14:12:25 blacktest kernel: Command line: initrd=\EFI\proxmox\5.3.18-2-pve\initrd.img-5.3.18-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Mar 23 14:12:25 blacktest kernel: Kernel command line: initrd=\EFI\proxmox\5.3.18-2-pve\initrd.img-5.3.18-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Mar 23 14:12:25 blacktest kernel: ZFS: Loaded module v0.8.3-pve1, ZFS pool version 5000, ZFS filesystem version 5
Mar 23 14:12:26 blacktest systemd[1]: Starting Import ZFS pools by cache file...
Mar 23 14:12:26 blacktest systemd[1]: Started Import ZFS pools by cache file.
Mar 23 14:12:26 blacktest systemd[1]: Reached target ZFS pool import target.
Mar 23 14:12:26 blacktest systemd[1]: Starting Mount ZFS filesystems...
Mar 23 14:12:26 blacktest systemd[1]: Starting Wait for ZFS Volume (zvol) links in /dev...
Mar 23 14:12:26 blacktest systemd[1]: Started Wait for ZFS Volume (zvol) links in /dev.
Mar 23 14:12:26 blacktest systemd[1]: Reached target ZFS volumes are ready.
Mar 23 14:12:26 blacktest systemd[1]: Started Mount ZFS filesystems.
Mar 23 14:12:27 blacktest systemd[1]: Started ZFS Event Daemon (zed).
Mar 23 14:12:27 blacktest systemd[1]: Starting ZFS file system shares...
Mar 23 14:12:27 blacktest systemd[1]: Started ZFS file system shares.
Mar 23 14:12:27 blacktest systemd[1]: Reached target ZFS startup target.
Mar 23 14:12:27 blacktest zed[3478]: ZFS Event Daemon 0.8.3-pve1 (PID 3478)
Mar 23 14:12:42 blacktest pvestatd[4666]: zfs error: cannot open 'hvHdd01': no such pool
Mar 23 14:12:46 blacktest pvestatd[4666]: zfs error: cannot open 'zNVME02': no such pool
Code:
root@blacktest:~# zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hSSD01    199G  15.4G   184G        -         -     0%     7%  1.00x    ONLINE  -
hvHdd01  5.45T  65.8G  5.39T        -         -     0%     1%  1.00x    ONLINE  -
rpool      14G  2.68G  11.3G        -         -     9%    19%  1.00x    ONLINE  -
zNVME01   952G  44.6G   907G        -         -     3%     4%  1.00x    ONLINE  -
zNVME02    83G  4.88G  78.1G        -         -     0%     5%  1.00x    ONLINE  -
root@blacktest:~#
 
on a hunch - could you try to recreate the cachefile for all your pools and update the initramfs - i.e. for each pool run:
Code:
zpool set cachefile=/etc/zfs/zpool.cache $poolname
and finally:
Code:
update-initramfs -k all -u

(you can verify that all are set by running: `strings /etc/zfs/zpool.cache` )

and finally reboot?
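
if there are several pools, a small loop does the same thing (sketch):
Code:
# set the cachefile on every imported pool, then rebuild all initramfs images
for pool in $(zpool list -H -o name); do
    zpool set cachefile=/etc/zfs/zpool.cache "$pool"
done
update-initramfs -k all -u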
 
on a hunch - could you try to recreate the cachefile for all your pools and update the initramfs - i.e. for each pool run:
Code:
zpool set cachefile=/etc/zfs/zpool.cache $poolname
and finally:
Code:
update-initramfs -k all -u

(you can verify that all are set by running: `strings /etc/zfs/zpool.cache` )

and finally reboot?
I have the same issue with LXC containers that can't start after updating PVE.

The proposed fix above doesn't work.
I still have to rm -r all mount points and /dev in the subvol, then remount ZFS and finally start the LXC again.

Code:
~ journalctl -b|grep -i zfs
Mar 23 21:48:56 pve kernel: Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.3.18-2-pve root=ZFS=rpool/ROOT/pve-1 ro fbcon=rotate:3 rootdelay=15 quiet
Mar 23 21:48:56 pve kernel: Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.3.18-2-pve root=ZFS=rpool/ROOT/pve-1 ro fbcon=rotate:3 rootdelay=15 quiet
Mar 23 21:48:56 pve kernel: ZFS: Loaded module v0.8.3-pve1, ZFS pool version 5000, ZFS filesystem version 5
Mar 23 21:48:57 pve systemd[1]: Starting Import ZFS pools by cache file...
Mar 23 21:49:01 pve systemd[1]: Started Import ZFS pools by cache file.
Mar 23 21:49:01 pve systemd[1]: Reached target ZFS pool import target.
Mar 23 21:49:01 pve systemd[1]: Starting Wait for ZFS Volume (zvol) links in /dev...
Mar 23 21:49:01 pve systemd[1]: Starting Mount ZFS filesystems...
Mar 23 21:49:01 pve zfs[6106]: cannot mount '/tank': directory is not empty
Mar 23 21:49:01 pve zfs[6106]: cannot mount '/rpool':
Mar 23 21:49:01 pve zfs[6106]: cannot mount '/rpool': mount failed
Mar 23 21:49:01 pve systemd[1]: zfs-mount.service: Main process exited, code=killed, status=11/SEGV
Mar 23 21:49:01 pve kernel: zfs[6298]: segfault at 0 ip 00007fb1d79ee694 sp 00007fb1c77f6420 error 4 in libc-2.28.so[7fb1d7994000+148000]
Mar 23 21:49:01 pve systemd[1]: Started Wait for ZFS Volume (zvol) links in /dev.
Mar 23 21:49:01 pve systemd[1]: Reached target ZFS volumes are ready.
Mar 23 21:49:02 pve systemd[1]: zfs-mount.service: Failed with result 'signal'.
Mar 23 21:49:02 pve systemd[1]: Failed to start Mount ZFS filesystems.
Mar 23 21:49:02 pve systemd[1]: Started ZFS Event Daemon (zed).
Mar 23 21:49:02 pve systemd[1]: Starting ZFS file system shares...
Mar 23 21:49:02 pve zed[6494]: ZFS Event Daemon 0.8.3-pve1 (PID 6494)
Mar 23 21:49:02 pve containerd[6682]: time="2020-03-23T21:49:02.776986180+01:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
Mar 23 21:49:03 pve dockerd[6696]: time="2020-03-23T21:49:03.215345183+01:00" level=info msg="[graphdriver] using prior storage driver: zfs"
Mar 23 21:49:03 pve dockerd[6696]: time="2020-03-23T21:49:03.548879313+01:00" level=info msg="Docker daemon" commit=afacb8b7f0 graphdriver(s)=zfs version=19.03.8
Mar 23 21:49:17 pve systemd[1]: Started ZFS file system shares.
Mar 23 21:49:17 pve systemd[1]: Reached target ZFS startup target.
 
The proposed fix above doesn't work.
please post the output of:
Code:
strings /etc/zfs/zpool.cache
zpool status

Thanks

Mar 23 21:49:02 pve containerd[6682]: time="2020-03-23T21:49:02.776986180+01:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
Mar 23 21:49:03 pve dockerd[6696]: time="2020-03-23T21:49:03.215345183+01:00" level=info msg="[graphdriver] using prior storage driver: zfs"
Mar 23 21:49:03 pve dockerd[6696]: time="2020-03-23T21:49:03.548879313+01:00" level=info msg="Docker daemon" commit=afacb8b7f0 graphdriver(s)=zfs version=19.03.8
on a sidenote - installing docker on a PVE host is not really supported/well-tested
 
Hey guys,
I had the same issue: all my containers would not start anymore; sometimes they threw errors, sometimes they just said starting OK but did not start. What worked for me were the following steps (involves a bit of downtime; a consolidated sketch follows the list) - thank you to tonci, you really helped me out!

Let's assume our zpool is called "data_redundant".

  1. Unmount all datasets of the affected zpool (Yes, every dataset of one zpool!).
    zfs unmount data_redundant/YOUR_SUBVOL
  2. If you are sure that everything is unmounted, delete the mounted folder at root level (e.g. if your dataset is called data_redundant, delete the folder /data_redundant):
    rm -rf /data_redundant
  3. Then you can restart the zfs-mount.service. If that starts successfully, you can clap your hands!
  4. As oguz suggested, I re-set the cachefile and updated the initramfs.
    zpool set cachefile=/etc/zfs/zpool.cache data_redundant
    update-initramfs -k all -u
  5. After a reboot, everything worked for me again; the containers started without a problem. Just deleting the dev folders in the mounted datastores only fixed the issue temporarily, and after the next reboot everything was back to the beginning.
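
A consolidated sketch of the steps above for the example pool data_redundant (YOUR_SUBVOL is a placeholder; repeat the unmount for every dataset of the pool):
Code:
# 1. unmount every dataset of the affected pool
zfs unmount data_redundant/YOUR_SUBVOL
# 2. remove the now-stale mountpoint directory on the root filesystem
rm -rf /data_redundant
# 3. remount everything
systemctl restart zfs-mount.service
# 4. re-set the cachefile and rebuild the initramfs
zpool set cachefile=/etc/zfs/zpool.cache data_redundant
update-initramfs -k all -u
# 5. reboot and check that the containers start again
reboot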
Hope this works for everybody; if not, I would be happy to hear about your experiences!
 
