chOzcA75vE0poCY0F6XC

Hi all,

It's been a while since I last posted, but I'm having an issue again that I can't solve myself.
The LXC containers on my host usually do not survive a reboot, but a restore from backup revives them until the host is rebooted again.
I thought this was reproducible every time, but after some fiddling around I suddenly had half of them survive a single reboot, although another reboot killed them as well.
Once the LXC containers are actually running, I can reboot, shut down, stop and start them all day long until the Proxmox host itself gets rebooted. Then I have to restore from backup.
I don't know what triggered the fault initially; I manually update the host and reboot afterwards to make sure it comes up again.
I don't know which version I was on before, but an apt update && apt dist-upgrade and a reboot didn't fix it.
The "old" Proxmox version on my machine was a few weeks, maybe a month, old.
All LXCs are unprivileged.

The command pct start does not return any errors, and the web UI shows that everything is fine.
The QEMU machines have no issues.
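
In case it helps with debugging: since pct start stays silent, one thing I can still try is starting a single container in the foreground with verbose logging. Something roughly like this, where 628 is just one of my container IDs and the log path is arbitrary:

# start container 628 in the foreground with debug logging written to a file
lxc-start -n 628 -F -l DEBUG -o /tmp/lxc-628.log

# or look at what systemd recorded for the container unit during this boot
journalctl -b -u pve-container@628.service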

Information and logs are attached in a text file.

I'll go to bed now; I've been trying to figure this out for way too long. I'll respond ASAP tomorrow.

Thanks in advance!
 

Attachments

  • LXC wont start on reboot.txt (60.1 KB)
Of course, just when I wanted to add some more info about the forum posts I've found and the things I've already tried, I thought "hm, let's try the command 'zfs mount -O -a'" from this post.
And that works: LXC 628 can start without a restore from backup.
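
For anyone following along, the quickest way I know to see which datasets actually got mounted after a reboot is something like this (628 is the container mentioned above; adjust the ID for your own setup):

# list every dataset in the pool with its mountpoint and current mount state
zfs list -o name,mountpoint,mounted -r rpool

# find out which subvol a given container uses (628 as an example)
pct config 628 | grep rootfs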

The solution from Stoiko Ivanov, however, did not work, but maybe that's because I'm too tired and likely missed something.
His solution:
for each pool run zpool set cachefile=/etc/zfs/zpool.cache POOLNAME, then update the initramfs with update-initramfs -u -k all, then reboot.
I changed zpool set cachefile=/etc/zfs/zpool.cache POOLNAME to zpool set cachefile=/etc/zfs/zpool.cache rpool, since that's the only pool on that machine.
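
One thing I could still verify (writing this down mostly as a note to self, I'm not sure it's the right check) is whether the cache file really ends up inside the initramfs after running update-initramfs:

# the pool should report the cachefile path that was just set
zpool get cachefile rpool

# the cache file should then show up inside the freshly generated initramfs
# (uses the currently booted kernel; adjust the path for another version)
lsinitramfs /boot/initrd.img-$(uname -r) | grep zpool.cache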


I did not move the zpool.cache like nigi did, so I'll do that as a last try for this evening and let you guys know whether that did the trick or not.
 
Okay, so the following commands did not work:
root@hypervisor1:~# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 232G 174G 57.7G - - 67% 75% 1.00x ONLINE -
root@hypervisor1:~# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache_31072020
root@hypervisor1:~# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 232G 174G 57.7G - - 67% 75% 1.00x ONLINE -
root@hypervisor1:~# zpool set cachefile=/etc/zfs/zpool.cache rpool
root@hypervisor1:~# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-5.4.44-2-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.44-1-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.41-1-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.3.18-3-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.0.21-5-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-4.15.18-19-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-4.15.17-1-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
root@hypervisor1:~#

But manually running this command "zfs mount -O -a" allowed me to start all LXCs again.
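
As far as I understand it, the -O flag only tells ZFS to mount on top of a directory that is not empty, so a quick check whether that is really what blocks the normal mounts would be something like this (the dataset name is just an example from my pool, and I'd only poke at it while the container is stopped):

# confirm the dataset is currently not mounted
zfs get mounted rpool/data/subvol-628-disk-0

# anything listed here lives in the plain directory underneath the mountpoint
# and is what makes a normal "zfs mount" refuse to mount over it
ls -A /rpool/data/subvol-628-disk-0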


Is it some kind of race condition?
I'm asking because at one point half of the LXCs did start up after a reboot, while the rest couldn't.
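
If it really is a race at boot, I guess the place to look would be the standard ZFS systemd units and their journal from the affected boot; something along these lines (nothing custom here, just the stock units that ship with PVE/ZFS):

# did pool import and dataset mounting run, and in which state did they end up?
systemctl status zfs-import-cache.service zfs-import-scan.service zfs-mount.service zfs.target

# messages from those units for the current boot
journalctl -b -u zfs-import-cache.service -u zfs-mount.service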
 
I just stumbled on this post where Psilospiral mentions the following: "After some more forum probing, I learned the ZFS setting 'overlay' will allow for mounting of a pool that is not empty."
The default setting is off, and it can be checked with the following command:

root@hypervisor1:~# zfs get overlay rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  overlay   off    local

I've enabled it on my pool with the command "zfs set overlay=on rpool" and sure enough, all LXCs come back online after a reboot!
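
To double-check that the setting really reaches all the container subvols and not only the pool root, something like this should do; since overlay is inheritable, the children should show it as inherited from rpool:

# show the overlay property for every filesystem in the pool
zfs get -r -t filesystem overlay rpool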

Now I am curious: what caused this issue?
I'm not sure I can figure that out, but I hope the different steps and comments here will help others who end up googling the same issue.
 
