Some LXC do not start automatically after a reboot

stubbo66

New Member
Apr 2, 2024
I've been trying to diagnose why some of my VMs don't start after a reboot; typically it's my more critical ones, the ones with start priority 1 set. I have to go into the web interface and start them manually.

I've been looking through the logs from my latest boot and found these entries:

Code:
Dec 17 15:41:22 proxmox1 pve-guests[1748]: <root@pam> starting task UPID:proxmox1:000006DC:00000A40:6942CF22:startall::root@pam:
Dec 17 15:41:22 proxmox1 pvesh[1748]: Starting CT 100
Dec 17 15:41:22 proxmox1 pve-guests[1756]: <root@pam> starting task UPID:proxmox1:000006DD:00000A42:6942CF22:vzstart:100:root@pam:
Dec 17 15:41:22 proxmox1 pve-guests[1757]: starting CT 100: UPID:proxmox1:000006DD:00000A42:6942CF22:vzstart:100:root@pam:
Dec 17 15:41:22 proxmox1 pvesh[1748]: Starting CT 101
Dec 17 15:41:22 proxmox1 pve-guests[1756]: <root@pam> starting task UPID:proxmox1:000006DF:00000A43:6942CF22:vzstart:101:root@pam:
Dec 17 15:41:22 proxmox1 pve-guests[1759]: starting CT 101: UPID:proxmox1:000006DF:00000A43:6942CF22:vzstart:101:root@pam:
Dec 17 15:41:22 proxmox1 pvesh[1748]: Starting CT 105
Dec 17 15:41:22 proxmox1 pve-guests[1756]: <root@pam> starting task UPID:proxmox1:000006E0:00000A44:6942CF22:vzstart:105:root@pam:
Dec 17 15:41:22 proxmox1 pve-guests[1760]: starting CT 105: UPID:proxmox1:000006E0:00000A44:6942CF22:vzstart:105:root@pam:
Dec 17 15:41:22 proxmox1 pvesh[1748]: Starting CT 108
Dec 17 15:41:22 proxmox1 pve-guests[1756]: <root@pam> starting task UPID:proxmox1:000006E3:00000A45:6942CF22:vzstart:108:root@pam:
Dec 17 15:41:22 proxmox1 pve-guests[1763]: starting CT 108: UPID:proxmox1:000006E3:00000A45:6942CF22:vzstart:108:root@pam:
Dec 17 15:41:22 proxmox1 iptag[1414]: Setting 102 tags from 10.0.0.120 172.17.0.1 debian docker trixie to debian docker trixie
Dec 17 15:41:22 proxmox1 pve-guests[1763]: could not activate storage 'zfsRaid', zfs error: cannot import 'zfsRaid': no such pool available
Dec 17 15:41:22 proxmox1 pve-guests[1760]: could not activate storage 'zfsRaid', zfs error: cannot import 'zfsRaid': no such pool available
Dec 17 15:41:22 proxmox1 pve-guests[1757]: could not activate storage 'zfsRaid', zfs error: cannot import 'zfsRaid': no such pool available
Dec 17 15:41:22 proxmox1 zed[2275]: eid=7 class=config_sync pool='zfsRaid'
Dec 17 15:41:22 proxmox1 zed[2276]: eid=8 class=pool_import pool='zfsRaid'
Dec 17 15:41:22 proxmox1 zed[2299]: eid=10 class=config_sync pool='zfsRaid'
Dec 17 15:41:22 proxmox1 zed[2373]: vdev nvme-Samsung_SSD_980_PRO_2TB_S6B0NG0R414567Y_1 '' doesn't exist
Dec 17 15:41:22 proxmox1 zed[2418]: vdev nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NX0R805130X_1 '' doesn't exist
Dec 17 15:41:22 proxmox1 zed[2466]: vdev nvme-Samsung_SSD_980_PRO_2TB_S6B0NG0R810666H_1 '' doesn't exist
Dec 17 15:41:22 proxmox1 kernel:  zd16: p1 p2 p3
Dec 17 15:41:22 proxmox1 systemd[1]: Created slice system-pve\x2dcontainer.slice - PVE LXC Container Slice.
Dec 17 15:41:22 proxmox1 systemd[1]: Started pve-container@101.service - PVE LXC Container: 101.
Dec 17 15:41:23 proxmox1 pvesh[1748]: Starting CT 100 failed: could not activate storage 'zfsRaid', zfs error: cannot import 'zfsRaid': no such pool available
Dec 17 15:41:23 proxmox1 pvesh[1748]: Starting CT 105 failed: could not activate storage 'zfsRaid', zfs error: cannot import 'zfsRaid': no such pool available
Dec 17 15:41:23 proxmox1 pvesh[1748]: Starting CT 108 failed: could not activate storage 'zfsRaid', zfs error: cannot import 'zfsRaid': no such pool available

So it's telling me my zfsRaid pool isn't available, but the log continues, and 3 seconds later the next batch of LXCs starts with no errors.

Oddly, CT 101, which started at the same time as 100, 105 and 108, did start, and it is on zfsRaid, as are all of my volumes and VMs.

Code:
Dec 17 15:41:23 proxmox1 kernel: kauditd_printk_skb: 115 callbacks suppressed
Dec 17 15:41:23 proxmox1 kernel: audit: type=1400 audit(1765986083.627:127): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-101_</var/lib/lxc>" pid=2590 comm="apparmor_parser"
Dec 17 15:41:23 proxmox1 iptag[1414]: Setting 103 tags from 10.0.0.161 debian native trixie to debian native trixie
Dec 17 15:41:24 proxmox1 kernel: vmbr0: port 2(fwpr101p0) entered blocking state
Dec 17 15:41:24 proxmox1 kernel: vmbr0: port 2(fwpr101p0) entered disabled state
Dec 17 15:41:24 proxmox1 kernel: fwpr101p0: entered allmulticast mode
Dec 17 15:41:24 proxmox1 kernel: fwpr101p0: entered promiscuous mode
Dec 17 15:41:24 proxmox1 kernel: vmbr0: port 2(fwpr101p0) entered blocking state
Dec 17 15:41:24 proxmox1 kernel: vmbr0: port 2(fwpr101p0) entered forwarding state
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
Dec 17 15:41:24 proxmox1 kernel: fwln101i0: entered allmulticast mode
Dec 17 15:41:24 proxmox1 kernel: fwln101i0: entered promiscuous mode
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 2(veth101i0) entered blocking state
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 2(veth101i0) entered disabled state
Dec 17 15:41:24 proxmox1 kernel: veth101i0: entered allmulticast mode
Dec 17 15:41:24 proxmox1 kernel: veth101i0: entered promiscuous mode
Dec 17 15:41:24 proxmox1 kernel: eth0: renamed from vethvYpWJB
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 2(veth101i0) entered blocking state
Dec 17 15:41:24 proxmox1 kernel: fwbr101i0: port 2(veth101i0) entered forwarding state
Dec 17 15:41:24 proxmox1 kernel: cfg80211: Loading compiled-in X.509 certificates for regulatory database
Dec 17 15:41:24 proxmox1 kernel: Loaded X.509 cert 'benh@debian.org: 577e021cb980e0e820821ba7b54b4961b8b4fadf'
Dec 17 15:41:24 proxmox1 kernel: Loaded X.509 cert 'romain.perier@gmail.com: 3abbc6ec146e09d1b6016ab9d6cf71dd233f0328'
Dec 17 15:41:24 proxmox1 kernel: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Dec 17 15:41:24 proxmox1 kernel: Loaded X.509 cert 'wens: 61c038651aabdcf94bd0ac7ff06c7248db18c600'
Dec 17 15:41:24 proxmox1 kernel: faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2
Dec 17 15:41:24 proxmox1 kernel: cfg80211: failed to load regulatory.db
Dec 17 15:41:25 proxmox1 iptag[1414]: Setting 104 tags from 10.0.0.160 172.17.0.1 debian docker trixie to debian docker trixie
Dec 17 15:41:26 proxmox1 pvesh[1748]: Starting CT 107
Dec 17 15:41:26 proxmox1 pve-guests[1756]: <root@pam> starting task UPID:proxmox1:00000BD4:00000BD6:6942CF26:vzstart:107:root@pam:
Dec 17 15:41:26 proxmox1 pve-guests[3028]: starting CT 107: UPID:proxmox1:00000BD4:00000BD6:6942CF26:vzstart:107:root@pam:
Dec 17 15:41:26 proxmox1 pvesh[1748]: Starting CT 114
Dec 17 15:41:26 proxmox1 pve-guests[1756]: <root@pam> starting task UPID:proxmox1:00000BD6:00000BD7:6942CF26:vzstart:114:root@pam:
Dec 17 15:41:26 proxmox1 pve-guests[3030]: starting CT 114: UPID:proxmox1:00000BD6:00000BD7:6942CF26:vzstart:114:root@pam:
Dec 17 15:41:26 proxmox1 systemd[1]: Started pve-container@107.service - PVE LXC Container: 107.
Dec 17 15:41:26 proxmox1 systemd[1]: Started pve-container@114.service - PVE LXC Container: 114.

As I said, all of my VMs (LXCs) are on the zfsRaid pool, so why isn't the file system ready before the containers are started, and what can I do about it? Nothing in the logs jumps out at me as the cause.
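In case it helps, here's what I'm planning to check after the next reboot (assuming the standard ZFS and PVE unit names, so adjust if yours differ); it should show whether the pool import actually finished before pve-guests kicked off the autostart, and whether zfsRaid is in the cachefile that zfs-import-cache.service relies on:

Code:
# compare timestamps: did pve-guests start before the pool import units finished?
journalctl -b -u zfs-import-cache.service -u zfs-import-scan.service -u pve-guests.service

# is the pool recorded in the cachefile used at boot?
zpool get cachefile zfsRaid
zdb -C -U /etc/zfs/zpool.cache | grep -B1 -A3 zfsRaid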

I've tried setting a start-on-boot delay of 5 seconds on the node, but that made no difference; I still got the same errors. And it isn't always the same 3 of the 4 priority-1 containers that fail to start: sometimes it's just one of them, sometimes three.
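For reference, I set that delay through the GUI, but I think the CLI equivalent is along these lines (the option name is my best reading of the pvenode docs, so please correct me if it's wrong):

Code:
# node-wide delay (seconds) before the start-on-boot jobs are kicked off
pvenode config set --startall-onboot-delay 5
# show the current node config to confirm it took
pvenode config get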

I'd appreciate any guidance on what to look at or change to resolve this.
 
Could you try giving the pool some more time pre- and post-init? It sounds to me like a ZFS timing issue (see here):

Code:
nano /etc/default/zfs

# extra seconds to wait in the initramfs before mounting the root filesystem
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='5'
# extra seconds to wait after the zfs module is loaded in the initramfs
ZFS_INITRD_POST_MODPROBE_SLEEP='5'

Code:
update-initramfs -u
 

I'll give it a try, but it's been running fine for the last 6 months; it's only in the last 3-4 weeks that I've noticed this issue. I don't reboot the server often unless a software update requires it, so it's hard to be precise.

I just don't get why all the other LXCs come up without an issue. I might try a really unimportant LXC first, one that has few dependencies on other services, and then lower the priority of the others to see if that makes any difference.
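If I do end up shuffling the priorities, I'll probably do it with pct rather than clicking through each container; something along these lines (VMIDs and delays just as an example):

Code:
# let an unimportant container go first...
pct set 114 --startup order=1
# ...and push the critical ones to a later slot with a short per-container delay
pct set 100 --startup order=2,up=15
pct set 105 --startup order=2,up=15
pct set 108 --startup order=2,up=15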