[SOLVED] LXC won't boot after PVE reboot.

Jarvar

I get an error when trying to start my LXC container.
I bypassed it by creating a new LXC container, but using local-zfs as the rootfs instead of storage01.
This is the error I get:

run_buffer: 322 Script exited with status 2
lxc_init: 844 Failed to run lxc.hook.pre-start for container "205"
__lxc_start: 2027 Failed to initialize container "205"
TASK ERROR: startup for container '205' failed

I have PVE version:
pveversion
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)

Some context: I installed PVE on two 256 GB SSDs as a ZFS mirror (local and local-zfs).
Then I added two larger drives (2 x 7.68 TB) as a ZFS pool called storage01.
I had previously used an external USB HDD with ZFS as a datastore for a PBS container.
I was getting some I/O errors from it, so it took me a while to remove the drive.
However, after rebooting PVE I was unable to start my original container 205.
It was installed with its rootfs on storage01.
Now when I try to create a new LXC container and use storage01 as the location, it fails; it allows me to do this on local-zfs but not on storage01.
Some issue is blocking the use of the storage01 ZFS pool, either for booting the existing LXC containers or as the storage location for new ones.

However, I have a Windows Server VM that is running on the same storage01 ZFS pool.
Any thoughts and help, please?
 
Hi,
please share your storage configuration (cat /etc/pve/storage.cfg) as well as the output of zpool status. Further, please provide the debug output you get when starting the container via pct start 205 --debug.

Also check the systemd units related to importing the ZFS pools: systemctl status 'zfs-import*'
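
For convenience, the requested diagnostics collected as a single run (just a sketch, assuming container ID 205 as above):

cat /etc/pve/storage.cfg        # storage definitions
zpool status                    # pool health
pct start 205 --debug           # container start with debug output
systemctl status 'zfs-import*'  # systemd units that import the ZFS pools at boot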
 
arch: amd64
cores: 2
features: nesting=1
hostname: deb-pbs-005
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.4.0.1,hwaddr=BC:24:11:D4:6E:F5,ip=10.4.0.235/24,type=veth
ostype: debian
parent: snap1
rootfs: storage01:subvol-205-disk-0,size=8G
swap: 2048
unprivileged: 1
lxc.idmap: u 0 100000 34
lxc.idmap: g 0 100000 34
lxc.idmap: u 34 34 1
lxc.idmap: g 34 34 1
lxc.idmap: u 35 100035 65501
lxc.idmap: g 35 100035 65501

[snap1]
arch: amd64
cores: 2
features: nesting=1
hostname: deb-pbs-005
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.4.0.1,hwaddr=BC:24:11:D4:6E:F5,ip=10.4.0>
ostype: debian
rootfs: storage01:subvol-205-disk-0,size=8G
snaptime: 1701503682
swap: 2048
unprivileged: 1
lxc.mount.entry: /storage01/dataset001 mnt/storage01/dataset001 none rbind.create=dir
lxc.idmap: u 0 100000 34
lxc.idmap: g 0 100000 34
lxc.idmap: u 34 34 1
lxc.idmap: g 34 34 1
lxc.idmap: u 35 100035 65501
lxc.idmap: g 35 100035 65501

zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:10 with 0 errors on Sun Jan 14 00:24:11 2024
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          mirror-0                                            ONLINE       0     0     0
            ata-INTEL_SSDSC2KG240G8_PHYG946501XS240AGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KG240G8_PHYG946403HM240AGN-part3  ONLINE       0     0     0

errors: No known data errors

  pool: storage01
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub in progress since Mon Jan 29 06:57:46 2024
        1.92T / 1.92T scanned, 1008G / 1.92T issued at 380M/s
        0B repaired, 51.19% done, 00:43:11 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        storage01                                        ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-KINGSTON_SEDC600M7680G_50026B7686915432  ONLINE       0     0     0
            ata-KINGSTON_SEDC600M7680G_50026B76869121CA  ONLINE       0     0     0

errors: No known data errors

pct start 205 --debug
run_buffer: 322 Script exited with status 2
lxc_init: 844 Failed to run lxc.hook.pre-start for container "205"
__lxc_start: 2027 Failed to initialize container "205"
0 hostid 100000 range 34
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 34 hostid 34 range 1
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 34 hostid 34 range 1
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type u nsid 35 hostid 100035 range 65501
INFO confile - ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 35 hostid 100035 range 65501
INFO lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO conf - ../src/lxc/conf.c:run_script_argv:338 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "205", config section "lxc"
DEBUG conf - ../src/lxc/conf.c:run_buffer:311 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 205 lxc pre-start produced output: cannot open directory //storage: No such file or directory
ERROR conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 2
ERROR start - ../src/lxc/start.c:lxc_init:844 - Failed to run lxc.hook.pre-start for container "205"
ERROR start - ../src/lxc/start.c:__lxc_start:2027 - Failed to initialize container "205"
INFO conf - ../src/lxc/conf.c:run_script_argv:338 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "205", config section "lxc"
startup for container '205' failed

and

systemctl status 'zfs-import'
○ zfs-import.service
     Loaded: masked (Reason: Unit zfs-import.service is masked.)
     Active: inactive (dead)
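
(As an aside, zfs-import.service being masked is usually expected on PVE; the import at boot is normally handled by separate units, which can be checked individually. A sketch, assuming the standard units shipped with the ZFS packages:)

systemctl status zfs-import-cache.service   # imports pools listed in /etc/zfs/zpool.cache
systemctl status zfs-import-scan.service    # scans devices for pools (often disabled when the cache is used)
systemctl status zfs-mount.service          # mounts the imported datasets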
 
DEBUG conf - ../src/lxc/conf.c:run_buffer:311 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 205 lxc pre-start produced output: cannot open directory //storage: No such file or directory
The pre-start hook fails because it cannot open the directory //storage. Can you relate that directory to some setup step you did? Did you have a mountpoint or bind mount related to that? Can you mount the container via pct mount 205?

Please also post the full config, as I see there are snapshots present; you can get it via cat /etc/pve/lxc/205.conf.
 
Thank you @Chris, I added the snapshot to the config output above.
Yes, there's a mount:

lxc.mount.entry: /storage01/dataset001 mnt/storage01/dataset001 none rbind.create=dir

which was connected before; it points to a dataset on the zpool storage01 that was working before the reboot.
I wasn't able to boot the container after the reboot, but I was able to connect that bind mount to a new container I built using local-zfs instead of storage01.
I didn't make any changes. I did remove a previous pool that became faulted and unavailable.
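
(As an aside, after a reboot one can check whether the bind-mount source above is actually present and mounted on the host before starting the container. A small sketch, assuming the dataset is storage01/dataset001 as in the entry above:)

zfs list -o name,mountpoint,mounted storage01/dataset001   # is the dataset mounted?
ls -ld /storage01/dataset001                               # does the source path exist on the host?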

This is what pct mount 205 shows:

pct mount 205
mounting container failed
cannot open directory //storage: No such file or directory

After doing this, I also noticed that the container becomes locked.
If I run pct list, it shows as 'mounted' under Lock even though the status is stopped.
Weird, why is it asking for //storage? It should be storage01, shouldn't it?
Somehow the containers aren't letting me put the rootfs on the storage01 zpool...

I've tried this on another system and there it does work, with zpools named storage01, storage02, and even storage, all on the same system.
Thank you very much for your assistance.
 
After doing this, I also noticed that the container becomes locked.
Yes, the container will be locked if you mounted the rootfs via the pct mount command; this guarantees mutually exclusive access. Once you unmount it via pct unmount 205, the container will be unlocked again.
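
(For reference, a typical inspect-and-release sequence once mounting works; just a sketch, pct mount typically places the rootfs under /var/lib/lxc/205/rootfs on the host:)

pct mount 205                 # mounts the rootfs and sets the 'mounted' lock
ls /var/lib/lxc/205/rootfs    # inspect the container filesystem from the host
pct unmount 205               # unmounts and clears the lock again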

Weird, why is it asking for //storage? It should be storage01, shouldn't it?
That is why I was asking whether this path is somehow related to your setup, as it causes the issue when mounting the filesystem.

Somehow the containers aren't letting me put the rootfs on the storage01 zpool...
You never posted your storage config. Are you sure you have enabled container images as a content type for that storage? Also, how did you move the current rootfs to this storage to begin with?
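
(A quick way to check this from the CLI, and to enable the content types if needed; just a sketch:)

pvesm status                                          # overview of all configured storages
grep -A 5 'zfspool: storage01' /etc/pve/storage.cfg   # content types configured for storage01
pvesm set storage01 --content rootdir,images          # enable container (rootdir) and VM disk (images) content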
 
cat /etc/pve/lxc/205.conf
arch: amd64
cores: 2
features: nesting=1
hostname: deb-pbs-005
lock: mounted
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.4.0.1,hwaddr=BC:24:11:D4:6E:F5,ip=10.4.0.235/24,type=veth
ostype: debian
parent: snap1
rootfs: storage01:subvol-205-disk-0,size=8G
swap: 2048
unprivileged: 1
lxc.idmap: u 0 100000 34
lxc.idmap: g 0 100000 34
lxc.idmap: u 34 34 1
lxc.idmap: g 34 34 1
lxc.idmap: u 35 100035 65501
lxc.idmap: g 35 100035 65501

[snap1]
arch: amd64
cores: 2
features: nesting=1
hostname: deb-pbs-005
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.4.0.1,hwaddr=BC:24:11:D4:6E:F5,ip=10.4.0.234/24,type=veth
ostype: debian
rootfs: storage01:subvol-205-disk-0,size=8G
snaptime: 1701503682
swap: 2048
unprivileged: 1
lxc.mount.entry: /storage01/dataset001 mnt/storage01/dataset001 none rbind.create=dir
lxc.idmap: u 0 100000 34
lxc.idmap: g 0 100000 34
lxc.idmap: u 34 34 1
lxc.idmap: g 34 34 1
lxc.idmap: u 35 100035 65501
lxc.idmap: g 35 100035 65501

That is my entire config file for CT 205.

As for the enabled content, I have Disk Image and Container, the same as local-zfs, which is working.
I never moved the current rootfs to this storage; it was originally built on storage01. It just stopped booting after the latest PVE reboot.
It also won't allow me to choose storage01 as the location for any new rootfs for containers being built.
 
It also won't allow me to choose storage01 as the location for any new rootfs for containers being built.
That is why I'm asking for the storage config, which you never provided; I have a suspicion that the issue might be there. You can get it via: cat /etc/pve/storage.cfg
 
Silly me. I think you wrote that, but it didn't register in my head; I kept thinking of 205.conf.
This is what shows in the storage.cfg file:

zfspool: storage01
pool storage01
content rootdir,images
mountpoint /storage
sparse 1

The mountpoint doesn't look right. Is it possible to edit or change it, or do we have to remove the storage and add it again?
Thank you so much again for bearing with me. I really appreciate it.
I misspelled some locations last night too and just caught my mistakes this morning.
Thank you.
 
mountpoint /storage
Yes, this is the root cause of your storage problem. You can simply adapt the path to the new mountpoint of the pool; zfs list storage01 should show it to you if unsure.
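
(A sketch of what that looks like; /storage01 below is only an assumption, use whatever mountpoint zfs list actually reports:)

zfs list -o name,mountpoint storage01
# NAME        MOUNTPOINT
# storage01   /storage01          <- assumed value, take the real one from the output

# then adapt the entry in /etc/pve/storage.cfg accordingly:
zfspool: storage01
        pool storage01
        content rootdir,images
        mountpoint /storage01
        sparse 1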

Thank you so much again for bearing with me. I really appreciate it.
I misspelled some locations last night too and just caught my mistakes this morning.
Thank you.
No worries!
 

This may be an easy question, but is it okay to edit storage.cfg while the system is running?
I have a VM using the same storage, but it looks like the mountpoint setting mainly affects the containers.
Thanks
 
This may be an easy question, but is it okay to edit storage.cfg while the system is running?
I have a VM using the same storage, but it looks like the mountpoint setting mainly affects the containers.
Thanks
In this case, yes; in general it would be wise to avoid editing a storage with active guests on it.
 
Any idea how that might have changed? Maybe when I edited the bind mounts somewhere along the line?
I know we previously had a zpool called just storage, but since then we reinstalled PVE from scratch, going from PVE 7 to PVE 8.
Well, I really appreciate your efficient and in-depth help.
 
Any idea how that might have changed? Maybe when I edited the bind mounts somewhere along the line?
I know we previously had a zpool called just storage, but since then we reinstalled PVE from scratch, going from PVE 7 to PVE 8.
Well, that is hard to guess, but I could imagine that you may have copied over a backed-up configuration at some point?
 
Hmm, it's possible, and this was probably the first time I fully rebooted the system since the beginning of December. I guess that's why it's good to plan for times when not everything needs to be running. Luckily I was able to build another container quickly and point it at the data in a pinch.
Thank you so much again.
I'll test it out later.
 
