Unable to boot Proxmox PVE after power outage – ZFS root + GRUB issues

Hi everyone,

I’m having trouble getting my Proxmox PVE server to boot after a power outage. GRUB loads, but the system fails during boot with errors such as:
  • mount: special device /dev/nvme0n1p6 does not exist
  • unable to identify a filesystem in /dev/nvme0n1p6
  • ZFS modules cannot be auto-imported

My setup uses ZFS for the root filesystem; there is also an LVM layout with a pve-root volume on the same disk. I can access the GRUB menu, but booting any kernel results in a failure to mount the root filesystem.

What I have tried so far:
  • Booted from Proxmox ISO in Rescue Mode
  • zpool import → returns: “ZFS modules cannot be auto-imported”
  • vgchange -ay → successfully activates LVM volumes
  • Reinstalled GRUB in BIOS mode (grub-install /dev/nvme0n1)
  • Ran update-initramfs -u and update-grub
  • Verified that /etc/fstab points to the correct UUID
  • Verified that the system is installed in BIOS/Legacy mode, not UEFI

Despite this, the system still fails to import rpool during boot, and the kernel cannot mount the root filesystem.

Symptoms:
  • GRUB loads normally
  • Kernel starts, but root filesystem cannot be mounted
  • ZFS pool does not auto-import
  • Rescue environment sometimes lacks ZFS modules
  • Boot attempts reference a non-existent partition (/dev/nvme0n1p6)

What I suspect:
  • initramfs may be missing ZFS modules (a quick way to check this is sketched after this list)
  • rpool is not being imported early enough
  • GRUB or initramfs may reference outdated device paths
  • ZFS module loading may have been broken by the power loss
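
If it helps, my plan for checking the first point is to list the contents of the installed initramfs with lsinitramfs (part of initramfs-tools); the image name below is just a placeholder for whichever version sits in /boot on the installed system:

lsinitramfs /boot/initrd.img-<kernel-version> | grep -iE 'zfs|spl'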

What I need help with:
  1. How to ensure ZFS modules are correctly included in initramfs
  2. How to force or repair ZFS auto-import during boot
  3. How to properly rebuild initramfs for a ZFS-on-root Proxmox installation (my rough plan is sketched after this list; corrections welcome)
  4. Any recommended recovery steps for ZFS root after an unclean shutdown
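
For point 3, my rough and so far untested plan from the rescue ISO is below. The pool and dataset names are the Proxmox defaults (rpool, rpool/ROOT/pve-1), so they may need adjusting, and I would appreciate corrections if the order or flags are wrong:

modprobe zfs                      # make sure the live environment has the ZFS module loaded
zpool import -f -R /mnt rpool     # force-import after the unclean shutdown, rooted at /mnt
zfs mount rpool/ROOT/pve-1        # mount the root dataset if the import did not do it already
mount --rbind /dev  /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys  /mnt/sys
chroot /mnt
update-initramfs -u -k all        # rebuild the initramfs for every installed kernel
update-grub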

Any guidance or similar experiences would be greatly appreciated.
Thanks in advance!


— Tomas
 
✅ Summary of the Boot/Kernel Issue and How It Was Resolved

I ran into a situation where my Proxmox host failed to boot properly after a power outage. The root cause turned out to be a broken kernel installation (the 6.17 kernel series) combined with the proxmox-default-kernel meta‑package forcing the system to reinstall the faulty kernel on every apt operation.

Here is the exact sequence that resolved the issue:

1. The problem
  • The system attempted to boot a 6.17.x kernel whose installation was incomplete or broken.
  • DKMS/NVIDIA modules failed to build for that kernel.
  • proxmox-default-kernel kept pulling the 6.17 kernel back in.
  • proxmox-ve could not be removed or repaired because of the pve-apt-hook safety mechanism.
  • apt and dpkg were stuck in a broken state (two quick ways to confirm this are shown after this list).
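
Not part of the fix itself, but two generic Debian checks that confirm this kind of half-installed state without changing anything:

dpkg --audit      # lists packages left half-installed or half-configured
apt-get check     # reports broken or unmet dependencies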

2. The fix

Step 1 — Allow removal of the meta‑package

Proxmox protects proxmox-ve and related meta‑packages.
To override the safety hook:

touch /please-remove-proxmox-ve

This does not remove Proxmox — it only allows the meta‑package to be purged.

Step 2 — Purge the kernel meta‑package

This prevents the broken 6.17 kernel from being reinstalled:

apt purge proxmox-default-kernel

Step 3 — Remove any remaining 6.17 kernel packages

(They were already gone in my case, but this ensures a clean state.)

dpkg --purge --force-all proxmox-kernel-6.17.4-2-pve-signed
dpkg --purge --force-all proxmox-kernel-6.17
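
If you are unsure which 6.17 packages are still present, list them first (the pattern is simply taken from the package names above; anything not in the normal "ii" state is suspect):

dpkg -l | grep proxmox-kernel-6.17
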
Step 4 — Repair package state

With the meta‑package gone, apt can now cleanly repair itself:

apt --fix-broken install
apt autoremove
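
This was sufficient in my case. If dpkg still complains about an interrupted operation at this point, the standard remedy is to let it finish configuring first:

dpkg --configure -a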

Step 5 — Rebuild GRUB
Ensures only valid kernels appear in the boot menu:

update-grub
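
One caveat I would add: if the host boots through proxmox-boot-tool (common on ZFS-on-root installs), the boot entries on the ESPs are, as far as I understand, not touched by update-grub alone, so refreshing them as well seems worthwhile:

proxmox-boot-tool refresh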

3. Result
  • System boots cleanly again using the stable 6.14.11‑4‑pve kernel (see the quick verification after this list).
  • No more attempts to reinstall the broken 6.17 kernel.
  • apt and dpkg are fully consistent.
  • NVIDIA DKMS modules build correctly.
  • Proxmox VE runs normally.
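
For anyone who wants to verify the same outcome on their own host, checking the running kernel and the remaining kernel packages should be enough:

uname -r                        # should report the working kernel, 6.14.11-4-pve in my case
dpkg -l | grep proxmox-kernel   # only the expected kernels, all in the "ii" state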

4. Root cause

A power outage interrupted a kernel upgrade, leaving the 6.17 kernel in a partially installed state.

Because proxmox-default-kernel depends on the newest kernel series, the system kept trying to reinstall the broken kernel, causing repeated boot failures and package errors.

Removing the meta‑package and purging the broken kernel resolved the issue completely.