Server Unbootable After ZFS Mirror Failure — Missing ESP on Remaining Drive?

Coffeeri

Jun 8, 2019

Hi Proxmox Community,

I'm facing an issue with a Proxmox VE server that became unbootable after one of the drives in its ZFS boot mirror failed. I've confirmed that the remaining drive has no EFI System Partition (ESP), and I need advice on the best recovery path.

Setup:
  • Proxmox VE (version 8.x) installed on a ZFS mirror (rpool) across two NVMe SSDs (intended for UEFI boot).
Problem:
  • One of the NVMe SSDs in the mirror failed completely.
  • The server no longer boots.
Goal:
  • Make the server bootable again using the single remaining functional NVMe SSD.
  • Once the replacement NVMe SSD arrives, restore the mirror.
Troubleshooting Steps Taken:
  1. Booted from Proxmox VE Installation USB: Used the official installer USB stick.
  2. Selected Rescue Boot: Navigated to Advanced Options -> Rescue Boot.
  3. Rescue Boot Error: The rescue boot process failed to automatically find the boot disk (error: no such device: rpool. ERROR: unable to find boot disk automatically.).
  4. Entered Shell via TUI Installer: Successfully entered the basic rescue command-line shell.
  5. Checked Disks: Ran lsblk and sgdisk -p /dev/nvme0n1 to identify the remaining good NVMe SSD (nvme0n1) and its partitions.
The Core Issue - ESP Confirmed Missing

The output of sgdisk -p /dev/nvme0n1 confirms that there is no EFI System Partition (ESP) on the remaining working drive:

Disk /dev/nvme0n1: 1953525168 sectors, 931.5 GiB
Model: WD_BLACK SN850X 1000GB
[...]
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      1953507327   931.5 GiB   BF01  zfs-251b95c7537d3a65  <-- ZFS data (rpool)
   9      1953507328      1953523711   8.0 MiB     BF07                        <-- Solaris reserved (a BIOS boot/bios_grub partition would be EF02)

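(No partition with type code EF00 exists. A single BF01 partition plus an 8 MiB BF07 partition 9 is the layout ZFS creates when it is given a whole disk instead of a partition, so this drive was most likely added to the mirror as a whole disk rather than partitioned by the installer.)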

This means proxmox-boot-tool cannot be used, because there is no partition for it to format and initialize.
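The check above can also be scripted. The following is a minimal POSIX sh/awk sketch, not a Proxmox tool: the function names has_esp and largest_gap are hypothetical, and the script assumes the usual sgdisk -p layout, including the "First usable sector is X, last usable sector is Y" line that was elided as [...] above. It only reads the printed table from stdin and never writes to the disk.

```shell
#!/bin/sh
# Sketch: inspect `sgdisk -p` output (read from stdin) for an ESP and for
# the largest unpartitioned gap. Nothing on the disk is modified.

# Exit status 0 if some partition has type code EF00 (EFI System Partition).
has_esp() {
    # Partition rows start with their number. The size ("931.5 GiB") takes
    # two fields, so the type code is field 6.
    awk '$1 ~ /^[0-9]+$/ && $6 == "EF00" { found = 1 }
         END { exit !found }'
}

# Print the largest free gap in sectors, using the first/last usable
# sectors that sgdisk reports as the outer bounds.
largest_gap() {
    awk '
    /^First usable sector is/ {
        first = $5; sub(/,$/, "", first); first += 0   # "34," -> 34
        last = $NF + 0
    }
    $1 ~ /^[0-9]+$/ { n++; start[n] = $2 + 0; stop[n] = $3 + 0 }
    END {
        # Insertion sort by start sector: partition numbers need not
        # follow the on-disk order.
        for (i = 2; i <= n; i++) {
            s = start[i]; e = stop[i]
            for (j = i - 1; j >= 1 && start[j] > s; j--) {
                start[j + 1] = start[j]; stop[j + 1] = stop[j]
            }
            start[j + 1] = s; stop[j + 1] = e
        }
        best = 0; prev = first - 1
        for (i = 1; i <= n; i++) {
            gap = start[i] - prev - 1
            if (gap > best) best = gap
            prev = stop[i]
        }
        if (last - prev > best) best = last - prev
        printf "%.0f\n", best
    }'
}
```

Usage: sgdisk -p /dev/nvme0n1 | largest_gap. Even without the elided lines, the posted numbers settle it: at most 2048 sectors precede partition 1, and 1953525168 - 1953523712 = 1456 sectors follow partition 9 (both figures include GPT metadata), so each gap is at most 1 MiB, far too small for an ESP.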

My Questions:
  1. Given the confirmed absence of an ESP, is there any supported method to make this drive bootable without a full reinstall, or is attempting to manually create/resize partitions too risky?
  2. Is the recommended (and safest) path forward now to:
  • Use the rescue shell to import rpool read-only and back up /etc/pve and any other critical data from the ZFS partition (nvme0n1p1).
  • Perform a clean Proxmox reinstall onto nvme0n1, allowing the installer to correctly partition the drive (including creating a new ESP).
  • Restore the backed-up configuration and data?
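For the backup step in this plan, a rough sketch follows. One detail matters: /etc/pve is where the Proxmox cluster filesystem (pmxcfs) is mounted at runtime, and its contents are stored in /var/lib/pve-cluster/config.db. On a pool imported from a rescue shell, /etc/pve is therefore normally just an empty directory, and the database file is what should be saved. The root dataset name rpool/ROOT/pve-1 (the installer default), the helper names run and backup_pve_config, and the destination directory are assumptions. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: save the Proxmox configuration from a read-only pool import.
# Assumed (hypothetical): root dataset rpool/ROOT/pve-1 and a writable
# backup directory on another disk (e.g. a USB stick).
# Commands are printed only; set DRY_RUN=0 to really run them.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# $1 = pool, $2 = root dataset, $3 = backup directory
backup_pve_config() {
    pool=$1 rootfs=$2 dest=$3
    # -N: mount nothing yet; readonly=on: never write to the pool;
    # -R /mnt: mount datasets below /mnt; -f: may be needed because the
    # pool was last used by the installed system, not this environment.
    run zpool import -f -N -o readonly=on -R /mnt "$pool"
    run zfs mount "$rootfs"
    # /etc/pve is only a pmxcfs mount point; its data lives in config.db.
    run cp /mnt/var/lib/pve-cluster/config.db "$dest"/
    # Other host configuration, e.g. /etc/network/interfaces.
    run tar -czf "$dest"/etc.tar.gz -C /mnt etc
    run zpool export "$pool"
}
```

Usage, e.g.: backup_pve_config rpool rpool/ROOT/pve-1 /media/usb (first as a dry run, then with DRY_RUN=0).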
Any confirmation or alternative suggestions would be greatly appreciated.
Thank you!
 
  1. Given the confirmed absence of an ESP, is there any supported method to make this drive bootable without a full reinstall, or is attempting to manually create/resize partitions too risky?
Only if there is free unpartitioned space in which to create an additional ESP. If the rescue boot works for you, you can add an ESP (even at the end of the physical drive) and use proxmox-boot-tool to format and initialize it. Please see the manual for details: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot
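When enough free space exists, the ESP step described above might look like the following sketch, run on the booted Proxmox system. The partition number, the 1 GiB size, the NVMe partition naming (<disk>p<N>), the helper names and the availability of partprobe (from the parted package) are assumptions. With the layout posted in the question there is no such free space. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: add an ESP in free space and hand it to proxmox-boot-tool.
# Assumptions: NVMe partition naming (<disk>p<N>), partprobe installed,
# running on the booted Proxmox system.
# Commands are printed only; set DRY_RUN=0 to really run them.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# $1 = disk, $2 = unused partition number, $3 = size (default 1G)
add_esp() {
    disk=$1 num=$2 size=${3:-1G}
    # Start sector 0 = start of the largest free block; EF00 = ESP type.
    run sgdisk -n "$num:0:+$size" -t "$num:EF00" "$disk"
    # Ask the kernel to pick up the new partition.
    run partprobe "$disk"
    run proxmox-boot-tool format "${disk}p$num"
    run proxmox-boot-tool init "${disk}p$num"
    run proxmox-boot-tool status
}
```

Usage, e.g.: add_esp /dev/nvme0n1 2 1G, where partition number 2 is unused on that disk.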

  2. Is the recommended (and safest) path forward now to:
  • Use the rescue shell to import rpool read-only and back up /etc/pve and any other critical data from the ZFS partition (nvme0n1p1).
  • Perform a clean Proxmox reinstall onto nvme0n1, allowing the installer to correctly partition the drive (including creating a new ESP).
  • Restore the backed-up configuration and data?
A new install is not necessary if the rescue boot works and boots you into the existing Proxmox installation. From there you can partition the new drive and add it to the mirror. Make sure not to forget the ESP. See "Changing a failed bootable device" in the manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_change_failed_dev . After that, you can add an ESP to the old drive (as mentioned above) or simply repartition the entire old drive the same way and re-add it to the mirror.
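The manual's procedure for a failed bootable device starts by copying the partition table from the healthy disk (sgdisk <healthy> -R <new>). Here that would copy the missing ESP as well, so the sketch below partitions the new disk explicitly and then follows the remaining manual steps (zpool replace, proxmox-boot-tool format and init). The layout (p1 = 1 GiB ESP, p2 = ZFS, no BIOS boot partition since the machine boots via UEFI), the device names and the helper names are assumptions, as is the failed-device name, which should be taken from zpool status. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Hypothetical layout on each disk: p1 = 1 GiB ESP (EF00),
# p2 = rest of the disk for ZFS (BF01).
# Commands are printed only; set DRY_RUN=0 to really run them.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# $1 = pool, $2 = failed device as listed by `zpool status`, $3 = new disk
replace_failed_boot_disk() {
    pool=$1 failed=$2 new=$3
    run sgdisk --zap-all "$new"
    # Fresh GPT: ESP first, the remainder for ZFS.
    run sgdisk -n 1:0:+1G -t 1:EF00 -n 2:0:0 -t 2:BF01 "$new"
    # Resilver onto the new ZFS partition. If zpool reports that the device
    # is too small, make the ESP smaller and repartition.
    run zpool replace -f "$pool" "$failed" "${new}p2"
    run proxmox-boot-tool format "${new}p1"
    run proxmox-boot-tool init "${new}p1"
}

# Only after `zpool status` shows the resilver has finished: rebuild the
# old whole-disk member with the same layout as the new disk.
# $1 = pool, $2 = old disk, $3 = new disk
redo_old_disk() {
    pool=$1 old=$2 new=$3
    run zpool detach "$pool" "$old"
    run sgdisk "$new" -R "$old"   # copy the new disk's table to the old one
    run sgdisk -G "$old"          # give the copy new random GUIDs
    run zpool attach "$pool" "${new}p2" "${old}p2"
    run proxmox-boot-tool format "${old}p1"
    run proxmox-boot-tool init "${old}p1"
}
```

Usage, e.g.: replace_failed_boot_disk rpool <failed device from zpool status> /dev/nvme1n1, wait for the resilver, then redo_old_disk rpool /dev/nvme0n1 /dev/nvme1n1. Between zpool detach and the end of the second resilver, the pool holds only one copy of the data.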