How do I rebuild the bootloader on ZFS RAID1?

sdet00

Nov 18, 2017
I have a somewhat frustrating situation - I have a ZFS RAID1 which started out life as a single disk, with a second SSD added as a mirror after the installation was done. The original disk failed, and today I learned that a ZFS mirror only mirrors the pool data - it does not clone the bootloader or EFI partition - so my system is unbootable. Oops!

Initially I booted up Linux Mint and fired up GParted - indeed, all I have is a ZFS partition and no bootloader. I installed Proxmox in a VM using UEFI boot and ZFS, and cloned the BIOS boot and EFI partitions to IMG files which I can then pull down on my target system when booting the Linux Mint Live ISO. I then installed a second SSD that was slightly larger (256GB instead of 250GB) and cloned over the boot, EFI and ZFS partitions using ddrescue. Adjusted the flags in GParted... still no boot.
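For context, cloning a boot setup onto a replacement disk in a Proxmox root-on-ZFS layout is usually sketched like this (device names /dev/sdX and /dev/sdY are placeholders, not from this thread - verify with lsblk first):

Code:
# Assumption: /dev/sdX is a healthy Proxmox disk (or image), /dev/sdY is the
# replacement. Copy the GPT layout, then randomize GUIDs so the disks don't collide.
sgdisk --replicate=/dev/sdY /dev/sdX
sgdisk --randomize-guids /dev/sdY

# Clone only the BIOS boot and EFI partitions. The ZFS partition should be
# resilvered by the pool itself (zpool attach / zpool replace), not by dd.
ddrescue --force /dev/sdX1 /dev/sdY1
ddrescue --force /dev/sdX2 /dev/sdY2

This avoids the partition-size mismatch problem of dd'ing a whole smaller disk onto a larger one.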

At this point I figured that Proxmox is special somehow and that there is probably some special alignment of the EFI partition and ZFS pool, so I also tried booting a Proxmox ISO and using the "proxmox-boot-tool" via the terminal debugging console - no luck there either. The tool will format my EFI partition (/dev/sda2) but it will not initialize it - I get the following error:

Code:
E: bootctl is not available - make sure systemd-boot is installed

I tried to install systemd-boot in the Proxmox ISO live session but no luck there because there's no network and messing around with /etc/network/interfaces doesn't seem to behave. I am ready to throw in the towel and wipe this system and restore from backup, but before I do that, has anyone dealt with this before? Would rather get this booting again if possible as it is part of a cluster and last time I tried to replace a node it was a somewhat painful experience as well... suggestions appreciated!
 
ZFS RAID1 is a mirror, but it mirrors only part3 (in my scenario, which I believe is the default layout), not the whole disk.
So the data on part3 is mirrored, but to boot from the new drive you also have to clone the other partitions - which you did - and then reactivate booting using something like this:

Code:
proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool format /dev/sdb2 --force
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool status

It depends on whether you boot with UEFI or not.
 
Yes I did - that's where I encountered my issues with the proxmox-boot-tool. The format command works, the init command did not.

Yes it is a UEFI boot partition.

"proxmox-boot-tool init /dev/sda2" is the command that failed for me. Not sure why. The ISO doesn't ship with the binaries and I can't grab them from anywhere, so I'm a bit stuck unless there's a way to do this on another debian-based live ISO...
 
So you are missing those tools in your rescue ISO?
Did you try the Proxmox ISO's Advanced Options and Rescue Boot?

BTW: I believe that if you have a mirror, you should be able to boot from both disks, so if one of them is dead you can always boot from the other one, or not?
 
That's what I'm talking about — he had a fully functional ZFS RAID1 setup, and then one of the two HDDs failed. So I think he should have been able to boot from the other one.
 
Oh, I understand now — I missed the beginning of his story, where he mentioned starting with a single-disk RAID1. Now it makes sense. I started getting worried about weird ZFS RAID1 behavior, but I did some tests with a regular two-disk ZFS RAID1 setup, and it worked exactly as expected. I destroyed one HDD and was still able to boot from the other.
 
Only if he had fixed the boot on both drives. Adding a mirror and then having the first disk fail without doing that leads exactly to this situation.

OP, try ZFSBootMenu:

https://github.com/zbm-dev/zfsbootmenu

Alright - this worked, brilliant!

To quickly summarise:

1. Read this: https://docs.zfsbootmenu.org/en/v3.0.x/general/portable.html
2. Boot the Linux Mint Live ISO, download the recovery EFI file and place it in a FAT32 partition as instructed. It must be renamed to EFI/BOOT/BOOTX64.EFI.
3. Reboot. Your boot will fail but that's OK. Log into failsafe mode.
4. Run "blkid" to get the real UUID of your /boot/efi
5. Modify /etc/fstab and update the UUID
6. Reboot. Success!
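Step 5 boils down to making the /boot/efi line in /etc/fstab match the new partition. With a placeholder UUID (use whatever blkid reports for your vfat ESP), the line looks like:

Code:
# /etc/fstab - ABCD-1234 is a placeholder, substitute the UUID from blkid
UUID=ABCD-1234 /boot/efi vfat defaults 0 1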

I haven't tried running proxmox-boot-tool yet to get the bootloader back to "stock" (probably a good idea for future upgrades), but the fact that I am able to boot is a great success.
 