Proxmox-boot-tool, ZFS rpool and failure to boot after disk replacement

waroen

New Member
Oct 10, 2021
Hi,

I have a very strange and slightly dangerous situation currently.

I have a mirrored ZFS rpool, originally on two regular 500 GB HDDs. One drive failed and was replaced with a new one. I updated the boot setup with proxmox-boot-tool, following the steps in the manual. Everything seemed to work just fine: no errors while copying the partition table or while running proxmox-boot-tool.

Fast forward a couple of months and I'm in the process of replacing said disks with SSDs, only to notice that the only bootable disk is the original one. As long as that disk is connected, PVE boots just fine, but as soon as it is removed I get a strange GRUB error about not finding a device and an unknown filesystem. This is very strange since I'm booting using systemd-boot and proxmox-boot-tool says that it's using UEFI.

Anyway, nothing I do fixes this error. I've tried running proxmox-boot-tool repeatedly, as well as mounting the EFI partition and copying the working EFI contents to the new SSD. No joy.

Has anyone here had the same issue?

Will I have to back up my config, send the ZFS data to another disk and just reinstall? Preferably not, since that involves a lot of work.
 
If the system is configured using proxmox-boot-tool and booted in UEFI mode, GRUB should not be used at all.
I'd check the output of `efibootmgr` and make sure that the efivars are indeed set to boot with systemd-boot (and not some old GRUB installation).

I hope this helps!
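
The check suggested above can be sketched roughly as follows. This is only a sketch: `boot_mode` is a hypothetical helper name, and its optional argument exists just so the function can be pointed at a directory other than `/sys`. It relies on the kernel exposing `/sys/firmware/efi` only when the system was booted via UEFI.

```shell
# Hypothetical helper: report whether the running system was booted via UEFI.
# The kernel only creates /sys/firmware/efi on UEFI boots.
# $1 (optional) is an alternative sysfs root; it defaults to /sys.
boot_mode() {
    sysfs_root="${1:-/sys}"
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "uefi"
    else
        echo "bios"
    fi
}

# On the PVE host (as root) one would then look at:
#   boot_mode                  # should print "uefi" on a UEFI-booted system
#   efibootmgr -v              # firmware boot entries and BootOrder
#   proxmox-boot-tool status   # ESPs that proxmox-boot-tool keeps in sync
```

If `boot_mode` prints "bios", `efibootmgr` will not be able to read the boot entries at all, which would already point to a legacy boot path.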
 
This is what efibootmgr outputs. If I list the efivars I get a very long list. Anything specific I should look for?

LoaderInfo contains systemd-boot.

The problem is that all newly installed disks fail to be bootable.

root@pve:~# efibootmgr -v
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0002,0005,0000,000D,000C,000E,000F
Boot0000* CentOS Linux HD(1,GPT,ff04a1ce-4fa0-44e3-aec9-bbbac1496923,0x800,0x12c000)/File(\EFI\centos\shimx64.efi)
Boot0001* Linux Boot Manager HD(2,GPT,840ffcb0-2336-4aee-a6b3-efc3e404a703,0x800,0x100000)/File(\EFI\systemd\systemd-bootx64.efi)
Boot0002* Linux Boot Manager HD(2,GPT,ae4ab53d-d102-40b7-8665-9efc15434b39,0x800,0x100000)/File(\EFI\systemd\systemd-bootx64.efi)
Boot0005* Linux Boot Manager HD(2,GPT,5f6ac6d3-22d1-4b72-be98-a1db3d142de0,0x800,0x100000)/File(\EFI\systemd\systemd-bootx64.efi)
Boot000C* Samsung SSD 870 EVO 500GB BBS(HD,,0x0)AMBO
Boot000D* UEFI: ST500DM002-1BD142 PciRoot(0x0)/Pci(0x1f,0x2)/Sata(2,65535,0)/HD(2,GPT,1543c402-5374-4afd-bc3c-dcd6731683e0,0x800,0x100000)AMBO
Boot000E* Samsung SSD 870 EVO 500GB BBS(HD,,0x0)AMBO
Boot000F* ST500DM002-1BD142 BBS(HD,,0x0)AMBO

Thanks!
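
To see which partition each boot entry points at, the partition GUIDs can be pulled out of the `efibootmgr -v` output and compared with the PARTUUIDs of the disks that are actually attached. A minimal sketch, where `boot_entry_partuuids` is a hypothetical helper name and the sed pattern assumes the `HD(n,GPT,<guid>,...)` layout seen in the output above:

```shell
# Hypothetical helper: read `efibootmgr -v` output on stdin and print
# "BootNNNN <partition-guid>" for every entry that references a GPT partition.
# Entries without an HD(...,GPT,...) node (e.g. the BBS ones) are skipped.
boot_entry_partuuids() {
    sed -n 's/^\(Boot[0-9A-Fa-f]\{4\}\)\*\{0,1\} .*HD([0-9]*,GPT,\([0-9a-fA-F-]*\),.*/\1 \2/p'
}

# On the PVE host (as root) one might compare the two lists:
#   efibootmgr -v | boot_entry_partuuids
#   lsblk -o NAME,PARTUUID
```

An entry whose GUID does not show up in the `lsblk` list refers to a partition that is not currently attached, for instance on a disk that has since been removed or reformatted.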
 
If you indeed get a _grub_ error, I would guess that your BIOS still tries to boot from the following entry:
Boot0000* CentOS Linux HD(1,GPT,ff04a1ce-4fa0-44e3-aec9-bbbac1496923,0x800,0x12c000)/File(\EFI\centos\shimx64.efi)

I'd check in the BIOS which disk it tries to boot from.

I hope this helps. Otherwise, please post some screenshots of the exact error you're getting.
 
If I remove the disk that seems to be bootable, so that the server can't boot from it, I get the "grub rescue>" prompt and an unbootable server.

I'll see if I can get some screenshots later from that error.
 
In that case I'd suggest simply reinitializing the VFAT partition of the new SSD with `proxmox-boot-tool format` and `proxmox-boot-tool init` (and cleaning up afterwards with `proxmox-boot-tool clean`).

Afterwards, compare the output of `efibootmgr`.
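
The sequence described above could look roughly like this. It is a dry-run sketch under assumptions: `esp_reinit_plan` is a hypothetical helper that only prints the commands, and `/dev/sdX2` in the usage comment is a placeholder for the ESP partition of the new SSD, not a value from this thread.

```shell
# Hypothetical dry-run helper: prints, but does not execute, the commands
# for re-initialising one ESP. $1 is the ESP partition of the new disk.
esp_reinit_plan() {
    part="${1:-}"
    if [ -z "$part" ]; then
        echo "usage: esp_reinit_plan <esp-partition>" >&2
        return 1
    fi
    echo "proxmox-boot-tool format $part"   # recreate the VFAT filesystem
    echo "proxmox-boot-tool init $part"     # set up the bootloader on that ESP
    echo "proxmox-boot-tool clean"          # drop ESP entries that no longer exist
    echo "efibootmgr -v"                    # inspect the firmware boot entries
}

# Example (placeholder device name):
#   esp_reinit_plan /dev/sdX2
```

Printing instead of executing is deliberate, since `format` destroys the contents of the given partition. Review the printed lines and run them by hand as root once the device name has been double-checked.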
 
I've done that about 10 times already. :)

But I'll check again later tonight, can't shutdown now since the
 
