How to boot Proxmox PVE from degraded ZFS-RAID10 / grub rescue

Chris&Patte

Hello,

We are using an old, mostly unused Proxmox server (HP ProLiant DL160 Gen8) as a backup/fail-safe hypervisor for our production infrastructure VMs.
I boot that server every week and clone the storage of those VMs to it via "zfs send".

Now it seems that one disk is broken and the server does not boot any more; instead it is stuck at the GRUB prompt.
1699266426956.png

When I boot the server with a ZFS-enabled Linux live CD, I can see that one disk is broken but that I can still access the zpool
1699266493629.png

and all VM storage seems intact.
1699266546124.png

So I need to replace that broken disk and resilver the RAID, but my question now is: why does the server not boot any more? What do I need to do with GRUB so that it can boot the server with the degraded pool/disk?

Thank you

Chris
 
Since the (hd0,gpt2) filesystem is unknown instead of FAT or ESP, I assume that GRUB was not properly installed on this drive when it was added to the mirror (search for "Changing a failed bootable device" in the manual). Maybe this can be easily fixed by chrooting into the rpool and running grub-install and update-grub? It's much like https://pve.proxmox.com/wiki/Recover_From_Grub_Failure but with ZFS instead of LVM.
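
For reference, the manual section mentioned above describes roughly the following sequence when a bootable ZFS disk is swapped. This is a minimal sketch, not the exact procedure for this server: the device paths are placeholders, the helper names run, replace_boot_disk and install_bootloader are made up for illustration, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. By default every command is printed, not executed.
# Set DRY_RUN=0 (as root, after double-checking the device names) to run them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = healthy bootable disk, $2 = new blank disk,
# $3 = failed ZFS partition as shown by `zpool status`,
# $4 = ZFS partition on the new disk, $5 = pool name (assumed: rpool)
replace_boot_disk() {
    run sgdisk "$1" -R "$2"                      # copy the partition table to the new disk
    run sgdisk -G "$2"                           # give the copy new random GUIDs
    run zpool replace -f "${5:-rpool}" "$3" "$4" # start the resilver
}

# Make the new disk bootable. Which branch applies depends on how the
# system boots, so check that first.
install_bootloader() {
    case $1 in
        grub) run grub-install "$2" ;;            # $2 = the whole new disk
        pbt)  run proxmox-boot-tool format "$2"   # $2 = ESP on the new disk
              run proxmox-boot-tool init "$2" ;;
        *)    echo "usage: install_bootloader grub|pbt TARGET" >&2; return 1 ;;
    esac
}
```

Calling replace_boot_disk /dev/sda /dev/sdb /dev/sdb3 /dev/sdb3 followed by install_bootloader grub /dev/sdb prints the commands for a plain GRUB setup. If the last step is skipped when a disk is added, that disk carries pool data but no working bootloader, which would match the symptom described here.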
 
Is the broken disk disconnected already? If not, disconnect it.
And give us a list of your partitions with fdisk -l.
 
@JensF

About the partitions: I need to power up the machine first, but until then I have this screenshot from a Linux installer (I don't remember which distribution it was) that shows "some" partition info.

And no, I cannot access the server physically today to detach the broken disk. I wanted to replace the HD tomorrow, but I still wonder why my server stopped booting from the RAID10 at all.

1699269950117.png
 
I guess your server still tries to boot from the broken disk. Look in your BIOS boot order if you have another boot entry and try to boot from the second disk.
 
I guess your server still tries to boot from the broken disk. Look in your BIOS boot order if you have another boot entry and try to boot from the second disk.
That's not possible that way. In the HP BIOS I can only select the HD controller to boot from and the order in which device types are tried (floppy, HD controller, network, ...), but there is no way to select a single disk to boot from.
 
That's not possible that way. In the HP BIOS I can only select the HD controller to boot from and the order in which device types are tried (floppy, HD controller, network, ...), but there is no way to select a single disk to boot from.
Doesn't the fact that GRUB rescue starts show that the computer IS booting from the HD, but cannot find the Linux kernel or the /boot folder?
 
Since the (hd0,gpt2) filesystem is unknown instead of FAT or ESP, I assume that GRUB was not properly installed on this drive when it was added to the mirror (search for "Changing a failed bootable device" in the manual). Maybe this can be easily fixed by chrooting into the rpool and running grub-install and update-grub? It's much like https://pve.proxmox.com/wiki/Recover_From_Grub_Failure but with ZFS instead of LVM.
@leesteken

Thanks for the info. I think that's the correct way and I will try to understand it. Coming back soon ;-)
 
The partition structure on /dev/sda looks good but there is no /dev/sdb detected!
Just to make sure that sda isn't the broken disk you could give us the output of ls -l /dev/disk/by-id/
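
To make the by-id listing easier to compare against the pool members, a small helper can print each persistent ID next to the kernel device it resolves to. This is only a sketch; list_disk_ids is a hypothetical name, and the directory argument exists mainly so the function can be tried on a test directory.

```shell
#!/bin/sh
# Print "ID -> device" for every symlink in /dev/disk/by-id
# (or in the directory given as first argument).
list_disk_ids() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue        # skip anything that is not a symlink
        target=$(readlink -f "$link")     # resolve to the real node, e.g. /dev/sda
        printf '%s -> %s\n' "${link##*/}" "$target"
    done
}
```

Run list_disk_ids from the live system and compare the output with zpool status: a pool member whose ID no longer shows up in the list is most likely the failed disk.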
 
The partition structure on /dev/sda looks good but there is no /dev/sdb detected!
Just to make sure that sda isn't the broken disk you could give us the output of ls -l /dev/disk/by-id/
sdb is the broken disk. There are 4 drives.
1699280617195.png
 
That doesn't look bad at all. You could mount sda1 and sda2 from a Linux live CD to see if there is any data on them, or just follow the steps in leesteken's link.
 
OK, maybe a stupid question: which bootloader is my machine using?

As I'm stuck at GRUB rescue, I guess it's GRUB? But if I remember correctly, I installed via the installer on a ZFS RAID10, so it should be systemd-boot?

I do not have proxmox-boot-tool in the rescue mode of the Proxmox boot media to check it. So how do I find out which bootloader I'm actually using?


1699370997103.png
 
OK, maybe a stupid question: which bootloader is my machine using?
It's in the manual but you might need to chroot into your rpool first.
As I'm stuck at GRUB rescue, I guess it's GRUB?
Probably. It's what is currently started by the motherboard BIOS, but it could also be accidental because your previous boot drive failed.
But if I remember correctly, I installed via the installer on a ZFS RAID10, so it should be systemd-boot?
Only if your system boots in UEFI mode (efibootmgr -v), otherwise it's still GRUB (see the manual).
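
A quick way to tell the two cases apart without efibootmgr is to check whether the kernel exposes the EFI firmware directory. The sketch below assumes the usual Linux layout; boot_mode is a hypothetical helper name, and the optional root argument is there only for testing. Note that the result describes how the currently running system was started, so from a live CD it reports how the live CD itself booted.

```shell
#!/bin/sh
# Report the firmware mode of the running system.
# /sys/firmware/efi only exists when the kernel was started via UEFI.
boot_mode() {
    root=${1:-/}
    if [ -d "$root/sys/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}
```

Calling boot_mode with no argument checks the running system. On a server that can only boot in legacy BIOS mode, this points to GRUB rather than systemd-boot.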
I do not have proxmox-boot-tool in the rescue mode of the Proxmox boot media to check it. So how do I find out which bootloader I'm actually using?
First chroot into your rpool (mount dev, sys, proc, etc. before) and then you can probably do everything as in the manual and Wiki.
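
The chroot step could look roughly like this from a ZFS-enabled live system. It is a sketch under assumptions: the pool is called rpool, its root dataset mounts at / so that importing with an alternate root of /mnt puts the installed system under /mnt, and the helper names run and enter_rpool are invented here. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. By default every command is printed, not executed.
# Set DRY_RUN=0 and run as root on the live system to execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = pool name (assumed: rpool), $2 = alternate root (assumed: /mnt)
enter_rpool() {
    pool=${1:-rpool}
    mnt=${2:-/mnt}
    run zpool import -f -R "$mnt" "$pool"    # force-import the degraded pool under $mnt
    for fs in dev sys proc; do
        run mount --rbind "/$fs" "$mnt/$fs"  # make the live kernel's trees visible inside
    done
    run chroot "$mnt" /bin/bash              # shell inside the installed system
}
```

Inside the chroot the installed tools are available, so proxmox-boot-tool status can be tried there, followed by grub-install on a healthy disk and update-grub as suggested earlier in the thread.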
 
