Another HPE Microserver Gen8 GRUB rescue thread

bvlgy-ple

New Member
Mar 29, 2019
Hello there,
I've dumped my reliable XPEnology setup for the sake of being more flexible (and legally compliant) with Proxmox and I'm slowly starting to regret it. I hope I can get it fixed with your help though... Here's what I've got and what I have tried:

* Proxmox 5.3-2 installed via USB, then updated (as below)
* 4x 3TB hard drives in the 4 bays of the Microserver Gen8 G1610T
* SATA is configured as AHCI (not RAID)
* RAIDZ-1 over all 4 hard drives
* rootfs on the zpool as created by Proxmox installer

I've done several reinstallations because I initially thought I was the problem. In the process, I found that after running apt-get upgrade on the system, the next boot fails. I'm dropped into the GRUB rescue shell with the error message "no such device: 1b2406d9cc9ad984". I've been on a long journey through a lot of threads on the web, including this forum, but none of the instructions and good ideas led to success. My actions boil down to the following:

Code:
#Download and boot sysresccd-5.2.0_zfs_0.7.9.iso
mkdir /zfsmnt && zpool import rpool -R /zfsmnt
mount -t proc proc /zfsmnt/proc
mount --rbind /dev /zfsmnt/dev
mount --rbind /sys /zfsmnt/sys
chroot /zfsmnt /bin/bash # yay, i've now chrooted into my pve install
apt-get update && apt-get upgrade && apt-get dist-upgrade # There was a kernel update and I hoped to solve any inconsistencies within boot-related configuration
grub-install /dev/sda && grub-install /dev/sdb && grub-install /dev/sdc && grub-install /dev/sdd  # all without errors
update-initramfs -u
grub-probe /  # returns zfs
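
The four grub-install calls above can also be written as a loop. This is only a minimal sketch, assuming the pool members are /dev/sda to /dev/sdd as in the post; `install_grub_all` and `DRY_RUN` are hypothetical names introduced here. By default it only prints the commands, so it is safe to try outside the chroot first.

```shell
#!/bin/sh
# Hypothetical helper: run grub-install on every pool member disk.
# DRY_RUN defaults to 1, so the commands are only printed, not executed.
install_grub_all() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "grub-install $disk"
        else
            # Stop at the first disk where grub-install reports an error.
            grub-install "$disk" || return 1
        fi
    done
}

# Disk names taken from the post; adjust to the actual pool members.
install_grub_all /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Setting DRY_RUN=0 inside the chroot would run the real commands and stop at the first failure.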

I've also explored my GRUB configuration at that point which told me which modules to insert:
Code:
insmod part_gpt
insmod zfs
Inserting the modules worked, but the subsequent ls fails with the message "unknown filesystem". With debug=zfs set, I can see that the actual checksum consists only of zeroes while the expected checksum is some hex value. (EDIT: After another try, there's an actual checksum, but it still mismatches:
2grub-ls.PNG 3grub-ls.PNG ) Googling that, I found some information about a bug, but I can't recall the details now and I'm pretty sure it was for another (older) GRUB version. Anyway, I'm also struggling to find out where the hex ID that GRUB wants to use comes from in the first place. Reading /boot/grub/grub.cfg, I can see that the ID "1b2406..." is at the end of the "search" commands. But when I grep through /etc/grub.d and /etc/default/grub, from which the file is supposed to be generated, I find nothing.
Edit: Here's the part of my grub.cfg where the ID appears:
1grub-conf.PNG
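
To see exactly which IDs the "search" commands ask for, the lines can be pulled out of grub.cfg with a small filter. This is a rough sketch: `list_search_ids` is a hypothetical helper, and the sample file below is made up in the general shape of a generated grub.cfg (it is not the poster's file; the hint value in particular is invented). As far as I know, the ID is not stored anywhere under /etc/grub.d because the generator asks grub-probe for it when grub.cfg is built, which would explain why grep finds nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct IDs that "search --fs-uuid" lines
# in a grub.cfg-style file ask GRUB to look for (the ID is the last word).
list_search_ids() {
    awk '$1 == "search" && /--fs-uuid/ { print $NF }' "$1" | sort -u
}

# Made-up sample in the shape of a generated grub.cfg; not the poster's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
insmod part_gpt
insmod zfs
if [ x$feature_platform_search_hint = xy ]; then
  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt2 1b2406d9cc9ad984
else
  search --no-floppy --fs-uuid --set=root 1b2406d9cc9ad984
fi
EOF

list_search_ids "$sample"
rm -f "$sample"

# On the real system the result could be compared against the value GRUB's
# own tooling reports for the root filesystem (needs root and the GRUB tools):
#   grub-probe --target=fs_uuid /
```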

In the meantime I've also tried copying my /boot directory from the ZFS pool to a USB drive and installing GRUB there, but still had no success booting the system. The zpool itself is healthy and has also completed a scrub without any errors. One strange thing is that I cannot access my VM files even though the ZFS datasets seem to be mounted, but that's most certainly my own mistake, since I can see the datasets using "zfs list".

It appears to me that using zfs as root volume is a fairly new thing or not as supported as other options. Is that true? And, most importantly, what can I do to make my PVE installation boot again? I've had to survive five days of downtime now and I'm pretty sick of it.

Another EDIT:
I just noticed that the Proxmox installer sees the USB drives in between the four disks (sda=bay1, sdb=bay2, sdc=usbdrive, sdd=usbdrive, sde=bay3, sdf=bay4). I read somewhere else that the Proxmox installer builds the raidz on the /dev/sdX devices instead of using UUIDs. Could this be related, since I installed from USB?
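
One way to check how the shifting sdX names map to the physical disks is to look at the persistent links under /dev/disk/by-id. A minimal sketch, assuming a Linux system with udev; `show_persistent_names` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show which kernel name (sdX) each persistent link
# in a directory such as /dev/disk/by-id currently points to.
show_persistent_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

show_persistent_names /dev/disk/by-id

# A pool can also be imported via these stable names instead of sdX,
# for example from the rescue system:
#   zpool import -d /dev/disk/by-id rpool
```

Whether the sdX naming is actually what breaks GRUB here is not established in this thread; the sketch only helps to rule it in or out.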
 
After the last try with checksum errors, I ran the Proxmox installer in debug mode, imported the zpool and performed another scrub. That turned out to be successful and found no errors:
4zpool-scrub.PNG

Well, unfortunately, that didn't do the trick. I'm still stuck in GRUB rescue mode with the suspicious fs-uuid. Does anyone have an idea what else I could try?
5grub-error.PNG
..
6grub-ls.PNG

Warm regards
bvlgy-ple
 
Hey there,

grub-install /dev/sda && grub-install /dev/sdb && grub-install /dev/sdc && grub-install /dev/sdd # all without errors

I don't know much about ZFS, and even less about raidz, but do you actually need to run grub-install on every disk? Especially since it's a RAID setup?
 
Hi lhorace,
first of all, thank you very much for your reply! The idea behind installing GRUB to every disk was to avoid boot problems when replacing the disk that the server boots from by default. Did I get something wrong here?
Unfortunately, the RAID-like behaviour of ZFS only comes into play once the filesystem has been started. One could compare that to mdadm software RAID, but ZFS is still so much more.
Warm regards
 
I know even less about RAID, period, as I've never had a chance to test such a setup. Nevertheless, if you install bootloaders on every disk, would GRUB be able to find the right disk to boot from? And when you boot, which disk are you choosing to boot from, since it's not a hardware RAID? Or is there no such thing as a primary disk in raidz, so that you can boot from any disk? Anyway, those errors seem to stem from GRUB having difficulty finding the disk to boot from. The "unknown filesystem" error could be because it doesn't understand the contents of the disk it's booting from; see my earlier comment about choosing the right disk. I'm not sure whether you are booting in BIOS or UEFI mode, though.
 
I'm booting in BIOS mode; luckily, the Microserver Gen8 doesn't support UEFI yet. All four disks in the zpool use a GPT layout and contain BIOS boot partitions, which the Proxmox installer created in addition to the ZFS partitions. Since any disk in my setup can fail (one at a time, because of raidz-1), every disk should contain the same layout and contents, ensuring that the system can still be booted after one device is removed or fails.
However, running blkid on the member disks, I get UUID="1955695670696008067", which can be converted to hex "1B2406D9CC9AD983". This is exactly the UUID that's used/searched by GRUB but not found. Does GRUB use the hex value of the disk/partition UUID? Is the "unknown filesystem" error related to the mismatching checksums I can see when debug=zfs is set? (btw, I have checked "ls" for different disks and one time the actual checksum is only zeroes while on another disk there's a hex checksum)
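
The decimal-to-hex conversion mentioned above can be reproduced in the shell. This is a small sketch, assuming that GRUB prints the ID as a 16-digit lowercase hex number; `guid_to_hex` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert a decimal ID (as printed by blkid for a
# ZFS member) to a zero-padded, 16-digit lowercase hex string.
guid_to_hex() {
    printf '%016x\n' "$1"
}

# Decimal value quoted from the blkid output in the post.
guid_to_hex 1955695670696008067   # prints 1b2406d9cc9ad983
```

The result ends in 983, matching the hex value quoted in the post. The GRUB error message quoted at the top of the thread ends in 984, so one of the two values may have been mistyped.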
 
This is exactly the UUID that's used/searched by GRUB but not found. Does GRUB use the hex value of the disk/partition UUID?

Dunno. That incorrect checksum could be a problem; you might want to look into what's causing it. Though you've already run a scrub, see https://forum.proxmox.com/threads/grub-rescue-checksum-verification-failed.39143/ and https://forum.proxmox.com/threads/p...o-boot-after-the-latest-kernel-upgrade.42389/

every disk should contain the same layout and contents

Ahh, good to know.

Wish you luck.
 
I've gone through those threads once again, but I couldn't find a thing in there that I haven't tried (except the kernel parameter rootdelay=10, which I find to be a rather hopeless approach for this problem). I have now installed a separate SSD in my Microserver, which I will use to set up a plain vanilla Proxmox install on ext4 (and add the existing ZFS pool as VM storage so I get my VMs back into operation). I have sworn to myself to have a backup in case that SSD dies. I bet a failure of the SSD will never cause more downtime than this attempt to set up a rock-solid system did. I still love the idea of having ZFS for everything, and it'd be a dream to have it still working after updates, but it seems the time has not come for it yet. Thank you very much for your support, lhorace!
 