Crashes with ZFS root, and stuck on grub rescue> prompt

alchemycs

Hello!
I recently had 2 servers crash. Both had been used in a PVE v3 cluster for a few years without ever having an issue, but after a clean wipe and fresh install of PVE v4, both have experienced crashes, most likely under high I/O workloads. They have a HW RAID-6 array with a ZFS RAID-0 install as the root. Last weekend one crashed and came up with a simple "grub rescue>" prompt, and while I was restoring some CTs onto the other one, it suddenly rebooted and came up with the same grub rescue prompt. I also saw another server crash while I was restoring a large CT onto it, but luckily it came up cleanly, and I did not attempt any further major I/O on it.
Some details on the crashes:
* nothing was written to any of the system or hardware logs - the logs just pick up again with the next boot process
* these were slightly out of date - pve-manager/4.4-5/c43015a5 (running kernel: 4.4.35-2-pve)
* I believe these two machines have never been updated - they were installed with the above version a few months ago and, after some brief testing, joined the PVE v4 cluster

After the BIOS, the machines immediately (no delay) drop to:
error: unknown filesystem.
Entering rescue mode...
grub rescue>

When I type 'ls' I see:
(hd0) (hd0,gpt9) (hd0,gpt2) (hd0,gpt1) (fd0)

(I'm assuming that fd0 is just the virtual media from the IPMI - no floppies here! :) )
When I type 'set', I get:
cmdpath=(hd0)
prefix=(hd0,gpt2)/ROOT/pve-1@/boot/grub
root=hd0,gpt2

I can type 'insmod zfs' and it returns right away, but I still don't get any valid output from 'ls (hd0,gpt2)' or 'insmod normal' - either one hangs for several seconds and then reports:
error: unknown filesystem
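
For comparison, the manual sequence I would expect to work from the rescue prompt - just a sketch based on the prefix shown by 'set' above - is roughly:
Code:
grub rescue> set prefix=(hd0,gpt2)/ROOT/pve-1@/boot/grub
grub rescue> insmod zfs
grub rescue> insmod normal
grub rescue> normal
but as described, insmod just comes back with the same "unknown filesystem" error.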

I was a little stuck getting a ZFS rescue disk, but ultimately discovered that the v3.4 ISO file worked with my IPMI keyboard. I tried using the Rescue mode on the v4.4 and v5.0 beta ISOs, but after hanging for several seconds, both reported:
error: failure reading sector 0x0 from `fd0'.
error: no such device: rpool.
ERROR: unable to find boot disk automatically.

I used the commands from https://forum.proxmox.com/threads/grub2-recovery-on-zfs-proxmox-ve-3-4.21306/ to import the zpool and ran a scrub on it (which reported no errors), but when I reinstalled and updated grub and the initramfs, I still get dumped back to the same "error: unknown filesystem" on boot.

I did some looking around and found some possible bugs, but didn't find a particular scenario that matched what I'm seeing...

Any help or suggestions?
 
Hardware raid controller - an LSI SAS controller, with 8 SAS disks attached in a RAID-6
I had read that ZFS can have issues with trying to boot off a RAID device with more than a few disks, so it seemed prudent to stick with a single virtual disk.
 
Never, never, never install ZFS on HW RAID. It has poor performance and can kill your data.
 
If it is such a liability, why doesn't the Proxmox installer mention anything about that then?
My problem is still with Grub and ZFS, not data loss or performance problems - for all intents and purposes this is no different from trying to boot off a single drive.
 
Can you boot a live CD with ZFS support? Try to import the pool to see if everything is OK with the ZFS pool, and try to update GRUB.
 
As I mentioned, I was able to boot with the v3.4 ISO and import and scrub the zpool without any errors, but after reinstalling and updating grub, I still get dumped back to the same "error: unknown filesystem" on reboot.
 
As I mentioned, I was able to boot with the v3.4 ISO and import and scrub the zpool without any errors, but after reinstalling and updating grub, I still get dumped back to the same "error: unknown filesystem" on reboot.

Could you post a complete transcript of the session in the live CD (which commands you ran, and their output)? Please run "grub-install" with an added "-v" to get more output. The generated grub configuration would also be interesting, as well as "zpool get all rpool" and "zfs get all -r rpool/ROOT/".
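
Something along these lines from inside the chroot would collect all of that into files (a sketch only - the output file names are arbitrary):
Code:
grub-install -v /dev/sda &> /root/grub-install.log
cp /boot/grub/grub.cfg /root/grub.cfg.copy
zpool get all rpool > /root/zpool-props.txt
zfs get all -r rpool/ROOT > /root/zfs-props.txt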
 
Hello,

Usually it is a bad idea to install ZFS on any HW RAID controller. You can try to add something like:

rootdelay=XX in /etc/default/zfs, where XX is 10 or, even better, 20! It is not unusual for the controller to "show" the disks to GRUB only after a delay!
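
A minimal sketch of how such a delay is typically wired up (assuming the stock Debian zfs-initramfs defaults file, or passing it on the kernel command line instead):
Code:
# /etc/default/zfs - have the initramfs sleep before mounting the ZFS root:
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='20'

# or /etc/default/grub - add rootdelay to the kernel command line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=20"

# then regenerate the boot config and initramfs:
update-grub
update-initramfs -u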
 
Hello,

Usually it is a bad idea to install ZFS on any HW RAID controller. You can try to add something like:

rootdelay=XX in /etc/default/zfs, where XX is 10 or, even better, 20! It is not unusual for the controller to "show" the disks to GRUB only after a delay!

That kernel/initramfs parameter is irrelevant if grub cannot load its stage 2 files.
 
After chrooting into the ZFS root partition from the live CD, what does
Code:
# grub-probe /
show?
Maybe the problem is with grub lacking ZFS support?
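
On a working ZFS-root install it should simply print the filesystem driver grub would use for /, e.g.:
Code:
# grub-probe /
zfs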
 
OK, I don't know what the problem is in this case, but I have seen the same problem in this context:

4 identical Dell servers (ECC RAM, JBOD disks, no RAID controller), all updated at the same time, with ZFS (mirror). One day, on one of these 4 servers, I saw the same message and grub dropped into rescue mode. I was able to see the zpool and the ZFS root from grub rescue mode. Even after I put these HDDs into another node, the same thing ... grub rescue. A zpool scrub did not show any problems.
So my own conclusion was to avoid any ZFS on boot/root. I have used ZFS on many servers for many years (on many different kinds of hardware), without ZFS on root/boot, and nothing has ever happened (including with fuse-zfs). ZFS on boot/root is hard to diagnose/repair, and from time to time grub/ZFS has problems working without errors.
In a production environment, a separate boot/root disk is the best option (ZFS for data like VMs and KVM guests). You could have N nodes like this, with a disk clone (Clonezilla image) for each of them. If the boot disk crashes ... I can restore any node in 20 minutes.
With ZFS on boot/root you can only hope to restore the node in 20 minutes, but you can try for many hours if you want... ;)

Even better, with UrBackup you can take this boot disk image automatically (one full image per month is OK for me).
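
As an illustration of the idea (a rough sketch only - device and path names are hypothetical), a plain dd image of the unmounted boot disk taken from a rescue environment is enough for that kind of quick restore:
Code:
# from a live/rescue environment, with the node's boot disk not mounted:
dd if=/dev/sda of=/mnt/backup/node1-bootdisk.img bs=4M
# restore later by reversing the arguments:
dd if=/mnt/backup/node1-bootdisk.img of=/dev/sda bs=4M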
 
If it is such a liability, why doesn't the Proxmox installer mention anything about that then?

Hardware RAID is totally transparent to the operating system. It's impossible for the Proxmox installer to detect a hardware RAID and its configuration.
The operating system only sees a single disk; there is no way to know whether that disk is an array or a single disk.

If you only see /dev/sda, how can you know whether it is a single device or a hardware RAID? You can't.
 
Interesting update here: I had left the machine booted up from the live CD for a few days so I could copy all the data off it, and the next time I tried to boot off the drives after running the same commands, grub actually loaded (after sitting for ~60 seconds), but then it failed with:

Code:
Loading Linux 4.4.49-1-pve ...
error: attempt to read or write outside of disk `hd0'.
Loading initial ramdisk ...
error: you need to load the kernel first.

"grub-probe /" does show "zfs"

Could you post a complete transcript of the session in the live CD (which commands you ran, and their output)?
I've put a screen capture here:
https://www.dropbox.com/s/r3f6p6n3gxty5qg/pm4zfs.mp4?dl=0
though it's not perfect, it should show most of the details. I can try to get a better capture if you would like?
Here are the commands I ran:
Code:
zpool import -R /mnt rpool
mount -t proc /proc /mnt/proc
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
chroot /mnt bin/bash
grub-install /dev/sda
update-grub2
update-initramfs -u
and none of them reported any errors

So my own conclusion was to avoid any ZFS on boot/root.
That's kind of what I'm thinking... :)

Hardware RAID is totally transparent to the operating system. It's impossible for the Proxmox installer to detect a hardware RAID and its configuration.

Linux has a pretty good idea of what's going on with its disks. Look at the output from lshw:
Code:
           *-storage
                description: RAID bus controller
                product: MegaRAID SAS 1078
               ...
              *-disk
                   description: SCSI Disk
                   product: MegaRAID SAS RMB
                   vendor: LSI
                   logical name: /dev/sda
 
Linux has a pretty good idea of what's going on with its disks. Look at the output from lshw:
Code:
           *-storage
                description: RAID bus controller
                product: MegaRAID SAS 1078
               ...
              *-disk
                   description: SCSI Disk
                   product: MegaRAID SAS RMB
                   vendor: LSI
                   logical name: /dev/sda

And then?
There you can't see whether it's a RAID array or just a disk in JBOD mode.
You only see that there is a RAID controller, nothing more. That info is totally useless.

One of the main advantages/disadvantages of hardware RAID is that it is totally transparent to the operating system.
If you were able to see the RAID config from the operating system, then your RAID wouldn't be transparent.
 
And then?
There you can't see whether it's a RAID array or just a disk in JBOD mode.
You only see that there is a RAID controller, nothing more. That info is totally useless.

It means that whatever drive you're looking at is under a RAID controller - if the user selects ZFS on that drive, at the very least you can put up a warning that says "You might not want to install ZFS on a HW raid controller, proceed at your own risk".
 
That doesn't make sense.
Using a hardware RAID controller with disks in JBOD/passthrough mode is fine (just pointless - an HBA is cheaper than a RAID controller).
 
Interesting update here: I had left the machine booted up from the live CD for a few days so I could copy all the data off it, and the next time I tried to boot off the drives after running the same commands, grub actually loaded (after sitting for ~60 seconds), but then it failed with:

Code:
Loading Linux 4.4.49-1-pve ...
error: attempt to read or write outside of disk `hd0'.
Loading initial ramdisk ...
error: you need to load the kernel first.

If this is reproducible, the output of "ls (hd0)", "ls (hd0,gpt1)", "ls (hd0,gpt2)" and "ls (hd0,gpt9)" would be nice.

"grub-probe /" does show "zfs"


I've put a screen capture here:
https://www.dropbox.com/s/r3f6p6n3gxty5qg/pm4zfs.mp4?dl=0
though it's not perfect, it should show most of the details. I can try to get a better capture if you would like?
Here are the commands I ran:
Code:
zpool import -R /mnt rpool
mount -t proc /proc /mnt/proc
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
chroot /mnt bin/bash
grub-install /dev/sda
update-grub2
update-initramfs -u
and none of them reported any errors

I actually meant recording the output into a file with "&>> logfile" ;) but that output looks okay, so all that's left is that maybe your BIOS or GRUB does not read the HW RAID normally. How big is the "combined" disk? I'm not sure whether non-EFI GRUB has a limit there.
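
To check that, something like this from the live CD would show the size of the virtual disk and the partition table type (a sketch, assuming the array still shows up as /dev/sda):
Code:
lsblk -o NAME,SIZE,TYPE /dev/sda
parted /dev/sda print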
 
