Crashes with ZFS root, and stuck on grub rescue> prompt

If this is reproducible, the output of "ls (hd0)", "ls (hd0,gpt1)", "ls (hd0,gpt2)" and "ls (hd0,gpt9)" would be nice.
Sure!
grub> insmod zfs
grub> ls (hd0)
Device hd0: No known filesystem detected - Sector size 512B - Total size 1362276352KiB
grub> ls (hd0,gpt1)
Partition hd0,gpt1: No known filesystem detected - Partition start at 17KiB - Total size 1007KiB
grub> ls (hd0,gpt2)
Partition hd0,gpt2: Filesystem type zfs - Label `rpool' - Last modification time 2017-04-20 19:21:01 Thursday, UUID ab38xxxxxxxxxxxxxxxx - Partition start at 1024KiB - Total size 3509750767KiB

Interestingly, hd0 reports a size of around 1.3TB, but hd0,gpt2 reports the (more correct) ~3.3TB size of the array.

I actually meant recording the output into a file with "&>> logfile" ;) But that output looks okay, so all that's left is that maybe your BIOS or Grub does not read the HW RAID normally? How big is the "combined" disk? Not sure if non-EFI Grub has a limit there?
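
Just to spell out the "&>> logfile" part - a made-up bash example with a hypothetical command and log path; "&>>" appends both stdout and stderr to the file:

some-diagnostic-command &>> /root/grub-debug.log    # both the normal output and any error messages end up in the log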

lol! That didn't occur to me (probably because getting the machine back on the network is a bit of work).
Like I mentioned above, the whole array is ~3.4TB, but the machine did boot just fine for quite a while. Even more interestingly, the other machine (identical hardware) that crashed at the same time displayed the same symptoms this one did, but actually came back up, and seems to be entirely normal after one particularly un-noteworthy round of running the grub commands! :-/
I'm a bit perplexed...
 
The differing sizes might be the cause though? It seems like Grub thinks the "disk" is smaller than it actually is, and depending on where the data it wants to read sits, it might fail at various points in the process: already when trying to load the basic stuff (-> rescue shell), or later when trying to load the kernel/initramfs (error message "error: attempt to read or write outside of disk `hd0'"), or it might work entirely (because everything Grub wants to access happens to lie within the "limit" of the wrong disk size).

If that is the case, every rewrite of the Grub files and/or other files in /boot could either trigger or solve the problem?
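
A quick back-of-the-envelope check (just arithmetic on the numbers posted above, so treat it as a guess): if the BIOS hands Grub a sector count truncated to 32 bits, then for a disk of this size the reported value should be the real size minus 2^32 sectors (2,147,483,648 KiB at 512 B per sector). And indeed:

1,362,276,352 KiB (hd0 as Grub sees it) + 2,147,483,648 KiB (2^32 sectors) = 3,509,760,000 KiB, i.e. roughly 3.6 TB (~3.3 TiB)

which is suspiciously close to the gpt2 partition size (3,509,750,767 KiB) plus its 1 MiB offset, with about 8 MiB left over for the small reserved partition at the end of the disk.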
 
That's an interesting idea, and I think you might be onto something!
I checked grub on the identical machine with the working grub, and everything is identical (including the incorrect sizes). I ran grub-install /dev/sda, update-grub2, and update-initramfs -u on the working machine several times and it happily booted every single time.
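
For completeness, the refresh sequence boils down to roughly this (as run on my machine - /dev/sda is specific to this setup):

grub-install /dev/sda    # re-install the boot loader onto the RAID "disk"
update-grub2             # regenerate /boot/grub/grub.cfg
update-initramfs -u      # rebuild the initramfs for the running kernel
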
On a whim, I decided to copy /boot from the working server to the non-booting one, and it Just Worked (tm) (well, after sitting and thinking for 1-2 minutes after the BIOS and before grub). I had run grub-install /dev/sda and update-grub2 first, but I wasn't expecting that.
There are only 2 files that are different between the old and new /boot dirs: grub/grub.cfg and initrd.img-4.4.49-1-pve
The only difference in /boot/grub/grub.cfg is that the bad file includes "--hint='hd0,gpt2'" on all of its "search" commands.
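
To illustrate, the lines in question look roughly like this (illustrative only - <pool-uuid> is a placeholder, not the real identifier):

search --no-floppy --fs-uuid --set=root <pool-uuid>                        # working file
search --no-floppy --fs-uuid --set=root --hint='hd0,gpt2' <pool-uuid>      # non-booting file
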
I had to use "update-initramfs -u -t" to force the initramfs update, but it still worked.
I also replaced the /boot/grub/grub.cfg file with the original one (though I copied, and did not move it) and was still able to boot.

I decided to mv the new /boot dir away, and mv'd the old one back, and that resulted in the previous problems.
When I renamed and then cp'd the old vmlinuz-4.4.49-1-pve file back to vmlinuz-4.4.49-1-pve (i.e., "mv vmlinuz-4.4.49-1-pve temp; cp temp vmlinuz-4.4.49-1-pve"), the machine booted as normal.

So, that's an interesting situation!
 
So... Is there anything we can take away from this experience, except not to use ZFS on your boot filesystems? :-/

Sorry to say, but no, we cannot do much more about this. From what I hear from others who use ZFS on /boot, bad things happen from time to time, and in such cases you need a lot of time to recover a server that cannot boot. So in my opinion, the safe path is not to use ZFS on /boot. Maybe at home it is OK if your mini-server cannot boot, but in a 24/7 environment you do not want these events.
 
yes, don't use ZFS on top of HW raid ;) not having /boot on ZFS means not having a redundant /boot - so you need manual recovery as well when your boot drive fails..
 
nope ... not if you do it like this:

- use a urbackup server, and set up a full image clone of the boot drive
- when something is broken on the boot device, boot with a urbackup live CD, and restore your last image from the urbackup server
- in my case (SSD) it takes at most 20 min to restore (I can keep this live CD in the server, and with iDRAC I can do a one-time boot from CD whenever I need to)

What I am trying to say is that in my case I only need to follow some simple steps (a check-list), and in 20 minutes my recovery task is finished. Even a junior admin can do it, without any Linux know-how. If you have 2x SSD (or HDD) as boot devices and use a HW RAID (mirror), it is even simpler to recover one disk from the HW mirror.
 
nope ... not if you do it like this:

- use a urbackup server, and set up a full image clone of the boot drive
- when something is broken on the boot device, boot with a urbackup live CD, and restore your last image from the urbackup server
- in my case (SSD) it takes at most 20 min to restore (I can keep this live CD in the server, and with iDRAC I can do a one-time boot from CD whenever I need to)

which is manual recovery? (note that you can do the same in ~1 minute by booting a live environment (with zfs support in this case), chrooting into it and re-installing grub - which can also be automated/checklistified)
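
roughly, assuming the default pool name 'rpool' and /dev/sda as the boot disk (adapt both to your setup), the chroot dance looks like this:

zpool import -f -R /mnt rpool          # import the pool from the live environment, mounted below /mnt
mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
chroot /mnt grub-install /dev/sda      # re-install grub from inside the installed system
chroot /mnt update-grub2
umount -R /mnt
zpool export rpool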

What I am trying to say is that in my case I only need to follow some simple steps (a check-list), and in 20 minutes my recovery task is finished. Even a junior admin can do it, without any Linux know-how. If you have 2x SSD (or HDD) as boot devices and use a HW RAID (mirror), it is even simpler to recover one disk from the HW mirror.

you mean put /boot on a HW raid which gives wrong size information to grub, which leads to random grub failures? (sorry, could not resist..)
 
which is manual recovery? (note that you can do the same in ~1 minute by booting a live environment (with zfs support in this case), chrooting into it and re-installing grub - which can also be automated/checklistified)
- with the observation that in some cases it does not work (like the initiator of this thread said - and I think it took him more than 60 minutes to solve his problem - or in my own case)

you mean put /boot on a HW raid which gives wrong size information to grub, which leads to random grub failures? (sorry, could not resist..)

- not like that if you have the proper HW (in 5 years I have not had any problem with grub on 2 servers; it was not my choice to use HW RAID, but it works)

Any solution is good if it is working.
 
- with the observation that in some cases it does not work (like the initiator of this thread said - and I think it took him more than 60 minutes to solve his problem - or in my own case)

again - caused by the HW raid, not by ZFS...

- not like that if you have the proper HW (in 5 years I have not had any problem with grub on 2 servers; it was not my choice to use HW RAID, but it works)

Any solution is good if it is working.

ZFS as boot works in 99.5% of the cases.. (and I think this will be my last post in this thread, the argument runs in circles).
 
again - caused by the HW raid, not by ZFS...

In my own case, I do not use any HW RAID, only JBOD mode, with a ZFS mirror for /boot. I tried to recover with the Proxmox ISO like you said, and it did not help at all (2 months ago, with the latest updates available at that time).

ZFS as boot works in 99.5% of the cases..

Maybe... anyway, I do not know how you got this 99.5% ;)

Have a nice day!
 
Hi,

I am trying to install Proxmox VE 5.3 on a ProLiant DL380 Gen10 server with RAIDZ3 under ZFS, and I have encountered the same issue. I have a RAID controller in mixed mode (the only possibility with this server), which allows me to see the disks directly. There is no hardware RAID configured. I can't modify the hardware configuration.
I tried to install Proxmox with a RAID1 under ZFS and it worked without any issue.
I followed the instructions from the links given in this thread and tried many options, but I still can't install it with RAIDZ3 (RAIDZ2, RAIDZ1 and even RAID0 don't work either).

Is there something else I could try to use Proxmox with RAIDZ3?
 
Hi Henon,
I went through this very issue last week too (as discussed here). I ended up throwing in the towel and installing Proxmox VE on a separate SSD. You definitely CAN use ZFS (and I assume the RAID level is not relevant) for the storages (VM disks, images, containers, ...), but from my experience I wouldn't recommend using ZFS as the rootfs, nor would I recommend booting from it. Just keep the OS away from the zpool and you'll be fine.
Warm regards
bvlgy-ple
 
I didn't expect that it would be possible to mix ext4 and ZFS, so I didn't try this solution. I was trying to install RAID0 on a single SSD, but that failed too.
So I'm going to install Proxmox on an SSD with ext4 and then create a RAIDZ3 zpool with the other HDDs.
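
Something along these lines, if I understand the suggestion correctly (the pool name "tank" and the disk names are just examples - in practice /dev/disk/by-id names are preferable):

zpool create -f -o ashift=12 tank raidz3 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg   # RAIDZ3 pool across the HDDs
pvesm add zfspool tank -pool tank                                                                # register it as VM/container storage in Proxmox
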
Thanks a lot!
 
