Grub not booting ZFS rpool after disk replacement

Andreas Piening · Feb 2, 2019

I had one disk producing read errors in my 15 disk ZFS raid-z2 pool and replaced it offline.
Trying to boot the system ended in grub rescue mode:

Code:

error: no such device: <hex string>.
Entering rescue mode...

I booted with sysresccd+zfs and was able to import the pool with

Code:

zpool import -o altroot=/mnt/proxmox rpool

I initiated a resilver with zpool replace and it started.
I mounted /proc, /sys and /dev and chrooted into the rpool root system to reinstall grub:

Code:

grub-install /dev/sda
Installing for i386-pc platform.
grub-install: error: unknown filesystem.

I get the same error-message for all the other drives as well.
It seems my rpool could not be identified by grub as ZFS:

Code:

grub-probe /
grub-probe: error: unknown filesystem.

Any help on reinstalling grub2 and getting the system to boot would be greatly appreciated.
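
For readers following along, the mount and chroot step described in this post can be sketched roughly as below. This is a sketch, not the exact commands used in the thread. The `/mnt/proxmox` path is the altroot from the `zpool import` call above. The use of `mount --rbind` and `/bin/bash` and the `run` dry-run helper are assumptions added for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: ALTROOT matches the altroot passed to `zpool import` above.
ALTROOT="/mnt/proxmox"

# Dry run by default: print each command; set RUN=1 (as root) to execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

prepare_chroot() {
    # Make the kernel pseudo-filesystems visible inside the chroot.
    for fs in proc sys dev; do
        run mount --rbind "/$fs" "$ALTROOT/$fs"
    done
    # Enter the installed system; grub-install and grub-probe are run from inside it.
    run chroot "$ALTROOT" /bin/bash
}

prepare_chroot
```

Printing first makes it easy to check the target paths before anything is mounted on a rescue system.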

Andreas Piening · Feb 2, 2019

I received the hint that grub-install, and even booting from the rpool, does not work while the disk replacement (the resilver) is still in progress.
And in fact, after the resilvering finished I did the mount / chroot again and grub-install worked on all 15 disks without errors.
Even 'grub-probe /' returns 'zfs' now.
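
A minimal sketch of that order of operations: check that the resilver is no longer running, and only then install grub on every disk. The "resilver in progress" text is what `zpool status` prints on its scan line during a resilver. The function names, the placeholder status line, and the `sda` to `sdo` device names are assumptions for illustration, not taken from this system.

```shell
#!/bin/sh
# Succeeds while the `zpool status` text on stdin reports a running resilver.
resilver_in_progress() {
    grep -q "resilver in progress"
}

# Prints one grub-install command per disk. The sda..sdo naming is an
# assumption for a 15-disk system; check the real device names first.
grub_install_cmds() {
    for d in a b c d e f g h i j k l m n o; do
        echo "grub-install /dev/sd$d"
    done
}

# On the rescue system this would be: zpool status rpool | resilver_in_progress
# Here a made-up status line stands in for the live output.
sample="  scan: resilver in progress since <timestamp>"
if printf '%s\n' "$sample" | resilver_in_progress; then
    echo "resilver still running: wait before running grub-install"
else
    grub_install_cmds
fi
```

The generated commands would be run inside the chroot, after `grub-probe /` reports `zfs`.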

However, grub still puts me on rescue shell with: "error: attempt to read or write outside of disk 'hd4'".
I have never seen this error.
Any clue?

fabian · Feb 4, 2019

Andreas Piening said:
However, grub still puts me on rescue shell with: "error: attempt to read or write outside of disk 'hd4'".
I have never seen this error.
Any clue?

what kind of disk controller are you using? usually this is a result of the controller firmware reporting wrong disk sizes in legacy/MBR boot mode. the error will randomly appear and disappear depending on where (physically) on the disk the data that needs to be read for booting is stored.

Andreas Piening · Feb 4, 2019

Hi Fabian,
the controller is a

Code:

02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

which does not support real JBOD, but there is a mode that creates RAID 0 vDevs for every attached disk.
I guess you are right when blaming the controller for this. After the disk swap there was one vDev less and only 14 disks out of 15 were reported to the system. I had to re-activate the mode that presents every disk as a separate vDev to see all 15 drives again.
The resilvering worked fine when booting with a sysresccd, but grub2 does not seem to handle that.
I wonder how I should manage this if a disk fails again. Now I've sent the VMs with zfs send/receive to another host, did a clean install (which worked without any issues) and then transferred the VMs back. This was very time consuming and I have no clue how I can avoid this if a disk fails again.

jim.bond.9862 · Feb 4, 2019

Here are a few suggestions for the future, if you can swing it.

1. Check if you can flash the controller to IT mode, and whether that makes it support a true JBOD mode.
I recently had the same issue when I did a full refurb of my server. The new motherboard had an onboard controller which, as it turned out, could not be flashed to support my setup.
Unwittingly I went and got what I believed was a great deal on an Adaptec RAID card. By my understanding it was perfect, but it was just like your controller: no support for JBOD. After carefully researching my options I decided to eat the cost of the Adaptec card and just get a controller that works the way I want. I found one on eBay already flashed to IT mode that supports full JBOD for 24 disks.
I had the whole thing up and running in 20 minutes.
I have even had 2 disks fail and get replaced since then. I have a few old disks in my setup that I replace as they give out.
The only difference with your setup is that I separate boot pools from data pools.
My setup uses different size disks so a single bootable pool is not an option for me.

Andreas Piening · Feb 4, 2019

The server is part of our rented infrastructure. Unfortunately I am not allowed to flash alternate firmwares or swap or extend the hardware. I don't even have physical access to the machine.
I have two identical servers of this type and they were running for over 18 months very well until one disk died.

I expected the swapped drive to replace the old one and leave everything else untouched. It came as a surprise to me, that the vDev config was gone after the original drive was removed.
I wonder if it would have changed anything if the drive was replaced online. If the drive mapping would stay intact it may have prevented the issues grub2 ran into. Just guessing.

It looks like a general grub2 / zfs issue to me, not really specific to proxmox.
However, if I gather more information on this or find a solution, I'll post it here.

mailinglists · Feb 4, 2019

Next time try to offline the disk, hot swap it, run zpool replace, do a grub-install before rebooting, and then test with a reboot.
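
As a rough sketch, that sequence could look like the following. The device names are placeholders, not values from this thread. The step of waiting for the resilver before grub-install is an assumption based on the earlier report that grub-install failed while the resilver was running. The script only prints the steps and executes nothing.

```shell
#!/bin/sh
# Placeholders, not values from this thread: adjust to the real pool layout.
POOL="rpool"
OLD_DEV="/dev/sdX"   # hypothetical name of the failing disk
NEW_DEV="/dev/sdY"   # hypothetical name of the replacement disk

# Prints the steps in order; nothing is executed.
replacement_plan() {
    echo "zpool offline $POOL $OLD_DEV"
    echo "# hot swap the physical disk now"
    echo "zpool replace $POOL $OLD_DEV $NEW_DEV"
    echo "zpool status $POOL   # repeat until the resilver has finished"
    echo "grub-install $NEW_DEV"
    echo "# only then reboot to test"
}

replacement_plan
```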

jim.bond.9862 · Feb 4, 2019

mailinglists said:
Next time try to offline the disk, hot swap it, run zpool replace, do a grub-install before rebooting, and then test with a reboot.

I do not think he can do that.
His RAID controller is a real RAID controller with no plain JBOD option. Instead he has to go into the controller BIOS and configure virtual volumes for all disks. Only then are these pseudo-disks seen by the system.
I am not 100% sure, but most controllers can only be configured during boot.
It would be safer in this setup to have a preconfigured hot spare in the system.
That way an extra volume is available to do the ZFS disk replacement hot, before rebooting. Then you can reboot, make sure everything is up, and only then replace the bad disk with a new one.

mailinglists · Feb 5, 2019

He can manipulate his RAID with megacli:
https://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
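
A hedged sketch of what that might look like from the running system. The binary name (`megacli`, `MegaCli` or `MegaCli64` depending on the package), adapter number 0, and the `[E:S]` enclosure:slot placeholder are assumptions. Check them against the wiki page above and the controller's actual output before running anything. The script only prints the commands.

```shell
#!/bin/sh
# The binary name differs between packages; "megacli" is an assumption here.
MEGACLI="megacli"

# Prints commands for inspection; nothing is executed. [E:S] stands for the
# enclosure:slot pair of the new disk, to be filled in from the -PDList output.
megacli_cmds() {
    # List physical disks and their state.
    echo "$MEGACLI -PDList -aALL"
    # List the logical drives currently defined on the controller.
    echo "$MEGACLI -LDInfo -Lall -aALL"
    # Create a single-disk RAID 0 logical drive for the new disk on adapter 0.
    echo "$MEGACLI -CfgLdAdd -r0 [E:S] -a0"
}

megacli_cmds
```

If this works on the controller in question, the new disk would become visible to the OS without a reboot into the controller BIOS.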

fabian · Feb 5, 2019

Andreas Piening said:
Hi Fabian,
the controller is a

Code:

02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

which does not support real JBOD, but there is a mode that creates RAID 0 vDevs for every attached disk.
I guess you are right when blaming the controller for this. After the disk swap there was one vDev less and only 14 disks out of 15 were reported to the system. I had to re-activate the mode that presents every disk as a separate vDev to see all 15 drives again.
The resilvering worked fine when booting with a sysresccd, but grub2 does not seem to handle that.
I wonder how I should manage this if a disk fails again. Now I've sent the VMs with zfs send/receive to another host, did a clean install (which worked without any issues) and then transferred the VMs back. This was very time consuming and I have no clue how I can avoid this if a disk fails again.

indeed, those are exactly the kind of controllers/modes which are known to be problematic. if you can't flash to IT mode or switch to an HBA, your best course of action would be to move /boot to a separate, non-ZFS (non-redundant!) device or partition and boot from that.
