I managed to temporarily fix it after some detours (mostly due to my very slow IPMI). My issue turned out to be related to
org.zfsonlinux:large_dnode = active.
This is an old HP BL460c G6 with a P410i (not an ideal setup for ZFS). I set up two HW RAID0 arrays on the RAID card (because the P410i does not allow SATA-AHCI/JBOD mode) and did a standard ZFS RAID1 PVE install via the ISO in Nov. 2020. It was fine for a while, until last week when I rebooted the server and it dropped into grub rescue with the same behavior described in this post.
Initially I thought this was an old-HP-specific issue and tried the USB boot approach, but that didn't work (although I suspect my slow IPMI is to blame). So I googled similar grub failures on ZFS and found some posts citing the root cause: dnodesize in the zpool not being set to legacy, which makes grub fail to read the ZFS partition once large_dnode becomes active, resulting in "unknown filesystem":
https://lucatnt.com/2020/05/grub-unknown-filesystem-on-zfs-based-proxmox/
https://forum.proxmox.com/threads/grub-probe-error-unknown-filesystem.52436/
https://www.reddit.com/r/zfs/comments/g9mtll/linux_zfs_root_issue_grub2_hates_dnodesizeauto/
Then I checked the zpool on my broken server:
# zpool get feature@large_dnode
NAME   PROPERTY             VALUE   SOURCE
rpool  feature@large_dnode  active  local
And on another working install:
root@good-one:~# zpool get feature@large_dnode
NAME   PROPERTY             VALUE    SOURCE
rpool  feature@large_dnode  enabled  local
The ZFS documentation for large_dnode states:
This feature becomes active once a dataset contains an object with a dnode larger than 512B, which occurs as a result of setting the dnodesize dataset property to a value other than legacy. The feature will return to being enabled once all filesystems that have ever contained a dnode larger than 512B are destroyed.
So it seems the PVE installer sets dnodesize=auto, not legacy. By default large_dnode=enabled; at some point between my previous reboot and this one, a dataset got an object with a dnode larger than 512B, large_dnode flipped to active, and grub failed at this reboot. However, I cannot find the offending dataset with zfs get -r dnodesize rpool: all my datasets are legacy except the snapshots, so I cannot simply destroy a dataset to fix the issue (still searching on this one).
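For reference, here is a minimal sketch of the kind of check I mean, assuming your pool is named rpool; the -t flag just widens the search to volumes and snapshots as well:
Bash:
# list dnodesize for every filesystem, volume and snapshot in the pool,
# hiding the entries that are already legacy
zfs get -r -t filesystem,volume,snapshot dnodesize rpool | grep -v legacy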
So, following this, I patched grub to ignore the large_dnode feature when reading the pool. Here are my steps if anyone wants to build a patched grub:
Bash:
# build dependencies for the Proxmox grub package
apt install git build-essential quilt debhelper patchutils flex bison po-debconf help2man texinfo gcc-8-multilib xfonts-unifont libfreetype6-dev libdevmapper-dev libsdl1.2-dev xorriso parted libfuse-dev ttf-dejavu-core liblzma-dev mtools pkg-config libefiboot-dev libefivar-dev
# Proxmox's grub packaging repo
git clone git://git.proxmox.com/git/zfs-grub.git
cd zfs-grub
# the ignore-large-dnode patch from the GRUB bug tracker
wget https://savannah.gnu.org/bugs/download.php?file_id=45313 -O pvepatches/ignore-large-dnode.patch
# register the patch so the build applies it
echo "ignore-large-dnode.patch" >> pvepatches/series
# build the .deb packages, skipping the test suite
DEB_BUILD_OPTIONS=nocheck make deb
Then, on the server with the issue, use apt list --installed | grep grub to see the list of packages you need to upload and replace, then run dpkg -i * and update-grub. After that, reboot should work.
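A minimal sketch of that install step, assuming the patched .debs were copied into an otherwise empty directory on the broken server (exact package names depend on your grub version):
Bash:
# check which grub packages are currently installed; these are the ones to replace
apt list --installed | grep grub
# install the patched packages that were copied into the current directory
dpkg -i *.deb
# regenerate the grub configuration with the patched tools
update-grub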
Note 1: make sure your rpool/ROOT/pve-1 shows dnodesize=legacy in the output of zfs get -r dnodesize rpool before applying the patch.
Note 2: This is a temporary fix. For a more permanent fix, either find a way to bring large_dnode back to enabled, or switch from grub to systemd-boot.
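If you at least want to stop new large dnodes from being created, here is a sketch of the idea (assuming the pool is named rpool; per the ZFS doc quoted above, this alone does not turn the feature back from active to enabled, which only happens once every dataset that ever held a >512B dnode is destroyed):
Bash:
# force new objects back to 512B dnodes; child datasets inherit this,
# but existing large dnodes (and snapshots holding them) are unaffected
zfs set dnodesize=legacy rpool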