ZFS - error: no such device, error: unknown filesystem, Entering rescue mode

Jota V.

Jan 29, 2018
Hi, we are running a Proxmox 6.1 cluster with five nodes. One node has little memory and had been running fine for more than 100 days. On it, I limited the maximum ZFS ARC size by adding this setting to /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=4294967296

update-initramfs -u

reboot

Then I rebooted the server, and now it won't boot.

What can i do?

There are two 2 TB disks in a ZFS RAID 1.
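For reference, zfs_arc_max is specified in bytes, so the value in the modprobe line above is 4 GiB (4 × 1024³ = 4294967296). The sketch below derives that line from a size given in GiB. The function name arc_max_line is hypothetical and used only for illustration. As far as I know, ZFS on Linux also accepts the same byte value at runtime through /sys/module/zfs/parameters/zfs_arc_max, which would avoid a reboot. Treat that as an assumption to verify on your version.

```shell
#!/bin/sh
# Build the modprobe option line for a ZFS ARC limit given in GiB.
# zfs_arc_max expects bytes, so multiply by 1024^3.
# arc_max_line is a hypothetical helper name, not part of ZFS or Proxmox.
arc_max_line() {
    gib="$1"
    bytes=$((gib * 1024 * 1024 * 1024))
    echo "options zfs zfs_arc_max=${bytes}"
}

# Example: reproduces the 4 GiB line from the post.
# arc_max_line 4 > /etc/modprobe.d/zfs.conf && update-initramfs -u
```

The ARC limit itself is not what broke the boot. It is only the change that triggered the reboot.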
 
These are the last messages before the reboot:

"/etc/modprobe.d/zfs.conf" [New] 1L, 35C written
root@vcloud05:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.3.10-1-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/cmdline found - falling back to /proc/cmdline
Copying and configuring kernels on /dev/disk/by-uuid/584B-9E07
Copying kernel and creating boot-entry for 5.3.10-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/584C-3EBD
Copying kernel and creating boot-entry for 5.3.10-1-pve
root@vcloud05:~# reboot
 
I have the same problem. I've been trying a lot of things over the past few hours, but no luck yet. There are lots of posts about this going back to 2017.

Also raidz1, but on two 500 GB SSDs.

Anyone able to solve this?
 
Nobody?

I booted the Proxmox installation CD in rescue mode and ran:

zpool import -a
zfs set mountpoint=/mnt rpool/ROOT/pve-1
zfs mount rpool/ROOT/pve-1
mount -t proc /proc /mnt/proc
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
chroot /mnt /bin/bash
rm /etc/modprobe.d/zfs.conf
update-initramfs -u
(exited chroot)
zfs set mountpoint=/ rpool/ROOT/pve-1

I still get the same error.
 
I also tried grub-install /dev/sda in the chroot, but it just gives me an error saying there is no filesystem.
 
Me too. Same error.

Here is what I did:

We had a ZFS RAID 1.
We reinstalled Proxmox on THE FIRST DRIVE only (nothing on the second), using ext4.
We mounted the ZFS pool and added it as storage.
We backed up the VMs.
We reformatted without a ZFS root filesystem.
We restored the VMs on ext4.

On the other four nodes of the cluster, the root filesystem is on ext4 because we have had problems with a ZFS root in the past.
 
I have the same error on a Proxmox Backup Server with ZFS. And NO, I do not have a hardware RAID controller.
After upgrading from 0.8.x to 0.9-1 and rebooting, it does not boot anymore.
In addition, I can mount and see my ZFS pool from an Ubuntu live CD, so the pool is not corrupted.
It is the GRUB part that got corrupted on reboot.
ZFS is not stable at all. This is not the first time it has happened to me, and THIS time there is no RAID controller, so no lame excuses...
 
Hi,
I get the same error after upgrading (dist-upgrade) from PVE 6.3-1 to 6.3-3. The first reboot succeeded. After I created a new ZFS dataset, installed nfs-kernel-server and rebooted once more, I got "grub rescue".
 
I think my trouble is likewise caused by new ZFS features (zpool upgrade) and by using the new compression=zstd, which GRUB does not support. The system does boot successfully via EFI (I switched one of my servers over manually in the BIOS). Unfortunately, my other server has no UEFI at all, so I can neither boot it nor mount the ZFS volumes to restore my data.

Below is the shell output from the Proxmox debug installation mode on the remote IPMI console. How can I get it working, or at least get my data back?

[Screenshot: Proxmox_ZFS_root_grub_rescue_zstd.png]
 

You can get your data by booting any live CD with recent ZFS support (e.g., Debian Bullseye/Sid, or Debian Buster with backports). You can also use such a live CD to move /boot away from ZFS (e.g., onto another spare disk). That is a bit more advanced, though, and probably worth playing through in a VM first, so you avoid many cycles of "boot live CD, set up ZFS, chroot, change something, reboot, repeat" ;)
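The live CD's ZFS needs to be at least as new as the features in use on the pool. For example, compression=zstd only arrived with OpenZFS 2.0. The sketch below checks this. It assumes GNU `sort -V` is available and that the first line of `zfs version` output looks like "zfs-2.0.3-pve2". The helper names are hypothetical, and `zfs version` itself only exists in ZFS 0.8 and later.

```shell
#!/bin/sh
# Succeeds if version $1 >= version $2: the smaller of the two,
# as ordered by GNU sort -V, must be $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reduce the first line of `zfs version` output (e.g. "zfs-2.0.3-pve2")
# to a bare version number ("2.0.3"). Hypothetical helper.
zfs_userland_version() {
    head -n 1 | sed -e 's/^zfs-//' -e 's/-.*//'
}

# Usage on the live CD:
# v=$(zfs version | zfs_userland_version)
# if version_ge "$v" 2.0; then echo "ZFS $v can handle zstd"; else echo "ZFS $v is too old for zstd"; fi
```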
 
Thanks, I'm already doing exactly that: installing Proxmox on a flash drive (inserted by the datacenter support staff, with the ISO attached to IPMI via SMB). But it will only be proxmox-ve_6.3-1.iso, so I'll need to run apt-get update ; apt-get dist-upgrade on it. Then I'd reclaim some space from ZFS (would it be safe to repartition one of the ZIL/LOG SSDs?) and make an ext4 partition for /boot there? Or does the / (root) partition need to be ext4 too? I suppose I can't undo the zstd-compressed blocks on the ZFS filesystems. Is it enough for GRUB if only /boot is compatible?
 
GRUB only needs access to /boot (which contains the GRUB config and modules, plus the kernel and initrd files).
 
I have the same error with my installation. After reading a lot about it, I believe my hardware is part of the problem.
I have an HP MicroServer Gen8, and according to the information I found, this hardware cannot boot via UEFI. I installed Proxmox on two mirrored 3 TB drives. Both drives look the same:

sda
- sda1
- sda2 (vfat)
- sda3

sdb
- sdb1
- sdb2 (vfat)
- sdb3

I can only mount sda2 or sdb2. Each of these two partitions has two subdirectories:
- EFI
- loader

Am I right that this is for UEFI boot? Why did Proxmox install in UEFI mode when the hardware cannot use it? Or did I choose a wrong option during installation?
I installed Proxmox after a couple of tests. After installation the server ran fine, and I migrated my old system. The problem appeared on the first reboot.

Is there a way to fix my problem?
 
GRUB fails when it detects features that it does not know about. This has been a problem for years, especially for ZFS features that cannot be disabled (like dnodesize=auto). Try to disable all features that are new on your rpool (and everything below it) while booting from a CD. UEFI boot with systemd-boot does not have this problem, because it boots from an ESP (FAT) partition.
Alternatively, create a new rpool (and everything below it) and copy all files from the original rpool (do not use zfs send/receive). Some features (like dnodesize=auto or compression=zstd) cannot be disabled once they are enabled and used. In that case the files need to be rewritten, and I know of no way to do that in place on an existing ZFS pool.
In my (not so humble) opinion, this is a known issue with GRUB and ZFS, and it should not be surprising that it happens again with new ZFS 2.0 features.
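On which features are involved: pool feature flags appear as `feature@...` properties with the value disabled, enabled or active. "active" means the feature is actually in use on disk. For example, `large_dnode` becomes active after dnodesize is set to something other than legacy, and `zstd_compress` becomes active once compression=zstd is used. The sketch below lists the active features from `zpool get` output. It does not know which features a given GRUB version can read; you would have to compare the output against GRUB's own supported list. As far as I know, `zpool` will not set an enabled feature back to disabled. Dataset properties like compression can be changed with `zfs set`, but the change only affects newly written blocks. The function name is hypothetical.

```shell
#!/bin/sh
# Print the names of pool features whose state is "active" (in use on disk).
# Expects tab-separated "property<TAB>value" lines on stdin, as printed by:
#   zpool get -H -o property,value all rpool
list_active_features() {
    awk -F '\t' '$1 ~ /^feature@/ && $2 == "active" {
        sub(/^feature@/, "", $1)   # strip the "feature@" prefix
        print $1
    }'
}

# Usage (pool name "rpool" as in a default Proxmox install):
# zpool get -H -o property,value all rpool | list_active_features
```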
 
I've finally found some workaround, but it's rather ugly.
  1. Boot the Proxmox installation ISO (its ZFS is not new enough to import the upgraded pool).
  2. Get a free drive inserted to install on (I first used a USB flash drive, which was far too slow, and then an SSD removed from the ZFS log/cache).
  3. Install a fresh Proxmox onto that drive, formatted as ext4. Don't touch the other drives.
  4. Run apt-get update ; apt-get dist-upgrade on the newly installed Proxmox system, then reboot.
  5. Thanks to the updated ZFS packages, you can now import your ZFS pool successfully with zpool import -f rpool.
  6. DO NOT REBOOT! You can now back up your ZFS pool data to remote or external storage, etc.
  7. Rebooting into the default Proxmox ZFS rpool configuration will fail now, because GRUB looks for the kernel in LVM instead of ZFS ROOT/boot.
  8. But you can intercept the blue GRUB boot menu, manually edit (e) the linux line to something like this, then press F10 to boot: linux /boot/vmlinuz-5.4.101-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
I wrote this from memory after sleepless nights struggling with this bug, so something may be wrong. Please verify or correct my instructions.
 
I had the same issue. I felt it was easier to just reinstall the Proxmox OS since I had backups one day old of all my VM and LXC.

What is the actual command to disable the new features? Which new features need to be disabled? I did not see this in any of the previous posts, and I didn't find it anywhere else on the web either.
 
I have dis
Then I would expect a big warning when installing ZFS with GRUB and no UEFI! But I found nothing like that... :mad:
Yes, because most ZFS people will reply, like a mantra, that it is your fault because you surely used a hardware RAID controller.
And nobody really listens to the real problem.
Anyway, I am using UEFI and it is flawless (even in virtual machines...).
 
I wonder how the Proxmox installer of the next version with ZFS rpool (and GRUB) will boot without UEFI...
The current situation, with two ways to boot Proxmox that use different configuration files, often comes up in issues on this forum: Did you change the boot configuration? Which boot configuration do you use? How do you find out which boot configuration is used? This technicality matters much more than most users realize, and documentation and warnings about it do not seem to reach the (potentially) affected users. Every time ZFS gets new features, people get excited and want to use them. But even older features are dangerous to enable with GRUB if they cannot be reverted once a single block anywhere on the rpool uses them. Everyone affected only finds out afterwards, and most of the time a reinstall was the way to fix it...
As an alternative, would it be possible to get GRUB to boot from the ESP partitions? Of course, this would require the ESP partitions to exists beforehand, and maybe not all required stuff is currently copied to those ESP partitions (yet)?
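On the question of how to find out which boot configuration a running system uses, a common check is whether /sys/firmware/efi exists. The kernel only creates that directory when the system was started via UEFI. On a Proxmox 6.x ZFS install, "uefi" here should correspond to systemd-boot on the ESPs (the ones the zz-pve-efiboot hook copies kernels to), and "legacy-bios" to GRUB reading /boot from the rpool. The optional root-prefix argument is a hypothetical addition so the function can be tried against a fake directory tree. The function name is also only illustrative.

```shell
#!/bin/sh
# Report how the running system was booted.
# /sys/firmware/efi only exists when the kernel was started via UEFI.
# $1 is an optional root prefix (hypothetical, for testing); default "".
boot_mode() {
    if [ -d "${1:-}/sys/firmware/efi" ]; then
        echo "uefi"
    else
        echo "legacy-bios"
    fi
}

# Usage on the host:
# boot_mode
```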