Boot issues

Archmatux

Active Member
Oct 18, 2016
Hi All,

I have a few Proxmox hosts.
On my more recent hosts I have 4 SSDs in ZFS RAID 10 (a pool of mirrors), set up with the ZFS RAID 10 option in the installer.

My understanding is that the disks in the first mirror are partitioned, with the first partition holding the bootloader and the second partition used for ZFS.

One of my servers, PVE01-01, has been running for several months, but after a reboot it gets stuck in grub rescue:

Code:
error: no such device: 886a2b06b22c62c7
error: unknown filesystem.
Entering rescue mode...
grub rescue> _

I've tried booting from each of the disks with no success.
I've also booted a live disk to check the SMART data, which reports the disks as OK.

My Proxmox install was fully up to date and I'm struggling to find a rescue disk that supports the feature flags on the pool so that I can mount the zpool and chroot in to reinstall grub.

Does anyone have any ideas?

I do have full backups so can nuke and re-install if necessary.
All my important VMs/Containers have replication and HA and so are running on other nodes.
 

I was able to resolve this in the end as follows:

I used an ArchLinux live disk (any live distro with bleeding edge packages should work) and used the cow_spacesize=10G option at boot so I had some room to work with.

I downloaded and compiled the zfs-linux and zfs-utils packages from the AUR (had to check out a tagged release with git to compile against the live disk kernel).

I was then able to mount my Proxmox ZFS root, chroot in and re-install Grub as follows:

Code:
zpool import -f -R /mnt rpool # Force-import the pool with /mnt as its alternate root
mount -t proc /proc /mnt/proc
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
chroot /mnt /bin/bash # Chroot into Proxmox
source /etc/profile # Required to resolve issues with $PATH
grub-install /dev/sda
grub-install /dev/sdb
update-grub2
update-initramfs -u
exit # Back to Live Disk
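umount -R /mnt/sys # -R also detaches the submounts pulled in by --rbind
umount -R /mnt/dev
umount /mnt/proc
zfs unmount rpool/ROOT/pve-1
zpool export rpool # Release the pool cleanly before rebooting
shutdown -r now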

After this the Proxmox install booted up as normal.

My question now is: what is the best way to prevent this from happening again in the future?
 
Archmatux said: "My Proxmox install was fully up to date and I'm struggling to find a rescue disk that supports the feature flags on the pool so that I can mount the zpool and chroot in to reinstall grub."

If you've upgraded your zpool, it is quite possible that grub can no longer boot from it: grub has its own separate ZFS implementation with a reduced feature set
(see '2.3 Create the boot pool' in https://github.com/zfsonlinux/zfs/wiki/Debian-Buster-Root-on-ZFS).
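
To check whether this applies to a given pool, a rough sketch along the following lines could list the active feature flags that grub may not be able to read. The GRUB_OK list and the helper name flag_unreadable_features are assumptions made for illustration; the list should be checked against the guide linked above for your grub version.

```shell
#!/bin/sh
# ASSUMPTION: illustrative allowlist of features grub's ZFS reader copes with.
# Verify it against the "Create the boot pool" section of the guide.
GRUB_OK="async_destroy bookmarks embedded_data empty_bpobj enabled_txg \
extensible_dataset filesystem_limits hole_birth large_blocks lz4_compress \
spacemap_histogram"

# Hypothetical helper: reads "property<TAB>value" lines on stdin and prints
# every feature that is active on the pool but missing from GRUB_OK.
flag_unreadable_features() {
    awk -v ok="$GRUB_OK" '
        BEGIN {
            n = split(ok, names, " ")
            for (i = 1; i <= n; i++) allowed["feature@" names[i]] = 1
        }
        $1 ~ /^feature@/ && $2 == "active" && !($1 in allowed) { print $1 }
    '
}

# Usage on a host with the pool imported:
#   zpool get -H -o property,value all rpool | flag_unreadable_features
```

An empty result would suggest the active features are within the assumed list; any feature printed is a candidate for the "unknown filesystem" error, although it is not proof on its own.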

If possible on your system, consider switching to UEFI booting and systemd-boot (the default for UEFI-booted systems with ZFS as the root filesystem from Proxmox VE 6.0 onwards): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot
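
As a quick way to see how a host was booted, the sketch below checks for the EFI directory that the Linux kernel exposes under /sys/firmware/efi only on UEFI boots. The function name boot_mode and its optional path argument are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: prints "UEFI" if the EFI firmware directory exists,
# otherwise "BIOS". The path argument exists only to make the check testable.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

# Usage on the host in question:
#   boot_mode
```

A host reporting BIOS here is still booting through grub, so the restriction on pool features described above would continue to apply to it.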

Which disk controller are the disks connected to? (Sometimes these boot errors are caused by grub's drivers for RAID controllers.)
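
If you are not sure which controller is in use, something like the following could narrow down the lspci output to storage controllers. The helper name storage_controllers is an assumption, and the pattern only covers the common lspci class names, so treat it as a starting point.

```shell
#!/bin/sh
# Hypothetical helper: filters lspci output on stdin down to lines whose
# device class looks like a storage controller.
storage_controllers() {
    grep -i -E 'RAID bus controller|SATA controller|Serial Attached SCSI|SCSI storage controller|Non-Volatile memory controller|IDE interface'
}

# Usage on the affected host or from a live disk:
#   lspci | storage_controllers
```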

I hope this helps!
 
