GRUB error on reboot - device not found

euant

New Member
May 11, 2017
We recently powered down one of our Proxmox servers, and on reboot we're hitting a GRUB error (see below for a screenshot) stating that a device is not found. There have been no hardware changes on this server for the past year, and it has been rebooted several times without error before now.

I've tried booting into the debug installer for PVE 5.1 as described here, and running the grub-install, update-grub2 and update-initramfs commands described here, but with no luck so far. The commands complete without reporting any errors, but on reboot I hit the same, now familiar, GRUB error.
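For reference, the chroot repair described above can be sketched as a shell function to source from the installer's debug shell. This is a hedged sketch, not the procedure from the linked guide. The pool name "rpool", the root dataset "rpool/ROOT/pve-1" (matching the GRUB prefix posted later in this thread), the alternate root /mnt and the function name repair_grub are all assumptions.

```shell
# Sketch only: reinstall GRUB on each disk of the root pool from a rescue shell.
# Assumed (not stated in the thread): pool "rpool", root dataset
# "rpool/ROOT/pve-1", alternate root /mnt, and that the arguments are the
# whole-disk devices belonging to the pool.
repair_grub() (
  set -eu
  [ "$#" -ge 1 ] || { echo "usage: repair_grub /dev/sdX [/dev/sdY ...]" >&2; exit 2; }
  pool=${POOL:-rpool}
  altroot=${ALTROOT:-/mnt}

  # -f: the pool was last imported by the installed system, not this one.
  # -N: import without mounting, then mount only the root dataset.
  zpool import -f -N -R "$altroot" "$pool"
  zfs mount "$pool/ROOT/pve-1"

  # Expose the kernel interfaces that the GRUB and initramfs tools need.
  for fs in dev proc sys; do
    mount --rbind "/$fs" "$altroot/$fs"
  done

  # Install the boot loader on every disk so that either one can boot.
  for disk in "$@"; do
    chroot "$altroot" grub-install "$disk"
  done
  chroot "$altroot" update-grub
  chroot "$altroot" update-initramfs -u -k all

  # Undo the bind mounts and export cleanly before rebooting.
  for fs in sys proc dev; do
    umount -R "$altroot/$fs"
  done
  zpool export "$pool"
)
```

Usage would be something like ". ./repair_grub.sh; repair_grub /dev/sda /dev/sdb", where the device names are placeholders for your actual pool disks. Running in a subshell keeps "set -eu" from leaking into the interactive shell.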

I also tried backing up the existing VMs whilst chrooted into the ZFS system from the installer, but vzdump seems to rely on a number of running services, such as D-Bus and pve-cluster, which didn't want to cooperate inside a chroot.

If anybody has any ideas on either how to recover the existing system or how to back up the two KVM VMs that were running inside it from a rescue session, I'd much appreciate it.

grub_error.jpg
 
What do "set" and "ls" output in the GRUB rescue shell? What is your ZFS pool layout?

You can back up the disk images from any ZFS-capable live system; they are just zvols/datasets. The VM config is stored in pmxcfs, which is backed by an SQLite database in /var/lib/pve-cluster.
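A minimal sketch of such a backup from a live system follows. It assumes the pool is called "rpool", the VM disks live under "rpool/data" (the usual location for Proxmox's local-zfs storage, but check with "zfs list"), and DEST_DIR is a mounted USB disk with enough free space. The function name backup_vm_disks is hypothetical.

```shell
# Sketch only: copy each zvol to a raw image file and save the pmxcfs database.
# Assumed (not stated in the thread): pool "rpool", VM disks under
# "rpool/data", root dataset "rpool/ROOT/pve-1", alternate root /mnt.
backup_vm_disks() (
  set -eu
  dest=${1:?usage: backup_vm_disks DEST_DIR}
  pool=${POOL:-rpool}
  parent=${PARENT:-$pool/data}
  altroot=${ALTROOT:-/mnt}

  zpool import -f -N -R "$altroot" "$pool"

  # Each zvol appears as a block device under /dev/zvol/<pool>/<path>.
  for vol in $(zfs list -H -o name -t volume -r "$parent"); do
    dd if="/dev/zvol/$vol" of="$dest/$(echo "$vol" | tr / _).raw" bs=1M
  done

  # VM configs live in pmxcfs, backed by an SQLite file in /var/lib/pve-cluster.
  zfs mount "$pool/ROOT/pve-1"
  cp "$altroot/var/lib/pve-cluster/config.db" "$dest/"

  zpool export "$pool"
)
```

The raw images are full-size copies of each virtual disk, so the destination needs at least as much space as the zvols' volume sizes.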
 
Hi fabian, thanks for the quick response.

The output of "set" and "ls" is as follows:

Code:
> set
cmdpath=(hd0)
prefix=(hd0)/ROOT/pve-1@/boot/grub
root=hd0

> ls
(hd0) (hd0,gpt9) (hd0,gpt2) (hd0,gpt1) (hd1) (hd1,gpt9) (hd1,gpt2) (hd1,gpt1) (hd2) (hd2,gpt2) (hd2,gpt1)

grub_output.jpg

The ZFS pool layout is pretty simple:

zpool_status.jpg

Two hard drives in RAID 0, plus an SSD with two partitions: log and cache.

I'm going to try copying the disk images and config to a USB hard drive now, so that if all goes badly I can restore them on a clean install on this system.
 
How big are the disks? What does "ls (hdX)" output for each disk X?

(OT: RAID 0 is not a mirror ;))
 
Hi fabian,

The two hard drives are 3.7TB (sold as 4TB). The SSD is 223GB (sold as 250GB). They are partitioned as follows (ignore the /dev/sdd at the bottom - that is a USB drive plugged in for the purpose of backing up the data):

fdisk_l.jpg

"ls (hd0)" simply reports "unknown filesystem". I've tried running "insmod zfs" and still get the same "unknown filesystem" message.
 
We have a similar problem on one server.
You can make a USB boot stick containing GRUB and the /boot directory to boot the Proxmox kernel.
 
Regarding the disk images being zvols/datasets, is there an easy way to export one of these to an archive or such that I can put on a USB hard disk?
 
With the USB stick you can start your Proxmox server. I'm copying my boot USB stick for you right now; it will take a few minutes. But first you should import and export the zpool via the Proxmox installer.
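The import/export step can be sketched as below, assuming the default pool name "rpool". The thread does not say why this helps, so treat it as the poster's suggestion rather than a confirmed fix; reset_pool is a hypothetical name.

```shell
# Sketch only: import and immediately export the root pool from the
# installer's debug shell. Pool name "rpool" is an assumption.
reset_pool() (
  set -eu
  pool=${1:-rpool}
  # -f: the pool was last imported by another system (the installed PVE).
  # -N: mount nothing; only the pool's import state should change.
  zpool import -f -N "$pool"
  # A clean export writes out the pool state and marks it as exported.
  zpool export "$pool"
)
```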
 
1. Extract proxmoxusbboot.dd.gz with gunzip, 7-Zip, etc.
2. Write proxmoxusbboot.dd to the USB stick with dd on Linux or Win32 Disk Imager on Windows.
3. Try to boot from USB.
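On Linux, steps 1 and 2 can be combined into one pipeline. A sketch, assuming /dev/sdX stands for the USB stick (verify it with lsblk first, since everything on it is overwritten); write_boot_image is a hypothetical helper name.

```shell
# Sketch only: decompress the boot image and write it straight to a USB stick.
write_boot_image() (
  set -eu
  img=${1:?usage: write_boot_image IMAGE.dd.gz /dev/sdX}
  dev=${2:?usage: write_boot_image IMAGE.dd.gz /dev/sdX}
  # Refuse anything that is not a block device, so a typo cannot clobber a file.
  [ -b "$dev" ] || { echo "not a block device: $dev" >&2; exit 1; }
  # Decompress on the fly; conv=fsync flushes the data before dd exits.
  gunzip -c "$img" | dd of="$dev" bs=4M conv=fsync
)
```

For example: write_boot_image proxmoxusbboot.dd.gz /dev/sdX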

Brilliant, thanks. Your USB boot disk works like a charm! I've managed to get logged in to the web interface and am now exporting my VMs to vzdump images to import in a clean install on a new server.
 
We updated 4 servers (5.0 -> 5.1) and only one of them has this issue. I think this bug is really difficult to reproduce. Maybe a hardware RAID issue? I don't know.
Now we use ext4 for the system, plus Ceph. Ceph had some issues with virtio drivers, but it's more reliable, e.g. for HA with live migration.
 
I am having the same experience. After booting a server for the first time in quite a while, the system does not start up, with the error messages "no such device" and "unknown filesystem". The "ls" command gives the output "unknown filesystem". The Proxmox version is 4.4 and I did not update the packages before rebooting.

Where do I find "proxmoxusbboot.dd.gz"?
 
BTW, that was the second Proxmox server within 1 week. Same problem with the same hardware, which is an HP MicroServer Gen8 with a Xeon CPU and 16 GB RAM. Both drives are connected as SATA AHCI devices.
 
When booting with a grub boot stick I get the error message "checksum verification failed". Any ideas?
 
I think the Proxmox installer should create a separate /boot partition because of this issue, since GRUB sometimes cannot find the root partition on ZFS.

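If somebody needs to create a bootable USB stick with the Proxmox kernels, you can use my small script. Run it as root; it should look like this:

Code:
# format the USB stick's partition (this erases it!)
mkfs.ext4 /dev/YOURUSBSTICK_PART
mount /dev/YOURUSBSTICK_PART /media
# copy the kernels, initrds and the grub/ directory from /boot
cp -Rpvf /boot/* /media/
# install GRUB to the stick, using the copied files as its boot directory
grub-install --boot-directory=/media/ /dev/YOURUSBSTICK
sync
umount /media

YOURUSBSTICK -> your stick: sdb, sdc, ...
YOURUSBSTICK_PART -> the partition on your USB stick (e.g. sdb1)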
 
With a GRUB boot stick I can ls into (hd0,gpt2)/ROOT/pve-1@/boot/grub, but I get a "checksum verification failed" error after "insmod normal".
 
Same problem for me, also on an HP ProLiant G8 (G1610T).

It seems related to /boot on a ZFS RAID-1 and to a reboot (possibly after a kernel update?).

I hope somebody can help to debug/resolve this. I'm still running a degraded ZFS RAID-1 (on 3 disks) because I had to remove one disk to insert a new one (the G8 has only 4 slots), on which I made a fresh Proxmox installation (and then imported the existing ZFS storage).
 
It might help to provide the following:
  • pveversion -v
  • zpool layout
  • disks as seen by grub ('ls', and 'ls (hdX,gptY)' for all X and Y)
  • variables set for grub ('set')
  • is the pool importable when booting from a live-CD?
  • any error messages
Note that GRUB has both a pager ('set pager=1') and support for serial consoles to make copying/screenshotting output easier ;)
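The Linux-side parts of that checklist can be gathered in one go from a live CD. This is a minimal sketch: the function name and output path are arbitrary choices, "pveversion -v" still has to be run on (or chrooted into) the installed system, and GRUB's own 'set'/'ls' output has to be captured from the GRUB shell.

```shell
# Sketch only: collect block-device and pool information into one file.
# No "set -e", so a failing command does not stop the rest being collected.
collect_boot_info() (
  out=${1:-/tmp/boot-info.txt}
  {
    echo "== block devices =="
    lsblk -o NAME,SIZE,TYPE,FSTYPE
    echo "== pools available for import =="
    # With no pool name, "zpool import" only lists importable pools.
    zpool import
  } > "$out" 2>&1
  echo "wrote $out"
)
```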
 
  • pveversion -v
    • => 4.x, cannot check at the moment since I booted the Proxmox Debug Console
  • zpool layout
    • => 2 drives as ZFS mirror
  • disks as seen by grub ('ls', and 'ls (hdX,gptY)' for all X and Y)
    • all report "unknown filesystem" except for (hd0,gpt2) and (hd1,gpt2)
      • boot with (hd0,gpt2) => checksum verification failed
  • variables set for grub ('set')
    • like the ones above from euant
  • is the pool importable when booting from a live-CD?
    • yes, backing up the image at the moment
  • any error messages
    • none except for the GRUB message
At the moment I am running "zpool scrub rpool", which takes about 10 hours. Any ideas what else I can do to fix the boot issue?
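Since a scrub of this size runs for hours, a small polling loop can wait for it and then print the result. A sketch, assuming the pool name "rpool" and a 10-minute poll interval (INTERVAL, in seconds); wait_for_scrub is a hypothetical name.

```shell
# Sketch only: poll until a running scrub has finished, then print the status.
wait_for_scrub() (
  set -eu
  pool=${1:-rpool}
  interval=${INTERVAL:-600}
  # While a scrub runs, the "scan:" line reads "scrub in progress since ...".
  while zpool status "$pool" | grep -q 'scrub in progress'; do
    sleep "$interval"
  done
  # -v also lists files with permanent (unrepaired) errors, if any.
  zpool status -v "$pool"
)
```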
 