GRUB error on reboot - device not found

Discussion in 'Proxmox VE: Installation and configuration' started by euant, Dec 11, 2017.

  1. euant

    euant New Member

    Joined:
    May 11, 2017
    Messages:
    13
    Likes Received:
    4
    We recently powered down one of our Proxmox servers, and on reboot we're hitting a GRUB error (see below for a screenshot) stating that a device is not found. There have been no hardware changes on this server for the past year, and it has been rebooted several times without error before now.

    I've tried booting into the debug installer for PVE 5.1 as described here, and running the grub-install, update-grub2 and update-initramfs commands described here, but with no luck so far. The commands complete without reporting any errors, but upon rebooting I hit the same, now familiar, GRUB error.
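
    For anyone following along, the rescue sequence amounts to roughly the following. This is a minimal sketch, not the exact commands from the linked guides: the pool name rpool, the altroot /mnt, the boot disk /dev/sda and the helper name grub_repair_plan are all assumptions. The function only prints the commands so they can be reviewed and then run by hand.

```shell
#!/bin/sh
# Print (do not run) a chroot-based GRUB repair sequence for a ZFS root pool.
# Assumed defaults: pool "rpool", altroot "/mnt", boot disk "/dev/sda".
grub_repair_plan() {
    pool=${1:-rpool}
    root=${2:-/mnt}
    disk=${3:-/dev/sda}
    cat <<EOF
zpool import -f -R $root $pool
mount --bind /dev $root/dev
mount --bind /proc $root/proc
mount --bind /sys $root/sys
chroot $root grub-install $disk
chroot $root update-grub
chroot $root update-initramfs -u -k all
umount $root/sys $root/proc $root/dev
zpool export $pool
EOF
}

grub_repair_plan "$@"
```

    With more than one bootable disk, the grub-install line would presumably need repeating for each of them.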

    I also tried backing up the existing VMs whilst chrooted into the ZFS system from the installer, but it looks like vzdump relies on a number of running services, such as DBus and pve-cluster, which didn't want to cooperate inside a chroot.

    If anybody has any ideas on either how to recover the existing system or how to create a backup of the two KVM VMs that were running inside it from a rescue session, I'd much appreciate it.

    grub_error.jpg
     
    osteoboon likes this.
  2. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,193
    Likes Received:
    494
    what does "set" and "ls" output in the grub rescue shell? what is your ZFS pool layout?

    you can backup the disk images from any ZFS capable live system, they are just zvols / datasets. the VM config is stored in pmxcfs, which is backed by an sqlite database in /var/lib/pve-cluster
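
    for example, a zvol can be streamed to a file with zfs send. a rough sketch, untested on your setup: the dataset name rpool/data/vm-100-disk-1, the destination /mnt/usb and the helper name backup_zvol are all assumptions, so check 'zfs list -t volume' for the real names first.

```shell
#!/bin/sh
# Stream one VM disk (a zvol) into a gzip-compressed file on another disk.
# Hypothetical helper; dataset and destination names are assumptions.
backup_zvol() {
    dataset=$1
    dest_dir=$2
    snap="$dataset@rescue"
    # Output file name: the dataset path with "/" replaced by "_".
    out="$dest_dir/$(printf '%s' "$dataset" | tr '/' '_').zfs.gz"
    zfs snapshot "$snap" || return 1
    # Note: in plain sh the pipeline status is gzip's, not zfs send's.
    zfs send "$snap" | gzip > "$out" || return 1
    printf '%s\n' "$out"
}

# Usage on a live system with the pool imported read-write:
#   backup_zvol rpool/data/vm-100-disk-1 /mnt/usb
```

    the resulting file can be fed back with 'gunzip -c FILE | zfs receive POOL/DATASET' on the new system. the VM configs are in the config.db under /var/lib/pve-cluster of the old root, so copy that directory as well.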
     
    osteoboon likes this.
  3. euant

    euant New Member

    Hi fabian, thanks for the quick response.

    The output of "set" and "ls" is as follows:

    Code:
    > set
    cmdpath=(hd0)
    prefix=(hd0)/ROOT/pve-1@/boot/grub
    root=hd0
    
    > ls
    (hd0) (hd0,gpt9) (hd0,gpt2) (hd0,gpt1) (hd1) (hd1,gpt9) (hd1,gpt2) (hd1,gpt1) (hd2) (hd2,gpt2) (hd2,gpt1)
    grub_output.jpg

    The ZFS pool layout is pretty simple:

    zpool_status.jpg

    Two hard drives in RAID 0, with an SSD with two partitions - logs and cache.

    I'm going to try copying the disk images and config to a USB hard drive now so that if all goes badly I can restore them on a clean install on this system.
     
    osteoboon likes this.
  4. fabian

    fabian Proxmox Staff Member
    Staff Member

    how big are the disks? what does "ls (hdX)" output for each disk X?

    (OT: raid 0 is not mirror ;))
     
    osteoboon likes this.
  5. euant

    euant New Member

    Hi fabian,

    The two hard drives are 3.7TB (sold as 4TB). The SSD is 223GB (sold as 250GB). They are partitioned as follows (ignore the /dev/sdd at the bottom - that is a USB drive plugged in for the purpose of backing up the data):

    fdisk_l.jpg

    "ls (hd0)" simply reports "unknown filesystem". I've tried running "insmod zfs" and still get the same "unknown filesystem" message.
     
    osteoboon likes this.
  6. lankaster

    lankaster New Member

    Joined:
    Dec 10, 2017
    Messages:
    12
    Likes Received:
    8
    We have a similar problem on one of our servers.
    You can make a USB boot stick with GRUB and the /boot directory to boot the Proxmox kernel.
     
    #6 lankaster, Dec 11, 2017
    Last edited: Dec 11, 2017
    osteoboon likes this.
  7. euant

    euant New Member

    Regarding the disk images being zvols/datasets, is there an easy way to export one of these to an archive or such that I can put on a USB hard disk?
     
  8. lankaster

    lankaster New Member

    With a USB stick you can start your Proxmox server. I am copying my boot USB stick for you right now; it will take a few minutes. But first you should import and export the zpool via the Proxmox installer.
     
    osteoboon and euant like this.
  9. lankaster

    lankaster New Member

    1. Extract proxmoxusbboot.dd.gz with gunzip, 7-Zip, etc.
    2. Write proxmoxusbboot.dd to the USB stick with dd on Linux or Win32 Disk Imager on Windows.
    3. Try to boot from the USB stick.
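
    On Linux, steps 1 and 2 can be combined. A minimal sketch, assuming GNU or BusyBox dd; the helper name write_usb_image is made up and /dev/sdX is a placeholder for your stick. Everything on the target is overwritten, so double-check the device name.

```shell
#!/bin/sh
# Decompress a gzipped disk image and write it straight to a target device.
# TARGET is a placeholder (e.g. /dev/sdX); its contents are overwritten.
write_usb_image() {
    image_gz=$1
    target=$2
    [ -r "$image_gz" ] || { echo "cannot read $image_gz" >&2; return 1; }
    # conv=fsync makes dd flush the data to the device before it exits.
    gunzip -c "$image_gz" | dd of="$target" bs=4M conv=fsync
}

# Usage (as root):
#   write_usb_image proxmoxusbboot.dd.gz /dev/sdX
```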
     
    osteoboon and euant like this.
  10. euant

    euant New Member

    Brilliant, thanks. Your USB boot disk works like a charm! I've managed to get logged in to the web interface and am now exporting my VMs to vzdump images to import in a clean install on a new server.
     
    osteoboon likes this.
  11. lankaster

    lankaster New Member

    We updated four servers (5.0 -> 5.1) and only one of them has this issue. I think this bug is really difficult to reproduce. Maybe a HW-RAID issue? I don't know.
    We now use ext4 for the system, plus Ceph. Ceph has some issues with virtio drivers, but it's more reliable, e.g. for HA with live migration.
     
    osteoboon and euant like this.
  12. OH24

    OH24 New Member

    Joined:
    Jun 1, 2017
    Messages:
    15
    Likes Received:
    6
    I am having the same experience. I rebooted a server after quite a while, and the system does not start up, showing the error messages "no such device" and "unknown filesystem". The "ls" command gives the output "unknown filesystem". The Proxmox version is 4.4 and I did not update the packages before rebooting.

    Where do I find "proxmoxusbboot.dd.gz"?
     
    chrone and osteoboon like this.
  13. OH24

    OH24 New Member

    BTW, that was the second Proxmox server within 1 week. Same problem with the same hardware which is HP Microserver Gen8 with Xeon CPU and 16 GB RAM. Both drives are connected as SATA AHCI devices.
     
    chrone and osteoboon like this.
  14. OH24

    OH24 New Member

    When booting with a grub boot stick I get the error message "checksum verification failed". Any ideas?
     
  15. lankaster

    lankaster New Member

    I think the Proxmox installer should create a separate /boot partition to avoid this issue, because sometimes GRUB cannot find the root partition on ZFS.

    If somebody needs to create a bootable USB stick with the Proxmox kernels, you can use my small script. It should look like this:

    # mkfs.ext4 /dev/YOURUSBSTICK_PART
    # mount /dev/YOURUSBSTICK_PART /media
    # cp -Rpvf /boot/* /media/
    # grub-install --boot-directory=/media/ /dev/YOURUSBSTICK
    # sync
    # umount /media

    YOURUSBSTICK -> your USB stick as a whole device: sdb, sdc, ...
    YOURUSBSTICK_PART -> the partition on your USB stick
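
    Pointing mkfs.ext4 or grub-install at the wrong disk destroys data, so it is worth checking the device first. A small sketch, assuming the usual Linux sysfs layout; is_removable and SYSFS_ROOT are names made up for illustration. Some USB hard disks may report 0 here, so treat the result as a hint only.

```shell
#!/bin/sh
# Succeed if the kernel flags a block device as removable.
# SYSFS_ROOT can be overridden for testing; it defaults to /sys.
is_removable() {
    dev=${1#/dev/}    # accept "sdb" as well as "/dev/sdb"
    flag="${SYSFS_ROOT:-/sys}/block/$dev/removable"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Usage:
#   is_removable sdb && echo "sdb is flagged as removable"
```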
     
    osteoboon likes this.
  16. OH24

    OH24 New Member

    With a grub boot stick I can ls into (hd0,gpt2)/ROOT/pve-1@/boot/grub but will get a "checksum verification failed" error after "insmod normal".
     
    osteoboon likes this.
  17. Mr Pumo

    Mr Pumo New Member

    Joined:
    Oct 4, 2017
    Messages:
    8
    Likes Received:
    1
    Same problem for me. Always HP Proliant G8 (G1610T).

    Seems related to /boot on ZFS raid-1 and related to a reboot (after having updated kernel ??)

    Hoping somebody can help to debug/resolve this. I'm still running a reduced ZFS RAID-1 (on 3 disks) because I had to remove one disk to make room for a new one (the G8 has only 4 slots), on which I installed Proxmox from scratch and then imported the existing ZFS storage.
     
    osteoboon likes this.
  18. fabian

    fabian Proxmox Staff Member
    Staff Member

    it might help to provide the following:
    • pveversion -v
    • zpool layout
    • disks as seen by grub ('ls', and 'ls (hdX,gptY)' for all X and Y)
    • variables set for grub ('set')
    • is the pool importable when booting from a live-CD?
    • any error messages
    note that grub has both a pager ('set pager=1') and support for serial consoles to make copying/screenshotting output easier ;)
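
    for the live-CD check, something along these lines should do. a sketch only: the pool name rpool, the altroot /mnt and the helper name check_pool_importable are assumptions. the import is read-only and without mounting, so nothing on the pool gets modified.

```shell
#!/bin/sh
# Check from a live system whether a pool can be imported.
# Assumptions: pool name "rpool" by default, altroot "/mnt".
check_pool_importable() {
    pool=${1:-rpool}
    # -N: do not mount datasets, -f: pool was last used by another system.
    if zpool import -N -f -o readonly=on -R /mnt "$pool"; then
        zpool status "$pool"
        zpool export "$pool"
        echo "importable: $pool"
    else
        echo "NOT importable: $pool" >&2
        return 1
    fi
}

# Usage (as root on the live system):
#   check_pool_importable rpool
```

    running 'zpool import' without arguments beforehand lists the pools the live system can see.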
     
    chrone and osteoboon like this.
  19. OH24

    OH24 New Member

    • pveversion -v
      • => 4.x, cannot check at the moment since I booted the Proxmox Debug Console
    • zpool layout
      • => 2 drives as ZFS mirror
    • disks as seen by grub ('ls', and 'ls (hdX,gptY)' for all X and Y)
      • except for (hd0,gpt2) and (hd1,gpt2): "unknown filesystem"
        • boot with (hd0,gpt2) => checksum verification failed
    • variables set for grub ('set')
      • like the ones above from euant
    • is the pool importable when booting from a live-CD?
      • yes, backing up the image at the moment
    • any error messages
      • except for the grub message none
    At the moment I am running a "zpool scrub rpool", which takes about 10h. Any ideas what else I can do to fix the boot issue?
     
    osteoboon likes this.
  20. OH24

    OH24 New Member

    I read that downgrading grub might be an option. Is that true?
     