GRUB error on reboot - device not found

Rob Loan

Member
Mar 25, 2017
48
5
13
56
I bet
GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=1"
in /etc/default/grub
then run update-grub (or add rootdelay=1 to the linux line of /boot/grub/grub.cfg )

would fix everything. PVE 5.x tries to load vmlinuz-*-pve faster than 4.x did, and possibly before the disks are ready. I hit this with an NVMe device.
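A minimal sketch of that edit, assuming a Debian-style /etc/default/grub with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line. The helper name add_rootdelay is made up for illustration, and it takes the file path as an argument so it can be tried on a copy first:

```shell
#!/bin/sh
# Hypothetical helper: append rootdelay=1 to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, unless some rootdelay= value is already set there.
add_rootdelay() {
    file="$1"
    # Leave the file alone if the option is already present on that line.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*rootdelay=' "$file"; then
        return 0
    fi
    # Insert the option just before the closing double quote of the value
    # (sed -i as provided by GNU sed on Debian).
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 rootdelay=1"/' "$file"
}
```

Something like `cp /etc/default/grub /tmp/grub.test && add_rootdelay /tmp/grub.test` shows the result without touching the real file. After applying it to /etc/default/grub, update-grub still has to be run as described above.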

Edit: oops, this is for the OP, whose boot loader worked but whose grub couldn't find vmlinuz-*-pve. Cases with "no OS found" are a boot loader issue instead, perhaps GPT combined with legacy BIOS, most commonly seen as "I did a full clean install, but the reboot failed to find an OS". In those cases, turn off legacy boot and let the BIOS do the uEFI thing. (Sometimes it will then fail to find the cdrom image on the USB stick, but that is for another thread.)
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
5,025
821
163
  • pveversion -v
    • => 4.x, cannot check at the moment since I booted the Proxmox Debug Console
  • zpool layout
    • => 2 drives as ZFS mirror
  • disks as seen by grub ('ls', and 'ls (hdX,gptY)' for all X and Y)
    • except for (hd0,gpt2) and (hd1,gpt2) "unknown filesystem"
      • boot with (hd0,gpt2) => checksum verification failed
  • variables set for grub ('set')
    • like the ones above from euant
  • is the pool importable when booting from a live-CD?
    • yes, backing up the image at the moment
  • any error messages
    • except for the grub message none
At the moment I am running a "zpool scrub rpool", which takes about 10h. Any ideas what else I can do to fix the boot issue?

if you import the pool, bind mount /dev /proc and /sys into it and then chroot, what do
  • grub-probe /
  • grub-probe -vvvv /
  • update-grub
  • grub-install /dev/sda
  • grub-install /dev/sdb
report?

you should also be able to collect 'pveversion -v' that way..
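A rough sketch of the import, bind mount and chroot sequence described above, as it might look from a live or rescue system. The pool name rpool comes from this thread; the mount point /mnt, the helper names and the DRY_RUN switch are assumptions for illustration only:

```shell
#!/bin/sh
# Hypothetical helpers. With DRY_RUN=1 the commands are only printed,
# which is a safe way to review them before running as root.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enter_rescue_chroot() {
    target="${1:-/mnt}"   # assumed mount point
    pool="${2:-rpool}"    # pool name used in this thread
    # Import the pool with an alternate root so its datasets mount under $target.
    run zpool import -f -R "$target" "$pool" || return 1
    # Make /dev, /proc and /sys of the rescue system visible inside the chroot.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs" || return 1
    done
    # Inside this shell, run grub-probe, update-grub and grub-install by hand.
    run chroot "$target" /bin/bash
}
```

`DRY_RUN=1 enter_rescue_chroot /mnt rpool` prints the five commands without executing anything. Whether `-f` is needed depends on how the pool was last exported.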
 

OH24

New Member
Jun 1, 2017
15
6
3
43
Thanks @fabian for your answer but I did a clean install yesterday. Do you have any hunch what the cause of the non-booting grub might be?
 

fabian

activation of some zpool feature that grub does not support, corrupt BIOS boot partition (or overwritten with an old Grub version), BIOS/disk controller lying about disk sizes. but the first and last (usually) lead to different error messages.. a wrong/broken/corrupt Grub stage1 in the BIOS boot partition should be fixed by re-running grub-install.
 

lankaster

New Member
Dec 10, 2017
12
8
3
38
@fabian After system upgrade proxmox 5.0 -> 5.1 one of 4 nodes can't boot:
Code:
error: no such device xxxxxxx
error: unknown filesystem
Entering rescue mode ...

Upgrading, grub-probe, and a grub update via chroot after booting from a USB stick don't help. This node ran from a hand-made USB boot stick for one month.
  • pveversion -v
Code:
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-26
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
  • zpool layout: zpool on HW raid1
  • (hd0)(hd0,gpt9)(hd0,gpt2)(hd0,gpt1)
  • variables -> cannot show (we are reinstalling the node and migrating to ceph)
  • We can import/export/scrub the pool via the proxmox installer without problems.
HW Info: HP microserver gen 10, AMD Opteron(tm) X3421 APU, RAM: 16 GB ECC, RAID: Fujitsu D2607-A1, HDDs x2 10TB - ST10000NM0016
 

Rob Loan

> We can import/export/scub a pool via proxmox installer without problem.

sorry my note above wasn't clear, but choose emergency boot in the PVE install iso, or your USB stick to get the box up. (or zpool import ; exit to continue the stuck boot)

zpool remove rpool sdc1 # simplify pool design, add back when auto importing is working
zpool remove rpool sdc2 # simplify pool design, add back when auto importing is working
zpool scrub rpool # mainly to update /etc/zfs/zpool.cache but is not a bad idea anyway
zpool get bootfs rpool # verify it returns rpool/ROOT/pve-1
update-initramfs -u # to commit /etc/zfs/zpool.cache to /boot/initrd.img*
vi /etc/default/grub # add "rootdelay=1" in GRUB_CMDLINE_LINUX_DEFAULT
update-grub # to commit changes to /boot/grub/grub.cfg
grub-probe / # verify it returns zfs
grub-install /dev/sda
grub-install /dev/sdb
reboot # test it :)
 

OH24

Maybe it has something to do with the RAID controller of the HP microserver, even if we use the drives as SATA AHCI ones. I never had this problem with other server hardware (e.g. Supermicro with no RAID controller) so far.
 

fabian

seems likely since that seems to be a common factor..
 

lankaster

@Rob Loan I am sorry, my English is really poor.
* At first we tried to boot a kernel via grub rescue, but it didn't work -> ls (hd0,gpt1), ls (hd0,gpt2), ls (hd0,gpt9) show "unknown filesystem"
* Second: we tried to repair grub via the PVE install iso -> chroot -> mount /dev, /proc, /sys etc. -> update-initramfs, update-grub, grub-install -> it didn't work, but the pool was OK
* After that we got the node up via USB stick and tried again to repair the grub loader -> it didn't work, but the node ran and the pool was OK
 

cserem

Member
Jan 15, 2017
17
5
23
37
I do not mean to hijack this thread, but there was no solution posted yet, and I seem to be suffering from the same problem.

The system I am using is a Dell PowerEdge R530 server.
After rebooting for a kernel upgrade it took around 1-2 minutes to reach the
Code:
grub>
prompt.
There was no error message, it just displayed this prompt instead of the regular menu.

Trying the recovery option of the PVE installer again took 1-2 minutes to execute, but it failed with
Code:
error: attempt to read or write outside of disk 'hd2'
Press any key to continue
(then, without pressing a key, after a couple seconds it re-displayed the boot menu of the installer)

After successfully booting from an external device (kernel and initrd on a USB stick), I executed the requested commands: https://pastebin.com/1zadrmQA

After rebooting again I get dropped to the "grub rescue>" (notice before it was plain "grub>") prompt and running "insmod normal" fails again with the outside of disk error.

I am now only able to boot into the system with an external grub and external kernel disk.
 

fabian

that error message indicates that your motherboard (or scsi controller) lies about the disk size. you will need to boot from an external small boot device until PVE supports ZFS and UEFI boot (or play the lottery on every kernel and grub update and hope that grub files, kernel and initrd are within the wrong limits imposed by your hardware).
 

osteoboon

New Member
Jan 4, 2018
6
0
1
34
Hi All. Thanks very much to everyone for making and supporting proxmox. I'm still pretty new to proxmox and I'm very impressed with it, but I've encountered three problems in my first two weeks that make me begin to wonder if this is really the right solution for me.

The first problem (which occurred last week) was an apparent inability to change the node's ip address (see previous posts to this forum on that topic). I haven't figured out how to solve that, but it's moot for me now because the node is back in its original network.

The second problem (occurred this past weekend) was that the node could not perform automatic updates (I saw error messages in the web interface indicating this and solved it by changing the /etc/apt/sources.list.d/ file to point to the non-enterprise non-subscription sources).

But fixing the second problem seems to have created the third problem I've encountered (occurred today) which is the exact same problem as the original poster (euant) in this thread had a month ago. Their screenshot looks identical to mine except that (to be expected, I know) my volume ID or disk ID is different than theirs.

My hardware is an HP ProLiant ML10 v2 with 5 physical disks (each 4TB total space as advertised) connected to the HP Smart Array B120i RAID controller set to AHCI mode in the BIOS. The proxmox 5.1 install image worked beautifully from a USB stick and seems to have created a RAID array and logical volumes with ZFS all very much to my satisfaction and with minimal time and effort during installation. The install was truly a quick, simple, and problem-free process. But with the first upgrade, I can now no longer boot from either the hard drive or the install image (rescue mode) on the USB stick.

I managed to successfully reboot this server at least 8 times between original installation and encountering this third problem, but nightly automatic updates (apt-get update) were failing before because the sources were pointing only to the enterprise update archives, and I have not yet subscribed. So as soon as I fixed that problem (apt-get update failing), the node apparently updated itself, and in the process, broke itself so it no longer boots.

I'd like to try booting from lankaster's boot stick that euant used successfully, but I don't see an image to write to my USB stick?

And of course ideally, I'd like to again be able to boot from the hard drives.

Since there seem to be several people experiencing this same problem, is there anything I can do to help find the root cause so that someone can fix it?

Thank you fabian for asking euant for some details. For me, the output of "set" in the grub rescue shell is identical to that of euant:

Code:
> set
cmdpath=(hd0)
prefix=(hd0)/ROOT/pve-1@/boot/grub
root=hd0

And the output of "ls" in the grub rescue shell is also similar, but I have 5 identical physical disks, so:

Code:
> ls
(hdX)  (hdX,gpt9)  (hdX,gpt2)  (hdX,gpt1)...
where X=0,1,2,3,4

Not being able to boot into the node, I don't know how to give you my ZFS pool layout.

Thanks for any suggestions on how to resolve, and I'll be happy to try anything else to help troubleshoot the root cause.

-Kevin
 

Rob Loan

GRUB2 gets confused. While the legacy boot (grub-install /dev/sda) seems simple, it isn't when it doesn't work. a good read is http://www.rodsbooks.com/efi-bootloaders/

When I had an issue with a nvme ZFS boot device, I switched to uEFI boot and used http://www.rodsbooks.com/efi-bootloaders/refit.html as a boot manager and then grub's boot loader. I can only hope showing what I did, might help your case.

I see three ways ZFS is partitioned:

Code:
root@pve1:/etc/pve/rob# fdisk -l
Disk /dev/nvme0n1: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 16A405B5-D719-41F0-81CB-2BEE62B7FF91

Device              Start        End    Sectors   Size Type
/dev/nvme0n1p1         34       2047       2014  1007K BIOS boot
/dev/nvme0n1p2       2048 1953508749 1953506702 931.5G Solaris /usr & Apple ZFS
/dev/nvme0n1p9 1953508750 1953525134      16385     8M Solaris reserved 1

root@pve2:~# fdisk -l
Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

root@pvebak:~# fdisk -l 
Disk /dev/sda: 55.9 GiB, 60022480896 bytes, 117231408 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 19148AE5-C2D0-4860-9E4B-E4EF88E4510B

Device         Start       End   Sectors  Size Type
/dev/sda1         34      2047      2014 1007K BIOS boot
/dev/sda2       2048 117214989 117212942 55.9G Solaris /usr & Apple ZFS
/dev/sda9  117214990 117231374     16385    8M Solaris reserved 1

Disk /dev/sdb: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 6B235178-354D-B741-8335-C2FF49FFA6AF

Device           Start         End     Sectors  Size Type
/dev/sdb1         2048 11721027583 11721025536  5.5T Solaris /usr & Apple ZFS
/dev/sdb9  11721027584 11721043967       16384    8M Solaris reserved 1

The top case, pve1, was converted from legacy to uEFI, while the other two are booting legacy. Note that sdb on pvebak skips the space at the beginning of the disk, so there is still room for `grub-install /dev/sdb` even if there is no partition for it.
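To compare a disk against the layouts listed above, a small sketch along these lines could summarize `fdisk -l` output. The helper name layout_summary is made up, and it only looks at the partition type text and the start sector, so a partition that was reformatted without changing its type (as on pve1) still shows up as "BIOS boot":

```shell
#!/bin/sh
# Hypothetical helper: read `fdisk -l <disk>` output on stdin and report
# whether a "BIOS boot" partition exists and where the first partition starts.
layout_summary() {
    awk '
        /BIOS boot/ { bios = "yes" }
        # Partition rows start with /dev/ and column 2 is the start sector.
        /^\/dev\// { if (first == "" || $2 + 0 < first + 0) first = $2 }
        END {
            printf "bios_boot=%s first_start=%s\n", (bios == "" ? "no" : bios), (first == "" ? "none" : first)
        }'
}
```

Usage would be `fdisk -l /dev/sda | layout_summary`. It assumes GPT-style rows as shown in this post; a dos label with a boot flag column would need different parsing.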

make the legacy partition a tiny uEFI partition

Code:
root@pve1# mkdosfs -F 32 -n EFI /dev/nvme0n1p1
root@pve1# mount /boot/efi
root@pve1# cd /boot/efi
install rEFIt

that got a boot manager, but I'm always loading proxmox so I wanted a hard coded boot loader

Code:
root@pve1# cp /boot/grub/x86_64-efi/grub.efi /boot/efi/EFI/proxmox/grubx64.efi
root@pve1# grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox --recheck --no-floppy

and it boots straight into proxmox. for reference this is the mess I have in the tiny FAT uEFI partition:

Code:
root@pve1:/boot/grub# find /boot/efi -print
/boot/efi
/boot/efi/EFI
/boot/efi/EFI/proxmox
/boot/efi/EFI/proxmox/grubx64.efi
/boot/efi/EFI/BOOT
/boot/efi/EFI/BOOT/BOOT.CSV
/boot/efi/EFI/BOOT/BOOTx64.EFI
/boot/efi/EFI/BOOT/refind.conf
/boot/efi/EFI/BOOT/icons
/boot/efi/EFI/BOOT/icons/arrow_left.png
/boot/efi/EFI/BOOT/icons/arrow_right.png
/boot/efi/EFI/BOOT/icons/boot_linux.png
/boot/efi/EFI/BOOT/icons/boot_win.png
/boot/efi/EFI/BOOT/icons/func_about.png
/boot/efi/EFI/BOOT/icons/func_csr_rotate.png
/boot/efi/EFI/BOOT/icons/func_exit.png
/boot/efi/EFI/BOOT/icons/func_firmware.png
/boot/efi/EFI/BOOT/icons/func_hidden.png
/boot/efi/EFI/BOOT/icons/func_reset.png
/boot/efi/EFI/BOOT/icons/func_shutdown.png
/boot/efi/EFI/BOOT/icons/mouse.png
/boot/efi/EFI/BOOT/icons/os_debian.png
/boot/efi/EFI/BOOT/icons/os_proxmox.png
/boot/efi/EFI/BOOT/icons/README
/boot/efi/EFI/BOOT/icons/tool_apple_rescue.png
/boot/efi/EFI/BOOT/icons/tool_fwupdate.png
/boot/efi/EFI/BOOT/icons/tool_memtest.png
/boot/efi/EFI/BOOT/icons/tool_mok_tool.png
/boot/efi/EFI/BOOT/icons/tool_netboot.png
/boot/efi/EFI/BOOT/icons/tool_part.png
/boot/efi/EFI/BOOT/icons/tool_rescue.png
/boot/efi/EFI/BOOT/icons/tool_shell.png
/boot/efi/EFI/BOOT/icons/tool_windows_rescue.png
/boot/efi/EFI/BOOT/icons/transparent.png
/boot/efi/EFI/BOOT/icons/vol_external.png
/boot/efi/EFI/BOOT/icons/vol_internal.png
/boot/efi/EFI/BOOT/icons/vol_net.png
/boot/efi/EFI/BOOT/icons/vol_optical.png
/boot/efi/EFI/BOOT/keys
/boot/efi/EFI/BOOT/keys/altlinux.cer
/boot/efi/EFI/BOOT/keys/canonical-uefi-ca.der
/boot/efi/EFI/BOOT/keys/centos.cer
/boot/efi/EFI/BOOT/keys/fedora-ca.cer
/boot/efi/EFI/BOOT/keys/microsoft-kekca-public.der
/boot/efi/EFI/BOOT/keys/microsoft-pca-public.der
/boot/efi/EFI/BOOT/keys/microsoft-uefica-public.der
/boot/efi/EFI/BOOT/keys/openSUSE-UEFI-CA-Certificate-4096.cer
/boot/efi/EFI/BOOT/keys/openSUSE-UEFI-CA-Certificate.cer
/boot/efi/EFI/BOOT/keys/refind.cer
/boot/efi/EFI/BOOT/keys/refind_local.cer
/boot/efi/EFI/BOOT/keys/refind_local.crt
/boot/efi/EFI/BOOT/keys/SLES-UEFI-CA-Certificate.cer

None of this is the fault of proxmox or even linux; the cause is motherboard or disk card BIOS errors.
 

osteoboon

1. Extract proxmoxusbboot.dd.gz with gunzip/7zip etc.
2. Write proxmoxusbboot.dd with Linux dd or Windows win32image to USB
3. try to boot from USB
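On Linux, steps 1 and 2 could be combined roughly as below. This is only a sketch: the helper name write_boot_image is made up, the target device is whatever the USB stick shows up as (check with lsblk first, since dd overwrites the target without asking), and the image itself is not linked in this thread:

```shell
#!/bin/sh
# Hypothetical helper: decompress a gzipped disk image and write it to a target.
# Usage: write_boot_image proxmoxusbboot.dd.gz /dev/sdX   (sdX is a placeholder)
write_boot_image() {
    image_gz="$1"
    target="$2"
    # gunzip -c streams the decompressed image to stdout; dd writes it out,
    # and conv=fsync flushes the data to the device before dd exits.
    gunzip -c "$image_gz" | dd of="$target" bs=4M conv=fsync 2>/dev/null
}
```

Writing to a regular file instead of a device is a harmless way to try it out before pointing it at the stick.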

Thanks for your several contributions to this thread, lankaster.

Where can I find your proxmoxusbboot.dd.gz ?

Once I can boot into my node, I'll be sure to execute your guidance on creating a bootable usb with proxmox kernels with your small script too, but at this point, I can't even boot into it, and it's the only machine I have with proxmox installed on it.
 

osteoboon

Wow Rob, thanks so much for the detailed follow-up! I really appreciate it! I agree that Rod Smith's articles on booting are extremely helpful, and I've used rEFIt in the past with other machines.

But after having invested something like 30 hours creating and configuring virtual machines in my node, I'm not yet ready to modify my installed hard drives. All the partitioning on each of my 5 HDD was done by the proxmox installer, and I'm thinking that manually modifying one or more of them would most likely break the node even worse than just having a boot problem as I have now.

I'm going to try booting into my node from a USB stick first so I can make backups of these virtual machines, but I'll definitely try your suggestion after I have backups.

Thanks again!

-Kevin
 

tomte76

New Member
Mar 6, 2015
20
0
1
Is there any news on that? I have the same problem on an HP DL360G7 with Proxmox 4.x (latest updates) and RAIDz1 after a crash. The disks are one RAID0 array each on a P410i controller and are handed to ZFS like this (/dev/sda, /dev/sdb etc.). This worked fine for a long time, but after the last crash and reboot it does not come up any more. It shows

error: no such device xxxxxxx
error: unknown filesystem
Entering rescue mode ...

Already booted with different rescue disks (PVE, Ubuntu) and the zpool is ok. All disks are fine, all data seems to be in place. I also compared it with working systems for /etc/default/grub and /boot and it looks fine. Also reinstalled grub on all disks multiple times without success. Any other chances to get this working again or do I need to reinstall the whole system?
 

tomte76

interesting. using an older proxmox 4.4 iso and rescue boot does not work. It cannot find rpool. If i boot ubuntu 17.10 and do a "zpool import rpool" everything works fine.
 

tomte76

  • tried downgrading grub without success.
  • tried recreating /etc/zfs/zpool.cache and update initramfs without success.
  • scrubbed rpool without any error
  • reinstalled grub on all devices afterwards
  • tried to boot from all disks by setting the bootdisks
  • error is still the same
Any other ideas except reinstalling?
 

fabian

  • tried downgrading grub without success.
  • tried recreating /etc/zfs/zpool.cache and update initramfs without success.
  • scrubbed rpool without any error
  • reinstalled grub on all devices afterwards
  • tried to boot from all disks by setting the bootdisks
  • error is still the same
Any other ideas except reinstalling?

use a separate /boot partition that is not on ZFS.
 

tomte76

New Member
Mar 6, 2015
20
0
1
Thank you. Unfortunately I have no spare disk slots or space on the existing disks to set up a separate boot partition. I'll back up the data now with zfs send. Does the recommendation to have a non-ZFS /boot partition mean that it is no longer recommended to boot from ZFS? Then I would take two disks out of the zpool to set up as non-ZFS system disks.
 
