Proxmox (on ZFS) refuses to boot after the latest kernel upgrade

nightrider

New Member
Mar 22, 2018
Hello!

About a month ago I decided to give Proxmox-VE a try. The installation and the creation of a few test VMs went without a hitch. Everything was working smoothly (reboots included) until yesterday, when I performed a dist-upgrade to version 5.1.42. After the upgrade I had to reboot because of the new kernel version, and after the reboot Grub failed to continue past stage 1. It prints some errors and then drops to the rescue shell.

Machine setup info

ProLiant DL360e Gen8 (not UEFI) with two 4TB disks. Proxmox-VE with ZFS in mirror (RAID1) mode.

Here is the Grub stage 1 output:

Code:
Attempting Boot From Hard Drive (C:)
error: no such device 295864b8ee73d9fe.
error: unknown filesystem.
Entering rescue mode ...

What I've tried so far

- Used the Proxmox-VE CD to boot into the rescue environment, imported the ZFS pool and chrooted into it, then re-installed Grub to /dev/sda and /dev/sdb and regenerated the Grub config. After the reboot, the boot process ended with the same Grub errors (rough sketch below).
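Roughly, the rescue procedure looked like this (a sketch from memory; the pool name and device names are the ones from this setup, and the bind mounts are the usual chroot preparation):

Code:
# booted the Proxmox-VE ISO into the rescue environment, then:
zpool import -f -R /mnt rpool     # import the pool under /mnt
mount --rbind /dev  /mnt/dev      # usual chroot preparation
mount --rbind /proc /mnt/proc
mount --rbind /sys  /mnt/sys
chroot /mnt /bin/bash
grub-install /dev/sda             # re-install grub to both mirror members
grub-install /dev/sdb
update-grub                       # regenerate the grub config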

Useful detailed info

Code:
root@proxmox:/# grub-probe /
zfs

The device ID 295864b8ee73d9fe from the Grub error corresponds to the pool UUID of sda2/sdb2 (the ZFS pool) when converted to decimal.
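Converting the hex ID to decimal to double-check:

Code:
root@proxmox:/# printf "%d\n" 0x295864b8ee73d9fe
2979241898942913022

which matches the pool UUID/GUID in the blkid and zpool outputs below.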

Code:
grub-rescue> ls
(hd0) (hd0,gpt9) (hd0,gpt2) (hd0,gpt1) (hd1) (hd1,gpt9) (hd1,gpt2) (hd1,gpt1)
grub-rescue> ls (hd0,gpt2)
(hd0,gpt2): Filesystem is unknown.
grub-rescue> set
cmdpath=(hd0)
prefix=(hd0)/ROOT/pve-1@/boot/grub
root=hd0
grub-rescue> insmod normal
error: unknown filesystem.
grub-rescue> insmod zfs
grub-rescue> set debug=zfs
grub-rescue> ls (hd0,gpt2)
fs/zfs/zfs.c:1192: label ok 0
fs/zfs/zfs.c:1007: check 2 passed
fs/zfs/zfs.c:1018: check 3 passed
fs/zfs/zfs.c:1025: check 4 passed
fs/zfs/zfs.c:1035: check 6 passed
fs/zfs/zfs.c:1043: check 7 passed
fs/zfs/zfs.c:1054: check 8 passed
fs/zfs/zfs.c:1064: check 9 passed
fs/zfs/zfs.c:1086: check 11 passed
fs/zfs/zfs.c:1112: check 10 passed
fs/zfs/zfs.c:1128: str=com.delphix:hole_birth
fs/zfs/zfs.c:1128: str=com.delphix:embedded_data
fs/zfs/zfs.c:1137: check 12 passed (feature flags)
fs/zfs/zfs.c:1878: zio_read: E 0: size 2048/2048
fs/zfs/zfs.c:1899: endian = -1
fs/zfs/zfs.c:595: dva=8, 11c0717f8
fs/zfs/zfs.c:442: checksum fletcher4 verification failed
fs/zfs/zfs.c:447: actual checksum 0000000000000000 0000000000000000 0000000000000000 0000000000000000
fs/zfs/zfs.c:452: expected checksum 00000004498e0fdc 000007fa34d3423a ee2726 04a3580d3cb4fb7d
fs/zfs/zfs.c:1922: incorrect checksum
(hd0,gpt2): Filesystem is unknown.
grub-rescue>

Code:
root@proxmox:/# blkid
/dev/sda2: LABEL="rpool" UUID="2979241898942913022" UUID_SUB="11474682908795965178" TYPE="zfs_member" PARTLABEL="zfs" PARTUUID="1a99e21a-3305-4f4c-93e2-1c4ce9490810"
/dev/sdb2: LABEL="rpool" UUID="2979241898942913022" UUID_SUB="10144792332914432543" TYPE="zfs_member" PARTLABEL="zfs" PARTUUID="5a3ee6dc-8185-4cc5-aaaf-c7a89a399613"
/dev/sda1: PARTUUID="redacted"
/dev/sda9: PARTUUID="redacted"
/dev/sdb1: PARTUUID="redacted"
/dev/sdb9: PARTUUID="redacted"
root@proxmox:/#

The blkid output is redacted (the ZFS volume entries were removed) for brevity.

Code:
root@proxmox:/# zpool get all
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           3.6T                           -
rpool  capacity                       1%                             -
rpool  altroot                        /mnt                           default
rpool  health                         ONLINE                         -
rpool  guid                           2979241898942913022            -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pve-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      none                           local
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupditto                     0                              default
rpool  dedupratio                     1.00x                          -
rpool  free                           3.57T                          -
rpool  allocated                      57.0G                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  0%                             -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            enabled                        local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
root@proxmox:/#

I guess, as a last resort, I could put the boot partition on a thumb drive and boot from that.

Do you have any idea what is going on? I am slowly running out of ideas. Any help would be highly appreciated! Thank you in advance!
 
could you provide the pool properties of the rpool, and the dataset properties of the rpool and ROOT datasets?
 
Fabian, thank you for your reply!

Please find the output attached as a log file since it is too long to paste here as text.

Please tell me if you need any more info! Thank you!
 

Attachments

  • zfs datasets properties.txt
    58.8 KB · Views: 27
could you also add "zpool get all rpool"? thanks!
 
Of course. Please find it attached as log file. Thank you!
 

Attachments

  • zfs pool properties.txt
    3.2 KB · Views: 18
@mbaldini: this machine stops at grub, before the initrd's busybox.
I think it's because the ZFS pool is not initialized, so it can't find the pool or the /boot partition.
Try starting in Proxmox rescue mode, add rootdelay=10 to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, run update-grub and then restart.
It's worth a try
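The line would end up looking something like this (the existing "quiet" value is an assumption about the stock Proxmox config):

Code:
# /etc/default/grub -- add rootdelay=10 to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"

followed by update-grub and a reboot.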
 
there has been a similar report in the past (https://forum.proxmox.com/threads/grub-rescue-checksum-verification-failed.39143), interestingly also on an HP (but a DL380 G6).

since we cannot reproduce this issue on our machines, the only possible next step would be to install a custom debug build of Grub with a lot more debugging prints added. would you be willing to test with such a build? note that it would require setting up a serial console to dump the log, as the output would be huge!
 
I think it's because the ZFS pool is not initialized, so it can't find the pool or the /boot partition.
Try starting in Proxmox rescue mode, add rootdelay=10 to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, run update-grub and then restart.
It's worth a try

and after the reboot Grub failed to continue past stage 1. It prints some errors and then drops to the rescue shell.

So adding a rootdelay kernel parameter helps grub, the pre-kernel environment?
 
@mbaldini, thank you for your suggestions, but I think @sigxcpu is right: adding parameters to the kernel command line won't help in my case, because Grub is stuck at an earlier stage of the boot process.

@fabian, sure, just provide me with the custom build (instructions if needed) and I will test it and report back. Thank you!
 
@fabian, sure, just provide me with the custom build (instructions if needed) and I will test it and report back. Thank you!

I'll upload some debs on Monday - feel free to ping me in case I forget!
 
@fabian,

this is a little bit off topic, but is there a reason why UEFI boot is not offered as an alternative option during install? MBR boot is a pain in the neck because it makes migrating a ZFS rpool to another drive harder. thanks
 
@fabian,

this is a little bit off topic, but is there a reason why UEFI boot is not offered as an alternative option during install? MBR boot is a pain in the neck because it makes migrating a ZFS rpool to another drive harder. thanks

there is no support for redundant ESPs in Grub at the moment - we'd rather not have people install with ZFS as / and then realize they can't boot once the wrong disk has failed. but we will likely have to whip up a solution of our own for this problem, since upstream does not seem interested in this feature and legacy booting is slowly dying a painful death anyway...
 
there is no support for redundant ESPs in Grub at the moment - we'd rather not have people install with ZFS as / and then realize they can't boot once the wrong disk has failed. but we will likely have to whip up a solution of our own for this problem, since upstream does not seem interested in this feature and legacy booting is slowly dying a painful death anyway...

Well, if we were to move to UEFI, we wouldn't need grub at all. During installation, the installer would detect whether the system supports EFI; if so, it would go ahead and set up the EFI boot order, and if not, Proxmox would fall back to the grub boot solution. If I am not mistaken, Ubuntu actually uses both solutions at the same time: the first partition is the ESP (first in boot priority on a UEFI-capable BIOS), and the second partition is for grub (legacy mode). Perhaps Proxmox could consider a similar approach!
 
Well, if we were to move to UEFI, we wouldn't need grub at all. During installation, the installer would detect whether the system supports EFI; if so, it would go ahead and set up the EFI boot order, and if not, Proxmox would fall back to the grub boot solution. If I am not mistaken, Ubuntu actually uses both solutions at the same time: the first partition is the ESP (first in boot priority on a UEFI-capable BIOS), and the second partition is for grub (legacy mode). Perhaps Proxmox could consider a similar approach!

UEFI does not support ZFS, and even if it did, we would need to keep the ESPs on all boot devices in sync, same as with Grub. we already install both legacy and EFI Grub for single-disk LVM setups - those are not redundant anyway, so not having redundancy in the boot loader is okay there.
 
debs are available on http://download.proxmox.com/temp/grub-zfs/

Code:
MD5:
7ee3861bb10038a624b0080f5dd06c5c  grub2_2.02-pve7~zfsdebug1_amd64.deb
5ac6cce183033810ffc958318f980294  grub2-common_2.02-pve7~zfsdebug1_amd64.deb
7d88dc1ad01ef41d6c282c90875ea498  grub-common_2.02-pve7~zfsdebug1_amd64.deb
dbae49de83592f2983b5bd28758fac70  grub-efi_2.02-pve7~zfsdebug1_amd64.deb
8f1c82762866728674e2fc968f64fed5  grub-efi-amd64_2.02-pve7~zfsdebug1_amd64.deb
8553b4cb99b1ac88cc9178e8da200ea0  grub-efi-amd64-bin_2.02-pve7~zfsdebug1_amd64.deb
e52a631712d21e230e3764f346ff441c  grub-efi-ia32_2.02-pve7~zfsdebug1_amd64.deb
f2e861123c2fedc17eadfee74deda107  grub-efi-ia32-bin_2.02-pve7~zfsdebug1_amd64.deb
27daaad88ce061cb52de86204a1a6ffc  grub-pc_2.02-pve7~zfsdebug1_amd64.deb
9ec64d93e9d3ddd63ebbf3bbc3911e0d  grub-pc-bin_2.02-pve7~zfsdebug1_amd64.deb
8453476e3a5b7dd4f8bb653c67d0d88b  grub-pc-dbg_2.02-pve7~zfsdebug1_amd64.deb
7e89b1d8dd9c22cbec1bb9307517f52c  grub-rescue-pc_2.02-pve7~zfsdebug1_amd64.deb
155d08c5fd8e0dba1994042603dea00d  grub-theme-starfield_2.02-pve7~zfsdebug1_amd64.deb

SHA256:
3b44de6428c028b1c46c6a102dfb836350fe1968e92148d960b593e145795f2f  grub2_2.02-pve7~zfsdebug1_amd64.deb
4a4359603f4a92844b745060e433d17f4a56e85cd2e9e68931c40f969026948a  grub2-common_2.02-pve7~zfsdebug1_amd64.deb
0c11b3d4a014492fa546ba76a04025003856ae05eac06342d91c77f9e2ba5cf7  grub-common_2.02-pve7~zfsdebug1_amd64.deb
1f57a30cc15ba726a77a4216d4628076a9c91e55402f3cc21d8547b3f6eb5175  grub-efi_2.02-pve7~zfsdebug1_amd64.deb
4861a618115d2bdc8f9389ba89029621d7a3222d78e44feaeb1e5c90fd81b2a4  grub-efi-amd64_2.02-pve7~zfsdebug1_amd64.deb
b9ca5b1c37de7de664185c4122a78803f2231dc52e65d4fac4b6c7de5ca0d74e  grub-efi-amd64-bin_2.02-pve7~zfsdebug1_amd64.deb
9bb5d295740ff761688eb86b29a65a515757adf27ec50015119bacbd6b14aa56  grub-efi-ia32_2.02-pve7~zfsdebug1_amd64.deb
7bdfba0b38e464f28e71233bb4598aeb260b0c13f96d555ea99a9e373b7bb4c0  grub-efi-ia32-bin_2.02-pve7~zfsdebug1_amd64.deb
5e49a8bf57d2b5d67f4317a8ea0abf6df406918f86e513f4496fb8814ddc8bd4  grub-pc_2.02-pve7~zfsdebug1_amd64.deb
3aeafad97fe073e9b03685b8d133162a07fd97873a282416f0437c80d9cf4dca  grub-pc-bin_2.02-pve7~zfsdebug1_amd64.deb
aebb162bbc9ab140388247ba1d3d4d430e258603397cc9be15b0c2ac476bde78  grub-pc-dbg_2.02-pve7~zfsdebug1_amd64.deb
c0474c00827e12649984430516490b955998e427538f6939b744b79a76f8357c  grub-rescue-pc_2.02-pve7~zfsdebug1_amd64.deb
ef244b3a75776745d69e32c5c64c3d00a8da7335946290b7020465e4b59b2e12  grub-theme-starfield_2.02-pve7~zfsdebug1_amd64.deb

the following should install the needed grub packages after downloading
Code:
dpkg -i grub-pc-bin_2.02-pve7~zfsdebug1_amd64.deb grub-pc_2.02-pve7~zfsdebug1_amd64.deb grub2-common_2.02-pve7~zfsdebug1_amd64.deb grub-efi-amd64-bin_2.02-pve7~zfsdebug1_amd64.deb  grub-common_2.02-pve7~zfsdebug1_amd64.deb  grub-efi-ia32-bin_2.02-pve7~zfsdebug1_amd64.deb

don't forget to re-install grub to the device you are booting from (with "grub-install").

like I said, the debug build will produce lots of output when you do anything after setting "debug=zfs", so a serial connection is a must for collecting all of it.
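for the serial console, something like this in /etc/default/grub should do (standard Grub options; the unit and speed values here are just examples, adjust to your hardware):

Code:
# mirror grub's output to the first serial port
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"

then run update-grub and capture the log on the other end, e.g. with minicom or "screen /dev/ttyS0 115200".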
 
You might be running into a firmware issue where the drive controller can only see the first 2 TB of the drive. I got bitten by this problem with LSI controller cards and 8 TB drives. You can confirm it by turning on debugging and running the ls command on your boot drive: the debug log will show messages about attempts to read outside the maximum partition size.

My solution was to pull out the LSI controller card and hook the drives up to the internal SATA ports, after which the system just booted.
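Something along these lines in the rescue shell should show it, reusing the commands from earlier in this thread but with full debugging enabled ("debug=all" also logs the disk-level reads):

Code:
grub-rescue> set debug=all
grub-rescue> ls (hd0,gpt2)

then look for read errors beyond the 2 TB boundary in the output.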
 
