Proxmox (on ZFS) refuses to boot after the latest kernel upgrade

Discussion in 'Proxmox VE: Installation and configuration' started by nightrider, Mar 22, 2018.

  1. nightrider

    nightrider New Member

    Joined:
    Mar 22, 2018
    Messages:
    6
    Likes Received:
    0
    Hello!

    About a month ago I decided to give Proxmox-VE a try. The installation and the creation of a few test VMs went without a hitch. Everything was working smoothly (reboots included) until yesterday, when I ran a dist-upgrade to version 5.1.42. After the upgrade I had to reboot because of the new kernel, and after the reboot Grub failed to get past stage 1. It prints some errors and then drops to the rescue prompt.

    Machine setup info

    ProLiant DL360e Gen8 (not UEFI) with two 4TB disks. Proxmox-VE with ZFS in mirror (RAID1) mode.

    Here is the Grub stage 1 output:

    Code:
    Attempting Boot From Hard Drive (C:)
    error: no such device 295864b8ee73d9fe.
    error: unknown filesystem.
    Entering rescue mode ...
    What I've tried so far

    - Booted the Proxmox-VE CD into the rescue environment, imported the ZFS pool and chrooted into it, then re-installed Grub to /dev/sda and /dev/sdb and regenerated the Grub config. After the reboot, the boot process again ended with the same Grub errors.
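    A minimal sketch of those rescue steps, assuming the pool is named rpool, its root dataset mounts at / (and therefore at /mnt under the alternate root, matching the altroot in the zpool output below), and the mirror members are /dev/sda and /dev/sdb. The run and reinstall_grub helpers are hypothetical names; with DRY_RUN=1 the commands are printed instead of executed, which is a cheap way to review the sequence before running it as root.

```shell
#!/bin/sh
# Sketch of the rescue-mode Grub reinstall (hypothetical helper names).
# Assumes: pool "rpool" whose root dataset mounts at / (so at /mnt under
# the alternate root), and whole-disk boot devices passed as arguments.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_grub() {
    pool=$1
    shift                                  # the rest are boot disks
    run zpool import -f -R /mnt "$pool"    # import under altroot /mnt
    for fs in dev proc sys; do             # give the chroot working /dev, /proc, /sys
        run mount --rbind "/$fs" "/mnt/$fs"
    done
    for disk in "$@"; do                   # every mirror member gets Grub
        run chroot /mnt grub-install "$disk"
    done
    run chroot /mnt update-grub            # regenerate /boot/grub/grub.cfg
}
```

    Usage: DRY_RUN=1 reinstall_grub rpool /dev/sda /dev/sdb to review the plan, then the same call without DRY_RUN to run it.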

    Useful detailed info

    Code:
    grub-probe /
    zfs

    The ID 295864b8ee73d9fe in the Grub error corresponds to sda2/sdb2 (the ZFS pool) when converted to decimal: it is the pool GUID 2979241898942913022.
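    The hex-to-decimal check can be reproduced in the shell. This is a sketch; hex_guid_to_dec and guid_matches are hypothetical helper names, and it relies on printf evaluating a 0x-prefixed numeric argument as a C-style hex constant, as POSIX printf does.

```shell
#!/bin/sh
# Convert the hex device ID from "error: no such device <id>" to decimal.
# printf treats a 0x-prefixed argument as a hex constant, and %u keeps
# the full unsigned 64-bit range that ZFS GUIDs use.
hex_guid_to_dec() {
    printf '%u\n' "0x$1"
}

# Succeed if hex ID $1 equals decimal pool GUID $2 (from blkid or zpool get guid).
guid_matches() {
    [ "$(hex_guid_to_dec "$1")" = "$2" ]
}
```

    With the values from this thread, guid_matches 295864b8ee73d9fe 2979241898942913022 succeeds, so the device Grub cannot find is the pool itself.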

    Code:
    grub-rescue> ls
    (hd0) (hd0,gpt9) (hd0,gpt2) (hd0,gpt1) (hd1) (hd1,gpt9) (hd1,gpt2) (hd1,gpt1)
    grub-rescue> ls (hd0,gpt2)
    (hd0,gpt2): Filesystem is unknown.
    grub-rescue> set
    cmdpath=(hd0)
    prefix=(hd0)/ROOT/pve-1@/boot/grub
    root=hd0
    grub-rescue> insmod normal
    error: unknown filesystem.
    grub-rescue> insmod zfs
    grub-rescue> set debug=zfs
    grub-rescue> ls (hd0,gpt2)
    fs/zfs/zfs.c:1192: label ok 0
    fs/zfs/zfs.c:1007: check 2 passed
    fs/zfs/zfs.c:1018: check 3 passed
    fs/zfs/zfs.c:1025: check 4 passed
    fs/zfs/zfs.c:1035: check 6 passed
    fs/zfs/zfs.c:1043: check 7 passed
    fs/zfs/zfs.c:1054: check 8 passed
    fs/zfs/zfs.c:1064: check 9 passed
    fs/zfs/zfs.c:1086: check 11 passed
    fs/zfs/zfs.c:1112: check 10 passed
    fs/zfs/zfs.c:1128: str=com.delphix:hole_birth
    fs/zfs/zfs.c:1128: str=com.delphix:embedded_data
    fs/zfs/zfs.c:1137: check 12 passed (feature flags)
    fs/zfs/zfs.c:1878: zio_read: E 0: size 2048/2048
    fs/zfs/zfs.c:1899: endian = -1
    fs/zfs/zfs.c:595: dva=8, 11c0717f8
    fs/zfs/zfs.c:442: checksum fletcher4 verification failed
    fs/zfs/zfs.c:447: actual checksum 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    fs/zfs/zfs.c:452: expected checksum 00000004498e0fdc 000007fa34d3423a ee2726 04a3580d3cb4fb7d
    fs/zfs/zfs.c:1922: incorrect checksum
    (hd0,gpt2): Filesystem is unknown.
    grub-rescue>
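    The all-zero "actual checksum" is informative in itself. Fletcher4 only adds 32-bit data words into four running sums that start at zero, so a buffer full of zeros always checksums to zero. That suggests (it doesn't prove) that the read handed Grub a block of zeros rather than the block's real contents. The sketch below is written for illustration, not taken from the Grub or ZFS sources; fletcher4 is a hypothetical helper, and it assumes a little-endian host matching the on-disk byte order, a file length that is a multiple of 4 bytes, and 64-bit shell arithmetic that wraps on overflow (true on common builds, not guaranteed by POSIX).

```shell
#!/bin/sh
# ZFS-style fletcher4 over a file: for each 32-bit word w,
#   a += w; b += a; c += b; d += c   (modulo 2^64)
# od reads the words in host byte order.
fletcher4() {
    a=0 b=0 c=0 d=0
    for w in $(od -An -v -t u4 "$1"); do
        a=$((a + w)); b=$((b + a)); c=$((c + b)); d=$((d + c))
    done
    # Print the four sums as 16 hex digits each, like Grub's debug output.
    printf '%016x %016x %016x %016x\n' "$a" "$b" "$c" "$d"
}
```

    Running it on a 2048-byte file of zeros (the read size in the log) prints four all-zero words, exactly the "actual checksum" Grub reported.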
    
    Code:
    root@proxmox:/# blkid
    /dev/sda2: LABEL="rpool" UUID="2979241898942913022" UUID_SUB="11474682908795965178" TYPE="zfs_member" PARTLABEL="zfs" PARTUUID="1a99e21a-3305-4f4c-93e2-1c4ce9490810"
    /dev/sdb2: LABEL="rpool" UUID="2979241898942913022" UUID_SUB="10144792332914432543" TYPE="zfs_member" PARTLABEL="zfs" PARTUUID="5a3ee6dc-8185-4cc5-aaaf-c7a89a399613"
    /dev/sda1: PARTUUID="redacted"
    /dev/sda9: PARTUUID="redacted"
    /dev/sdb1: PARTUUID="redacted"
    /dev/sdb9: PARTUUID="redacted"
    root@proxmox:/#
    
    The blkid output is redacted (ZFS volumes removed) for brevity.

    Code:
    root@proxmox:/# zpool get all
    NAME   PROPERTY                       VALUE                          SOURCE
    rpool  size                           3.6T                           -
    rpool  capacity                       1%                             -
    rpool  altroot                        /mnt                           default
    rpool  health                         ONLINE                         -
    rpool  guid                           2979241898942913022            -
    rpool  version                        -                              default
    rpool  bootfs                         rpool/ROOT/pve-1               local
    rpool  delegation                     on                             default
    rpool  autoreplace                    off                            default
    rpool  cachefile                      none                           local
    rpool  failmode                       wait                           default
    rpool  listsnapshots                  off                            default
    rpool  autoexpand                     off                            default
    rpool  dedupditto                     0                              default
    rpool  dedupratio                     1.00x                          -
    rpool  free                           3.57T                          -
    rpool  allocated                      57.0G                          -
    rpool  readonly                       off                            -
    rpool  ashift                         12                             local
    rpool  comment                        -                              default
    rpool  expandsize                     -                              -
    rpool  freeing                        0                              -
    rpool  fragmentation                  0%                             -
    rpool  leaked                         0                              -
    rpool  multihost                      off                            default
    rpool  feature@async_destroy          enabled                        local
    rpool  feature@empty_bpobj            active                         local
    rpool  feature@lz4_compress           active                         local
    rpool  feature@multi_vdev_crash_dump  enabled                        local
    rpool  feature@spacemap_histogram     active                         local
    rpool  feature@enabled_txg            active                         local
    rpool  feature@hole_birth             active                         local
    rpool  feature@extensible_dataset     active                         local
    rpool  feature@embedded_data          active                         local
    rpool  feature@bookmarks              enabled                        local
    rpool  feature@filesystem_limits      enabled                        local
    rpool  feature@large_blocks           enabled                        local
    rpool  feature@large_dnode            enabled                        local
    rpool  feature@sha512                 enabled                        local
    rpool  feature@skein                  enabled                        local
    rpool  feature@edonr                  enabled                        local
    rpool  feature@userobj_accounting     active                         local
    root@proxmox:/#
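    When comparing a pool against what Grub can read, the features worth looking at first are the active ones, i.e. those actually in use on disk; enabled features have not been used yet. The awk filter below (active_features is a hypothetical helper name) extracts them from zpool get all output in the column layout shown above. Not every active feature necessarily matters to a read-only reader: the debug output earlier shows Grub checking hole_birth and embedded_data and reporting "check 12 passed (feature flags)".

```shell
#!/bin/sh
# Print the names of features whose VALUE is "active" in
# `zpool get all` output (columns: NAME PROPERTY VALUE SOURCE).
active_features() {
    awk '$2 ~ /^feature@/ && $3 == "active" { sub(/^feature@/, "", $2); print $2 }'
}
```

    Usage: zpool get all rpool | active_features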
    
    As a last resort, I guess I could install the boot partition on a thumb drive and boot from it.

    Do you have any idea what is going on? I'm running out of ideas. Any help would be highly appreciated! Thank you in advance!
     
  2. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    could you provide the pool properties of the rpool, and the dataset properties of the rpool and ROOT datasets?
     
  3. nightrider

    nightrider New Member

    Joined:
    Mar 22, 2018
    Messages:
    6
    Likes Received:
    0
    Fabian, thank you for your reply!

    Please find the output attached as a log file since it is too long to paste here as text.

    Please tell me if you need any more info! Thank you!
     

    Attached Files:

  4. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    could you also add "zpool get all rpool"? thanks!
     
  5. nightrider

    nightrider New Member

    Joined:
    Mar 22, 2018
    Messages:
    6
    Likes Received:
    0
    Of course. Please find it attached as log file. Thank you!
     

    Attached Files:

  6. mbaldini

    mbaldini Member

    Joined:
    Nov 7, 2015
    Messages:
    166
    Likes Received:
    19
  7. sigxcpu

    sigxcpu Member

    Joined:
    May 4, 2012
    Messages:
    433
    Likes Received:
    9
    @mbaldini: this machine stops at grub, before the initrd's busybox.
     
  8. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    158
    Hi,
    rootdelay helped in my case too, and AFAIK the machine stops before entering busybox.

    Udo
     
  9. mbaldini

    mbaldini Member

    Joined:
    Nov 7, 2015
    Messages:
    166
    Likes Received:
    19
    I think it's because the ZFS pool isn't initialized yet, so it can't find the pool or the /boot partition.
    Try booting into Proxmox rescue mode, add rootdelay=10 to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, run update-grub and then restart.
    It's worth a try.
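    For reference, that edit can be scripted. This sketch (add_kernel_param is a hypothetical helper; it uses GNU sed -i, as found on Debian-based Proxmox) appends a parameter to the GRUB_CMDLINE_LINUX_DEFAULT line only if it isn't already present. As other replies in this thread point out, it only changes the Linux kernel command line, so it cannot help when Grub fails before loading the kernel.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a grub
# defaults file, unless it is already present. Uses GNU sed -i.
# $2 is spliced into a regex and a sed replacement, so keep it to plain
# name=value text such as rootdelay=10.
add_kernel_param() {
    file=$1
    param=$2
    # Already there (delimited by a quote or a space)? Nothing to do.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}
```

    Usage: add_kernel_param /etc/default/grub rootdelay=10 && update-grub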
     
  10. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    there has been a similar report in the past (https://forum.proxmox.com/threads/grub-rescue-checksum-verification-failed.39143), interestingly also on an HP (but a DL380 G6).

    since we cannot reproduce this issue on our machines, the only possible next step would be to install a custom debug build of Grub with a lot more debugging prints added. would you be willing to test with such a build? note that it would require setting up a serial console to dump the log, as the output would be huge!
     
  11. sigxcpu

    sigxcpu Member

    Joined:
    May 4, 2012
    Messages:
    433
    Likes Received:
    9
    So adding a rootdelay kernel parameter helps grub, the pre-kernel environment?
     
  12. nightrider

    nightrider New Member

    Joined:
    Mar 22, 2018
    Messages:
    6
    Likes Received:
    0
    @mbaldini, thank you for your suggestions, but I think @sigxcpu is right: adding parameters to the Linux kernel command line won't help in my case, because Grub gets stuck at an earlier stage of the boot process.

    @fabian, sure, just provide me with the custom build (and instructions if needed) and I will test it and report back. Thank you!
     
  13. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    I'll upload some debs on Monday - feel free to ping me in case I forget!
     
  14. tpham

    tpham New Member

    Joined:
    Apr 23, 2017
    Messages:
    7
    Likes Received:
    0
    @fabian,

    this is a little off topic, but is there a reason UEFI boot isn't offered as an alternative option during install? MBR boot is a pain in the neck because it makes it harder for me to migrate a ZFS rpool to another drive. Thanks
     
  15. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    there is no support for redundant ESPs at the moment in Grub - we'd rather not have people install with ZFS as / and then realize they can't boot once the wrong disk has failed. but we will likely have to whip up a solution of our own for this problem, since neither upstream seems interested in this feature and legacy-booting is slowly dying a painful death anyway...
     
  16. tpham

    tpham New Member

    Joined:
    Apr 23, 2017
    Messages:
    7
    Likes Received:
    0
    Well, if we were to move to UEFI, we wouldn't need grub at all. During installation, the installer would detect whether the system supports EFI; if so, it would go ahead and set up the EFI boot order, and if not, Proxmox would fall back to the grub boot solution. If I am not mistaken, Ubuntu actually uses both solutions at the same time: the first partition is the ESP (first boot priority with a UEFI-capable BIOS), and the second partition is for grub (legacy mode). Perhaps Proxmox could consider a similar approach!
     
  17. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    UEFI does not support ZFS, and even if it did, we would need to keep the ESPs on all boot devices in sync, same as with Grub. we already do install both legacy and EFI Grub for single-disk LVM setups - those are not redundant anyway, so not having redundancy in the boot loader is okay there.
     
  18. nightrider

    nightrider New Member

    Joined:
    Mar 22, 2018
    Messages:
    6
    Likes Received:
    0
    @fabian, hello! I'm reminding you about the custom Grub debs. Thank you!
     
  19. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,201
    Likes Received:
    497
    debs are available on http://download.proxmox.com/temp/grub-zfs/

    Code:
    7ee3861bb10038a624b0080f5dd06c5c  grub2_2.02-pve7~zfsdebug1_amd64.deb
    5ac6cce183033810ffc958318f980294  grub2-common_2.02-pve7~zfsdebug1_amd64.deb
    7d88dc1ad01ef41d6c282c90875ea498  grub-common_2.02-pve7~zfsdebug1_amd64.deb
    dbae49de83592f2983b5bd28758fac70  grub-efi_2.02-pve7~zfsdebug1_amd64.deb
    8f1c82762866728674e2fc968f64fed5  grub-efi-amd64_2.02-pve7~zfsdebug1_amd64.deb
    8553b4cb99b1ac88cc9178e8da200ea0  grub-efi-amd64-bin_2.02-pve7~zfsdebug1_amd64.deb
    e52a631712d21e230e3764f346ff441c  grub-efi-ia32_2.02-pve7~zfsdebug1_amd64.deb
    f2e861123c2fedc17eadfee74deda107  grub-efi-ia32-bin_2.02-pve7~zfsdebug1_amd64.deb
    27daaad88ce061cb52de86204a1a6ffc  grub-pc_2.02-pve7~zfsdebug1_amd64.deb
    9ec64d93e9d3ddd63ebbf3bbc3911e0d  grub-pc-bin_2.02-pve7~zfsdebug1_amd64.deb
    8453476e3a5b7dd4f8bb653c67d0d88b  grub-pc-dbg_2.02-pve7~zfsdebug1_amd64.deb
    7e89b1d8dd9c22cbec1bb9307517f52c  grub-rescue-pc_2.02-pve7~zfsdebug1_amd64.deb
    155d08c5fd8e0dba1994042603dea00d  grub-theme-starfield_2.02-pve7~zfsdebug1_amd64.deb
    3b44de6428c028b1c46c6a102dfb836350fe1968e92148d960b593e145795f2f  grub2_2.02-pve7~zfsdebug1_amd64.deb
    4a4359603f4a92844b745060e433d17f4a56e85cd2e9e68931c40f969026948a  grub2-common_2.02-pve7~zfsdebug1_amd64.deb
    0c11b3d4a014492fa546ba76a04025003856ae05eac06342d91c77f9e2ba5cf7  grub-common_2.02-pve7~zfsdebug1_amd64.deb
    1f57a30cc15ba726a77a4216d4628076a9c91e55402f3cc21d8547b3f6eb5175  grub-efi_2.02-pve7~zfsdebug1_amd64.deb
    4861a618115d2bdc8f9389ba89029621d7a3222d78e44feaeb1e5c90fd81b2a4  grub-efi-amd64_2.02-pve7~zfsdebug1_amd64.deb
    b9ca5b1c37de7de664185c4122a78803f2231dc52e65d4fac4b6c7de5ca0d74e  grub-efi-amd64-bin_2.02-pve7~zfsdebug1_amd64.deb
    9bb5d295740ff761688eb86b29a65a515757adf27ec50015119bacbd6b14aa56  grub-efi-ia32_2.02-pve7~zfsdebug1_amd64.deb
    7bdfba0b38e464f28e71233bb4598aeb260b0c13f96d555ea99a9e373b7bb4c0  grub-efi-ia32-bin_2.02-pve7~zfsdebug1_amd64.deb
    5e49a8bf57d2b5d67f4317a8ea0abf6df406918f86e513f4496fb8814ddc8bd4  grub-pc_2.02-pve7~zfsdebug1_amd64.deb
    3aeafad97fe073e9b03685b8d133162a07fd97873a282416f0437c80d9cf4dca  grub-pc-bin_2.02-pve7~zfsdebug1_amd64.deb
    aebb162bbc9ab140388247ba1d3d4d430e258603397cc9be15b0c2ac476bde78  grub-pc-dbg_2.02-pve7~zfsdebug1_amd64.deb
    c0474c00827e12649984430516490b955998e427538f6939b744b79a76f8357c  grub-rescue-pc_2.02-pve7~zfsdebug1_amd64.deb
    ef244b3a75776745d69e32c5c64c3d00a8da7335946290b7020465e4b59b2e12  grub-theme-starfield_2.02-pve7~zfsdebug1_amd64.deb
    
    the following should install the needed grub packages after downloading
    Code:
    dpkg -i grub-pc-bin_2.02-pve7~zfsdebug1_amd64.deb grub-pc_2.02-pve7~zfsdebug1_amd64.deb grub2-common_2.02-pve7~zfsdebug1_amd64.deb grub-efi-amd64-bin_2.02-pve7~zfsdebug1_amd64.deb  grub-common_2.02-pve7~zfsdebug1_amd64.deb  grub-efi-ia32-bin_2.02-pve7~zfsdebug1_amd64.deb
    don't forget to re-install grub to the device you are booting from (with "grub-install").
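    The download, verification and install steps can be combined into one script. This is a sketch under assumptions not stated in the post: SHA256SUMS is a local file holding the SHA-256 lines for the six packages being installed (copied from the list above), the boot disks are /dev/sda and /dev/sdb, and wget is available. The helper names are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: download, verify and install the debug Grub packages, then
# reinstall Grub on each boot disk. Assumes a local SHA256SUMS with the
# "<sha256>  <file>" lines for these six packages.
BASE_URL=http://download.proxmox.com/temp/grub-zfs
VER=2.02-pve7~zfsdebug1_amd64
PKGS="grub-pc-bin grub-pc grub2-common grub-efi-amd64-bin grub-common grub-efi-ia32-bin"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# File names of the packages passed to dpkg -i in the post.
deb_files() {
    for p in $PKGS; do printf '%s_%s.deb\n' "$p" "$VER"; done
}

fetch_verify_install() {                # arguments: boot disks
    for f in $(deb_files); do
        run wget -q "$BASE_URL/$f" || return 1
    done
    run sha256sum -c SHA256SUMS || return 1   # stop on any mismatch
    run dpkg -i $(deb_files) || return 1
    for disk in "$@"; do
        run grub-install "$disk"
    done
}
```

    Usage (as root): fetch_verify_install /dev/sda /dev/sdb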

    like I said, the debug build will produce lots of output when you do anything after setting "debug=zfs", so a serial connection is a must for collecting all of it.
     
  20. Paul Bucher

    Paul Bucher New Member
    Proxmox Subscriber

    Joined:
    Jan 8, 2018
    Messages:
    1
    Likes Received:
    0
    You might be running into a firmware issue where the drive controller can only see the first 2TB of the drive. I got bitten by this problem with LSI controller cards and 8TB drives. You can confirm it by turning on debugging and running the ls command on your boot drive; the debug log will then show messages about trying to read outside the maximum partition size.

    My solution was to pull out the LSI controller card and just hook the drives to the internal SATA ports, after which the system just booted up.
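    This theory can be checked against the Grub debug output in the first post, which prints "dva=8, 11c0717f8" just before the checksum failure. A sketch, under assumptions that are not from the thread: the second value is dva_word[1], whose low 63 bits hold the block offset in 512-byte sectors, and the allocatable area of a ZFS vdev starts 4 MiB into it (two 256 KiB labels plus a 3.5 MiB boot region). A 32-bit sector number at 512 bytes per sector tops out at 2 TiB, which is presumably where such a firmware limit would apply. The helper names are hypothetical.

```shell
#!/bin/sh
# Estimate the byte offset of a ZFS block from Grub's debug line
# "dva=<word0>, <word1>". Assumes <word1> is dva_word[1] (offset in
# 512-byte sectors in its low 63 bits) and that data starts 4 MiB into
# the vdev. The partition's own start on the disk is not added, so the
# result is a lower bound relative to the start of the disk.
LABEL_START=$((4 * 1024 * 1024))        # bytes before allocatable space
LIMIT_2TIB=$((4294967296 * 512))        # 2^32 sectors x 512 bytes

dva_byte_offset() {                     # $1: dva_word[1] in hex, no 0x
    sectors=$(( 0x$1 & 0x7fffffffffffffff ))   # drop the gang bit (bit 63)
    echo $(( sectors * 512 + LABEL_START ))
}

# Succeed if the block starts at or beyond the 2 TiB boundary.
beyond_2tib() {
    [ $(( $(dva_byte_offset "$1") >= LIMIT_2TIB )) -eq 1 ]
}
```

    With the value from the debug log, dva_byte_offset 11c0717f8 prints 2439783641088 (about 2.22 TiB), which is past the 2 TiB mark and would be consistent with the controller-limit explanation.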
     