Can't boot after upgrade

dsi

Hello,

after a recent upgrade from the GUI I can't boot the system (ZFS root on RAID1) anymore:
Code:
error: no such device: 1d54.....
error: unknown filesystem
Entering rescue mode
grub rescue>

Rebooting via the install disk in rescue mode didn't help.
Rebooting via the install disk in Install (Debug) mode didn't help either - I'm accessing the system via HP iLO and the USB keyboard is not supported.
Rebooting via the install disk in Install mode, then pressing CTRL-ALT-F1 and CTRL-C, gives me a shell - yeah!

I can import the pool with `zpool import -R /mnt rpool`. The following errors occur:
Code:
cannot mount '/mnt/rpool': directory is not empty
cannot mount '/mnt/rpool/pve': directory is not empty
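(I assume these are just leftover directories under the datasets' mountpoints; for rescue purposes an overlay mount should get past them, provided this zfsutils version supports the -O flag - untested on my side:)
Code:
# after the import above, retry the datasets that refused to mount
zfs mount -O -a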

The system was initially set up under Proxmox 3.3 and continuously updated without problems. Now I'm stuck!

How do I fix the system? Any help appreciated!

Dirk
 
An update on my problem, which is still not solved.

From the shell on the install disk I could verify that `zpool status rpool` is OK.
I tried to reinstall Grub (https://pve.proxmox.com/wiki/Recover_From_Grub_Failure)
--> no success - after reboot I end up in grub rescue mode
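(The wiki procedure boils down to roughly the following, run from the install-disk shell - device and pool names as on my system:)
Code:
zpool import -f -R /mnt rpool
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt /bin/bash
grub-install /dev/sda
grub-install /dev/sdb
update-grub
exit
umount /mnt/dev /mnt/proc /mnt/sys
zpool export rpool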

Entering Grub console from the install disk again:
Code:
grub> insmod zfs
grub> ls
(hd0) (hd0,gpt3) (hd0,gpt2) (hd0,gpt1) ...
grub> ls (hd0,gpt3)/
error: unknown filesystem
grub> zfsinfo (hd0,gpt3)
error: Unsupported features in pool.

Then I loaded the kernel from a MicroSD card:
Code:
root=(hd1,msdos1)
linux /vmlinuz-4.4.35-2-pve root=ZFS=rpool/ROOT/pve-1 ro boot=zfs
initrd /initrd.img-4.4.35-2-pve
boot
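(To avoid retyping this after every reboot, the same commands can go into a small config file on the card - file name and location are my own choice, nothing standard - and be loaded with configfile:)
Code:
# hypothetical /grub.cfg on the MicroSD, loaded manually with:
#   grub> configfile (hd1,msdos1)/grub.cfg
set timeout=3
menuentry 'Proxmox VE (kernel from MicroSD, ZFS root)' {
    set root=(hd1,msdos1)
    linux /vmlinuz-4.4.35-2-pve root=ZFS=rpool/ROOT/pve-1 ro boot=zfs
    initrd /initrd.img-4.4.35-2-pve
}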

The system started normally. I performed the following checks:
Code:
root@pve:~# grub-probe -t fs_uuid /
1d540bdfe4ab7feb

root@pve:~# grub-probe -vvvv /
grub-core/kern/fs.c:56: Detecting zfs...
grub-core/osdep/hostdisk.c:416: opening the device `/dev/sda3' in open_device()
grub-core/fs/zfs/zfs.c:1192: label ok 0
grub-core/osdep/hostdisk.c:395: reusing open device `/dev/sda3'
grub-core/fs/zfs/zfs.c:1007: check 2 passed
grub-core/fs/zfs/zfs.c:1018: check 3 passed
grub-core/fs/zfs/zfs.c:1025: check 4 passed
grub-core/fs/zfs/zfs.c:1035: check 6 passed
grub-core/fs/zfs/zfs.c:1043: check 7 passed
grub-core/fs/zfs/zfs.c:1054: check 8 passed
grub-core/fs/zfs/zfs.c:1064: check 9 passed
grub-core/fs/zfs/zfs.c:1086: check 11 passed
grub-core/fs/zfs/zfs.c:1112: check 10 passed
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1137: check 12 passed (feature flags)

root@pve:~# grub-fstest /dev/sda3 ls /ROOT/pve-1/@/boot/grub
i386-pc/ locale/ fonts/  grubenv x86_64-efi/ unicode.pf2   grub.cfg

The issue seems to be that the Grub bootloader can't read the ZFS partition.

Attached is the list of files from the recent upgrade.

Any ideas?
 


could you post the output of "zpool get all rpool" and "zfs get all rpool/ROOT/pve-1"?
 
Here we go:
Code:
root@pve:~# zpool get all rpool
NAME   PROPERTY                    VALUE                       SOURCE
rpool  size                        2.72T                       -
rpool  capacity                    43%                         -
rpool  altroot                     -                           default
rpool  health                      ONLINE                      -
rpool  guid                        2113327181385662443         default
rpool  version                     -                           default
rpool  bootfs                      rpool/ROOT/pve-1            local
rpool  delegation                  on                          default
rpool  autoreplace                 off                         default
rpool  cachefile                   -                           default
rpool  failmode                    wait                        default
rpool  listsnapshots               off                         default
rpool  autoexpand                  off                         default
rpool  dedupditto                  0                           default
rpool  dedupratio                  1.00x                       -
rpool  free                        1.53T                       -
rpool  allocated                   1.19T                       -
rpool  readonly                    off                         -
rpool  ashift                      12                          local
rpool  comment                     -                           default
rpool  expandsize                  -                           -
rpool  freeing                     0                           default
rpool  fragmentation               30%                         -
rpool  leaked                      0                           default
rpool  feature@async_destroy       enabled                     local
rpool  feature@empty_bpobj         active                      local
rpool  feature@lz4_compress        active                      local
rpool  feature@spacemap_histogram  active                      local
rpool  feature@enabled_txg         active                      local
rpool  feature@hole_birth          active                      local
rpool  feature@extensible_dataset  enabled                     local
rpool  feature@embedded_data       active                      local
rpool  feature@bookmarks           enabled                     local
rpool  feature@filesystem_limits   enabled                     local
rpool  feature@large_blocks        enabled                     local
Code:
root@pve:~# zfs get all rpool/ROOT/pve-1
NAME              PROPERTY              VALUE                  SOURCE
rpool/ROOT/pve-1  type                  filesystem             -
rpool/ROOT/pve-1  creation              Fri Mar 27 20:19 2015  -
rpool/ROOT/pve-1  used                  69.1G                  -
rpool/ROOT/pve-1  available             1.37T                  -
rpool/ROOT/pve-1  referenced            69.1G                  -
rpool/ROOT/pve-1  compressratio         1.08x                  -
rpool/ROOT/pve-1  mounted               yes                    -
rpool/ROOT/pve-1  quota                 none                   default
rpool/ROOT/pve-1  reservation           none                   default
rpool/ROOT/pve-1  recordsize            128K                   default
rpool/ROOT/pve-1  mountpoint            /                      local
rpool/ROOT/pve-1  sharenfs              off                    default
rpool/ROOT/pve-1  checksum              on                     default
rpool/ROOT/pve-1  compression           lz4                    inherited from rpool
rpool/ROOT/pve-1  atime                 off                    inherited from rpool
rpool/ROOT/pve-1  devices               on                     default
rpool/ROOT/pve-1  exec                  on                     default
rpool/ROOT/pve-1  setuid                on                     default
rpool/ROOT/pve-1  readonly              off                    default
rpool/ROOT/pve-1  zoned                 off                    default
rpool/ROOT/pve-1  snapdir               hidden                 default
rpool/ROOT/pve-1  aclinherit            restricted             default
rpool/ROOT/pve-1  canmount              on                     default
rpool/ROOT/pve-1  xattr                 on                     default
rpool/ROOT/pve-1  copies                1                      default
rpool/ROOT/pve-1  version               5                      -
rpool/ROOT/pve-1  utf8only              off                    -
rpool/ROOT/pve-1  normalization         none                   -
rpool/ROOT/pve-1  casesensitivity       sensitive              -
rpool/ROOT/pve-1  vscan                 off                    default
rpool/ROOT/pve-1  nbmand                off                    default
rpool/ROOT/pve-1  sharesmb              off                    default
rpool/ROOT/pve-1  refquota              none                   default
rpool/ROOT/pve-1  refreservation        none                   default
rpool/ROOT/pve-1  primarycache          all                    default
rpool/ROOT/pve-1  secondarycache        all                    default
rpool/ROOT/pve-1  usedbysnapshots       0                      -
rpool/ROOT/pve-1  usedbydataset         69.1G                  -
rpool/ROOT/pve-1  usedbychildren        0                      -
rpool/ROOT/pve-1  usedbyrefreservation  0                      -
rpool/ROOT/pve-1  logbias               latency                default
rpool/ROOT/pve-1  dedup                 off                    default
rpool/ROOT/pve-1  mlslabel              none                   default
rpool/ROOT/pve-1  sync                  standard               inherited from rpool
rpool/ROOT/pve-1  refcompressratio      1.08x                  -
rpool/ROOT/pve-1  written               69.1G                  -
rpool/ROOT/pve-1  logicalused           73.9G                  -
rpool/ROOT/pve-1  logicalreferenced     73.9G                  -
rpool/ROOT/pve-1  filesystem_limit      none                   default
rpool/ROOT/pve-1  snapshot_limit        none                   default
rpool/ROOT/pve-1  filesystem_count      none                   default
rpool/ROOT/pve-1  snapshot_count        none                   default
rpool/ROOT/pve-1  snapdev               hidden                 default
rpool/ROOT/pve-1  acltype               off                    default
rpool/ROOT/pve-1  context               none                   default
rpool/ROOT/pve-1  fscontext             none                   default
rpool/ROOT/pve-1  defcontext            none                   default
rpool/ROOT/pve-1  rootcontext           none                   default
rpool/ROOT/pve-1  relatime              on                     temporary
rpool/ROOT/pve-1  redundant_metadata    all                    default
rpool/ROOT/pve-1  overlay               off                    default
 
that looks okay. does "update-grub" or "grub-install /dev/sda" display any errors?
 
"update-grub" and "grub-install /dev/sda" (see attachment) went fine. Key question is still why:
a) Grub bootloader on /dev/sda can't load subsequent modules from /dev/sda3
b) Grub on Proxmox 4.4 install disk can't access kernel/initrd from zfs pool on /dev/sda3.

Thanks for your support!
 


"update-grub" and "grub-install /dev/sda" (see attachment) went fine. Key question is still why:
a) Grub bootloader on /dev/sda can't load subsequent modules from /dev/sda3
b) Grub on Proxmox 4.4 install disk can't access kernel/initrd from zfs pool on /dev/sda3.

Thanks for your support!

since it works in general, I suspect the issue has to do with your old-and-upgraded setup... could you post the output of "lsblk" (booted system) and "ls XYZ" for all the disks and partitions printed by "ls" in the grub rescue shell? the current /boot/grub/grub.cfg might also help. also, "pveversion -v" for completeness' sake ;)
 
As requested - appreciate your help!
Code:
root@pve:~# lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda        8:0    0  2.7T  0 disk
├─sda1     8:1    0    1M  0 part
├─sda2     8:2    0  128M  0 part
└─sda3     8:3    0  2.7T  0 part
sdb        8:16   0  2.7T  0 disk
├─sdb1     8:17   0    1M  0 part
├─sdb2     8:18   0  128M  0 part
└─sdb3     8:19   0  2.7T  0 part
sdc        8:32   0  922M  0 disk
└─sdc1     8:33   0  918M  0 part
zd0      230:0    0    4G  0 disk [SWAP]
zd16     230:16   0   40G  0 disk
├─zd16p1 230:17   0  350M  0 part
├─zd16p2 230:18   0    2G  0 part
├─zd16p3 230:19   0    1G  0 part
├─zd16p4 230:20   0    1K  0 part
├─zd16p5 230:21   0   13G  0 part
├─zd16p6 230:22   0  5.4G  0 part
├─zd16p7 230:23   0   17G  0 part
└─zd16p8 230:24   0  1.1G  0 part
zd32     230:32   0   32G  0 disk
zd48     230:48   0  4.5G  0 disk
zd64     230:64   0  4.5G  0 disk

Code:
root@pve:~# pveversion -v
proxmox-ve: 4.4-79 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.10-1-pve: 4.4.10-54
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-107
pve-firmware: 1.1-10
libpve-common-perl: 4.0-90
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-73
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-1
pve-container: 1.0-93
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-1
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve14~bpo80
fence-agents-pve: 4.0.20-1
openvswitch-switch: 2.6.0-2

Code:
root@pve:~# sgdisk -p /dev/sda
Disk /dev/sda: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 6B141385-13E5-4F42-8732-BEFEC7816711
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            4095   1024.0 KiB  EF02  Grub-Boot-Partition
   2            4096          266239   128.0 MiB   EF00  EFI-System-Partition
   3          266240      5860531119   2.7 TiB     8300  PVE-ZFS-Partition

Grub rescue - (hd1) is a local MicroSD card; (hd2) gives the same result:
Code:
Attempting Boot From Hard Drive (C:)
error: no such device: 1d540bdfe4ab7feb.
error: unknown filesystem.
Entering rescue mode...
grub rescue> ls
(hd0) (hd0,gpt3) (hd0,gpt2) (hd0,gpt1) (hd1) (hd2) (hd2,gpt3) (hd2,gpt2) (hd2,gpt1)
grub rescue> ls (hd0,gpt1)
(hd0,gpt1): Filesystem is unknown.
grub rescue> ls (hd0,gpt2)
(hd0,gpt2): Filesystem is unknown.
grub rescue> ls (hd0,gpt3)
(hd0,gpt3): Filesystem is unknown.
 


what does "blkid -U 1d540bdfe4ab7feb" report? (on the booted system)
 
what does "blkid -U 1d540bdfe4ab7feb" report? (on the booted system)

There is no result; however, if you convert 1d540bdfe4ab7feb (hex) to 2113327181385662443 (dec):
Code:
root@pve:~# blkid -U 1d540bdfe4ab7feb
root@pve:~# blkid -U 2113327181385662443
/dev/sda3
root@pve:~# lsblk -o NAME,FSTYPE,LABEL,UUID /dev/sda3 /dev/sdb3
NAME FSTYPE     LABEL UUID
sda3 zfs_member rpool 2113327181385662443
sdb3 zfs_member rpool 2113327181385662443
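(For the record: grub-probe prints the pool GUID in hexadecimal, while blkid/lsblk show it in decimal, so the two can be converted with e.g.:)
Code:
root@pve:~# printf '%x\n' 2113327181385662443
1d540bdfe4ab7feb
root@pve:~# printf '%d\n' 0x1d540bdfe4ab7feb
2113327181385662443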
 
The system started normally. I performed the following checks:
Code:
root@pve:~# grub-probe -t fs_uuid /
1d540bdfe4ab7feb

root@pve:~# grub-probe -vvvv /
grub-core/kern/fs.c:56: Detecting zfs...
grub-core/osdep/hostdisk.c:416: opening the device `/dev/sda3' in open_device()
grub-core/fs/zfs/zfs.c:1192: label ok 0
grub-core/osdep/hostdisk.c:395: reusing open device `/dev/sda3'
grub-core/fs/zfs/zfs.c:1007: check 2 passed
grub-core/fs/zfs/zfs.c:1018: check 3 passed
grub-core/fs/zfs/zfs.c:1025: check 4 passed
grub-core/fs/zfs/zfs.c:1035: check 6 passed
grub-core/fs/zfs/zfs.c:1043: check 7 passed
grub-core/fs/zfs/zfs.c:1054: check 8 passed
grub-core/fs/zfs/zfs.c:1064: check 9 passed
grub-core/fs/zfs/zfs.c:1086: check 11 passed
grub-core/fs/zfs/zfs.c:1112: check 10 passed
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1137: check 12 passed (feature flags)

root@pve:~# grub-fstest /dev/sda3 ls /ROOT/pve-1/@/boot/grub
i386-pc/ locale/ fonts/  grubenv x86_64-efi/ unicode.pf2   grub.cfg

sorry, back to that post - was that the complete output? especially of the first grub-probe command (I get a lot more)?

what does "dpkg --list | grep grub" say? (just to make sure we are testing with the same versions)
what about the full "blkid /dev/sda3" and "blkid /dev/sdb3" ?

does booting from sdb (via the BIOS boot menu / selector) give the same error message?
 
Code:
root@pve:~# dpkg --list | grep grub
ii  grub-common                    2.02-pve5                      amd64        GRand Unified Bootloader (common files)
ii  grub-efi-amd64-bin             2.02-pve5                      amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 binaries)
ii  grub-efi-ia32-bin              2.02-pve5                      amd64        GRand Unified Bootloader, version 2 (EFI-IA32 binaries)
ii  grub-pc                        2.02-pve5                      amd64        GRand Unified Bootloader, version 2 (PC/BIOS version)
ii  grub-pc-bin                    2.02-pve5                      amd64        GRand Unified Bootloader, version 2 (PC/BIOS binaries)
ii  grub2-common                   2.02-pve5                      amd64        GRand Unified Bootloader (common files for version 2)
root@pve:~# blkid /dev/sda3
/dev/sda3: LABEL="rpool" UUID="2113327181385662443" UUID_SUB="7247439626105154391" TYPE="zfs_member" PARTLABEL="PVE-ZFS-Partition" PARTUUID="270f9cc3-1776-4fdb-a476-c43c16237fa7"
root@pve:~# blkid /dev/sdb3
/dev/sdb3: LABEL="rpool" UUID="2113327181385662443" UUID_SUB="10622808652303524429" TYPE="zfs_member" PARTLABEL="PVE-ZFS-Partition" PARTUUID="2af7429f-2d14-44c7-a263-5d9aefe8458b"
root@pve:~# grub-probe -t fs_uuid /
1d540bdfe4ab7feb
"grub-probe -vvvv /" see attachment

Regarding your last question: I can't switch directly to sdb via the BIOS. The system has a cage with four slots and starts to boot from slot 1 onwards. However, chainloading from the install CD to (hd2) is possible, but that didn't succeed either.
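(Chainloading meaning roughly the following from the install CD's Grub prompt:)
Code:
grub> insmod chain
grub> set root=(hd2)
grub> chainloader +1
grub> boot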
 


one more thing:
what does "set" output in the grub rescue shell?
 
Code:
grub rescue> set
cmdpath=(hd0)
prefix=(hd0)/ROOT/pve-1@/boot/grub
root=hd0

okay, so this probably means that grub got installed to the MBR and the BIOS boot partition, in which case the one from the MBR gets used. could you try the following in the rescue prompt:

Code:
prefix=(hd0,gpt3)/ROOT/pve-1@/boot/grub
insmod normal
normal

and report any errors?

alternatively, if you feel comfortable with that, you can also wipe the MBR and try rebooting, but in that case it is highly advisable to have good backups in case you accidentally wipe the wrong thing.
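(for completeness: "wiping the MBR" here means zeroing just the 440-byte boot-code area at the start of the disk, leaving the protective partition table intact, e.g.:)
Code:
dd if=/dev/zero of=/dev/sda bs=440 count=1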
 
okay, so this probably means that grub got installed to the MBR and the BIOS boot partition, in which case the one from the MBR gets used. could you try the following in the rescue prompt:

Code:
prefix=(hd0,gpt3)/ROOT/pve-1@/boot/grub
insmod normal
normal

and report any errors?

alternatively, if you feel comfortable with that, you can also wipe the MBR and try rebooting, but in that case it is highly advisable to have good backups in case you accidentally wipe the wrong thing.

After "insmod normal" I get the "error: unknown filesystem" again.
I still do not understand why I can't access the zfs filesystem from Grub - even Grub from Proxmox Install CD doesn't work. I assume wiping the MTB wouldn't help. Two possibilities:
a) install Grub on MicroSD and load kernel/initrd from there --> easy workaround
b) reinstall from recent Proxmox Install CD --> clean, but more work

Other ideas?

Fabian, thanks for your great support!
 
After "insmod normal" I get the "error: unknown filesystem" again.
I still do not understand why I can't access the zfs filesystem from Grub - even Grub from Proxmox Install CD doesn't work. I assume wiping the MTB wouldn't help. Two possibilities:
a) install Grub on MicroSD and load kernel/initrd from there --> easy workaround
b) reinstall from recent Proxmox Install CD --> clean, but more work

both should work. for "a" you should probably put the complete /boot on there, otherwise you'll need to do manual fiddling to get new kernels (the update-grub script should automatically detect the device where your /boot lives). a rough sketch is below.
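a rough sketch of "a", assuming the MicroSD is /dev/sdc, you can reformat it, and you are currently running the system via the SD-card workaround:

Code:
# copy the current /boot onto the card (assumes /dev/sdc1, ext4)
mkfs.ext4 -L pveboot /dev/sdc1
mount /dev/sdc1 /mnt
cp -ax /boot/. /mnt/
umount /mnt

# mount it over /boot from now on, and make that permanent
mount LABEL=pveboot /boot
echo 'LABEL=pveboot /boot ext4 defaults 0 2' >> /etc/fstab

# install grub onto the card and regenerate its config
grub-install /dev/sdc
update-grub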

Other ideas?

kind of running out of them at this point.
 
kind of running out of them at this point.
One more question. I have noticed that several ZFS packages were updated (see history.log attached above; e.g. libzfs2:amd64 (0.6.5.8-pve13~bpo80, 0.6.5.8-pve14~bpo80)) and that the problems started then. But Grub is still from November 30, 2016. Might this be the reason for Grub no longer recognizing the ZFS pool? What is the difference between the ZFS versions, and is it worth compiling Grub against 0.6.5.8-pve14~bpo80?
 
One more question. I have noticed that several ZFS packages were updated (see history.log attached above; e.g. libzfs2:amd64 (0.6.5.8-pve13~bpo80, 0.6.5.8-pve14~bpo80)) and that the problems started then. But Grub is still from November 30, 2016. Might this be the reason for Grub no longer recognizing the ZFS pool? What is the difference between the ZFS versions, and is it worth compiling Grub against 0.6.5.8-pve14~bpo80?

our Grub packages are not compiled against ZFS, because linking with libzpool / libzfs is not allowed license-wise. Grub calls the zpool binary itself to determine the pool layout (in grub-probe, etc.), and has its own small ZFS implementation (for the actual bootloader part).

the -pve14 packages just contain some minor bug fixes (moving zed to /usr/sbin, fixing the PATH in the monthly scrub cron job), so I don't really see how they could affect the boot behaviour.
 
I'm facing exactly the same issue with an HP server. Any idea how to boot it up (at least)?
All the VMs are stored as ZFS datasets on the same pool, so there is no way to reinstall PVE (((
 
