Hello Fabian,
GRUB unfortunately does not handle missing devices gracefully. Is this message followed by an initramfs prompt where you can enter commands?
I was receiving this message after the initramfs stage, but that was before I had removed the SSD cache device. Because the device mappings were messed up at that point, I wasn't able to import the rpool, and I didn't want to force the import. Now, doing the aftermath, I think the reason for the messed-up device mappings was that I had hot-added the SSD cache. During normal operation the SSD received the device name /dev/sdh, and I was incautious enough to use the /dev/sdh1 and /dev/sdh2 partitions as log devices for both zpools. After the reboot, however, the SSD received a different device name. I'm almost certain this problem would never have occurred if I had used the /dev/disk/by-id/<ID>-part1 and /dev/disk/by-id/<ID>-part2 equivalents when adding the log devices to the zpools.
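For illustration, the stable-name variant would have looked roughly like this (a sketch only; the ata-... ID below is made up, look up the real one with `ls -l /dev/disk/by-id/`):

```shell
# Find the SSD's stable by-id name (the ID below is a made-up example):
ls -l /dev/disk/by-id/ | grep sdh

# Add the partitions as log devices using the by-id paths, so the pools
# survive /dev/sdX renumbering across reboots:
zpool add rpool log /dev/disk/by-id/ata-EXAMPLE_SSD_SERIAL-part1
zpool add tank log /dev/disk/by-id/ata-EXAMPLE_SSD_SERIAL-part2
```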
Anyway, this is how I managed to recover from this "I have a shiny new PVE 4.2 installed, but I'm not able to boot..." situation:
First I removed the SSD cache device and tried to boot the PVE 4.2 installer in debug mode in order to remove the log devices from both zpools and then re-install GRUB. For some unknown reason, after reaching the first shell prompt it was simply impossible to input anything via the keyboard, no matter whether I was using the HP iLO Java console or the directly attached USB keyboard; I was simply stuck at that prompt. Pretty strange and discouraging, but I had to find some way out of this...
After some quick research on the Internet I found a SystemRescueCD with ZFS 0.6.5 built into it. Thanks to the Funtoo Linux folks out there for providing it to the community. After booting the SystemRescueCD I proceeded as follows, until I finally managed to do what I had hoped to do with the PVE 4.2 installer:
Code:
root@sysresccd /root % zpool import -a
cannot import 'rpool': pool may be in use from other system, it was last accessed by pve (hostid: 0xa8c02302) on Tue May 3 08:49:50 2016
use '-f' to import anyway
cannot import 'tank': pool may be in use from other system, it was last accessed by pve (hostid: 0xa8c02302) on Tue May 3 08:47:44 2016
use '-f' to import anyway
root@sysresccd /root % zpool import -a -f
The devices below are missing, use '-m' to import the pool anyway:
            sdh1 [log]
cannot import 'rpool': one or more devices is currently unavailable
The devices below are missing, use '-m' to import the pool anyway:
            sdh2 [log]
cannot import 'tank': one or more devices is currently unavailable
root@sysresccd /root % zpool import -a -f -m
cannot mount '/': directory is not empty
root@sysresccd /root % zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h23m with 0 errors on Sat Nov 21 10:42:06 2015
config:

        NAME                      STATE     READ WRITE CKSUM
        rpool                     DEGRADED     0     0     0
          mirror-0                ONLINE       0     0     0
            sda2                  ONLINE       0     0     0
            sdb2                  ONLINE       0     0     0
        logs
          10008429943656705133    UNAVAIL      0     0     0  was /dev/sdh1

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          mirror-0               ONLINE       0     0     0
            sdc                  ONLINE       0     0     0
            sdd                  ONLINE       0     0     0
          mirror-1               ONLINE       0     0     0
            sde                  ONLINE       0     0     0
            sdf                  ONLINE       0     0     0
        logs
          4585512667122472357    UNAVAIL      0     0     0  was /dev/sdh2

errors: No known data errors
So, both zpools have been imported, but they are in a degraded state because of the missing SSD log devices. Removing them is pretty simple:
Code:
root@sysresccd /root % zpool remove rpool 10008429943656705133
root@sysresccd /root % zpool remove tank 4585512667122472357
root@sysresccd /root % zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 0h23m with 0 errors on Sat Nov 21 10:42:06 2015
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            sda2      ONLINE       0     0     0
            sdb2      ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            sde       ONLINE       0     0     0
            sdf       ONLINE       0     0     0

errors: No known data errors
OK, now both zpools seem to be doing fine. Let's try to re-install GRUB on both rpool disks as per onlime's post:
Code:
root@sysresccd /root % mkdir /mnt/pve
root@sysresccd /root % zfs set mountpoint=/mnt/pve rpool/ROOT/pve-1
root@sysresccd /root % zfs mount rpool/ROOT/pve-1
root@sysresccd /root % mount -t proc /proc /mnt/pve/proc
root@sysresccd /root % mount --rbind /dev /mnt/pve/dev
root@sysresccd /root % mount --rbind /sys /mnt/pve/sys
root@sysresccd /root % chroot /mnt/pve /bin/bash
root@sysresccd:/# source /etc/profile
root@sysresccd:/# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
root@sysresccd:/# grub-install /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.
root@sysresccd:/# update-grub
update-grub update-grub2
root@sysresccd:/# update-grub2
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.6-1-pve
Found initrd image: /boot/initrd.img-4.4.6-1-pve
Found linux image: /boot/vmlinuz-4.2.8-1-pve
Found initrd image: /boot/initrd.img-4.2.8-1-pve
Found linux image: /boot/vmlinuz-4.2.6-1-pve
Found initrd image: /boot/initrd.img-4.2.6-1-pve
Found linux image: /boot/vmlinuz-4.2.3-2-pve
Found initrd image: /boot/initrd.img-4.2.3-2-pve
Found linux image: /boot/vmlinuz-4.2.2-1-pve
Found initrd image: /boot/initrd.img-4.2.2-1-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
done
root@sysresccd:/# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.4.6-1-pve
root@sysresccd:/# exit
root@sysresccd /root % umount /mnt/pve/sys/fs/fuse/connections
root@sysresccd /root % umount /mnt/pve/sys/kernel/config
root@sysresccd /root % umount /mnt/pve/sys/kernel/debug
root@sysresccd /root % umount /mnt/pve/sys/kernel/security
root@sysresccd /root % umount /mnt/pve/sys
root@sysresccd /root % umount /mnt/pve/dev/shm
root@sysresccd /root % umount /mnt/pve/dev/pts
root@sysresccd /root % umount /mnt/pve/dev/mqueue
root@sysresccd /root % umount /mnt/pve/dev
It is very important to move the /etc/zfs/zpool.cache file out of the way, because otherwise subsequent attempts to boot PVE 4.2 would fail with a kernel stack trace being shown and systemd waiting forever for the zpools to be imported:
Code:
root@sysresccd /root % mkdir /mnt/pve/root/backup && mv /etc/zfs/zpool.cache /mnt/pve/root/backup/zpool.cache
After unmounting rpool/ROOT/pve-1, it is important to set the mountpoint back to / as it was before:
Code:
root@sysresccd /root % umount /mnt/pve
root@sysresccd /root % zfs set mountpoint=/ rpool/ROOT/pve-1
Now, finally, I was able to boot PVE 4.2, and after doing a scrub on both zpools they look like this:
Code:
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Tue May  3 17:23:00 2016
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            sda2      ONLINE       0     0     0
            sdb2      ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 2h9m with 0 errors on Tue May  3 19:29:49 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-WDC_WD30EFRX-68AX9N0_WD-WMC123456789    ONLINE       0     0     0
            ata-WDC_WD30EFRX-68AX9N0_WD-WMC223456789    ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ata-WDC_WD30EFRX-68AX9N0_WD-WMC323456789    ONLINE       0     0     0
            ata-WDC_WD30EFRX-68AX9N0_WD-WMC423456789    ONLINE       0     0     0

errors: No known data errors
So, after about 6 hours of downtime, I finally managed to recover from this silly device mapping mismatch, and I'm now enjoying the new PVE 4.2, whose web UI, BTW, looks pretty modern now!
Any comments and ideas on how to avoid such a scenario in future will be highly appreciated!
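In the meantime, one mitigation I'm aware of (a generic ZFS technique, nothing PVE-specific, so use at your own risk): a pool that already records volatile /dev/sdX paths can be switched over to stable by-id names with a simple export/import cycle:

```shell
# Sketch: re-import "tank" so ZFS records /dev/disk/by-id names instead
# of /dev/sdX. Do this while the pool's datasets are not in use (for
# rpool you would need a rescue environment, since / lives on it).
zpool export tank
zpool import -d /dev/disk/by-id tank
zpool status tank   # the vdevs should now show the stable ata-... names
```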
Also, it would be good if the PVE installation ISO/CD included some sort of installer-independent rescue system, just in case of emergency. As experience shows: Murphy is out there to get you!
Best regards,
Anymemm