Need some help to recover degraded ZFS raid

Emil Makariev

Member
Jun 17, 2016
Hello guys! I have a bit of an issue with ZFS on Proxmox 4.4...
Yesterday, I tried to expand my ZFS pool. It originally had 4 HDDs in RAID 10.
What I did was:

zpool add rpool mirror /dev/sde /dev/sdf

then I wanted to add 1 SSD as Cache:
zpool add rpool cache /dev/nvme0n1

all the disks appeared online:

root@supermicro:~# zpool status
pool: rpool
state: ONLINE
scan: scrub repaired 184K in 3h59m with 0 errors on Sun May 14 04:23:54 2017
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F5HSPX2S-part2 ONLINE 0 0 0
    ata-WDC_WD10EZEX-21WN4A0_WCC6Y6UR4A5V-part2 ONLINE 0 0 0
  mirror-1 ONLINE 0 0 0
    ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F2JZ5LXR ONLINE 0 0 0
    ata-WDC_WD10EZEX-21WN4A0_WCC6Y0KA3X41 ONLINE 0 0 0
  mirror-2 ONLINE 0 0 0
    sde ONLINE 0 0 0
    sdf ONLINE 0 0 0
cache
  nvme0n1 ONLINE 0 0 0

After that, I rebooted the system to be sure that everything would work, and received this error:

Loading, please wait...
PANIC: blkptr at ffff88100eca4848 DVA 1 has invalid VDEV 2

There is no console or command line that I can use, so I tried booting from USB as a rescue system:

root@USB:~# zpool status
no pools available
root@USB:~# zpool import rpool
cannot import 'rpool': one or more devices is currently unavailable

root@USB:~# zpool import -d /dev/disk/by-id/
pool: rpool
id: 17715479149632960048
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://zfsonlinux.org/msg/ZFS-8000-6X
config:

rpool UNAVAIL missing device
  mirror-0 ONLINE
    ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F5HSPX2S-part2 ONLINE
    ata-WDC_WD10EZEX-21WN4A0_WCC6Y6UR4A5V-part2 ONLINE
  mirror-1 ONLINE
    ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F2JZ5LXR ONLINE
    ata-WDC_WD10EZEX-21WN4A0_WCC6Y0KA3X41 ONLINE
cache
  lvm-pv-uuid-Rabs6y-HGip-lnW2-6XMT-2kmd-08gQ-6UZWDe

Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.

Any idea how I can boot the system, even with only my old 4 HDDs? Or how can I bring the pool back online? I also tried removing the last pair physically. Same shit...

Thanks a lot in advance!
 
I had a similar error when I had my cache and log devices on LVM volumes. I changed them to use simple partitions and the pool was imported at boot. It could have been fixed in other ways, but this solution was sufficient.

I recommend doing the same, though your situation is different as your mirror-2 is missing entirely. Are your sde, sdf devices visible in the system?
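To check whether sde and sdf are visible and which stable names they carry, something like the following could help. It is only a sketch: `by_id_for` is a made-up helper name, and the thread does not confirm that the bare `sdX` names were the cause of the failed import. It just lists the symlinks in a by-id style directory that point at a given device.

```shell
#!/bin/sh
# Print every symlink in a by-id style directory that resolves to the given device.
# Usage: by_id_for /dev/sde [/dev/disk/by-id]
# by_id_for is a hypothetical helper, not part of ZFS or Proxmox.
by_id_for() {
    dev=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Running `by_id_for /dev/sde` and `by_id_for /dev/sdf` from the rescue system prints nothing if the disks have no by-id entry, which would be worth knowing before trying another import.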
 
what does "zpool import" say?
 
Do the `zpool import -N -d /dev/disk/by-id/` trick to import the pool, then `exit` to continue the boot, then run `update-grub` or `update-initramfs -u` afterwards (I don't remember which, do both :-) to get /etc/zfs/zpool.cache into the boot image.
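The steps in that reply can be written out roughly as below. This is a sketch under assumptions: the pool name `rpool` is appended to the import command (taken from the first post), the `run` wrapper and the `DRY_RUN` and `POOL` variables are made up for illustration, and the first step only applies if the boot drops to an initramfs shell. An import will also still fail while a top-level vdev such as mirror-2 cannot be found.

```shell
#!/bin/sh
# Nothing runs until a function is called.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
POOL=${POOL:-rpool}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: at the initramfs prompt. Import without mounting datasets (-N),
# searching the by-id names, then type `exit` to let the boot continue.
import_pool() {
    run zpool import -N -d /dev/disk/by-id/ "$POOL"
}

# Step 2: once the system is up, rebuild the boot image so that it
# contains the current /etc/zfs/zpool.cache, and regenerate the GRUB config.
refresh_boot_image() {
    run update-initramfs -u
    run update-grub
}
```

With the defaults, calling `import_pool` or `refresh_boot_image` just echoes the commands so they can be read over first; setting `DRY_RUN=0` as root would actually execute them.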
 
Hello, thanks for the response! I had no time, so I reinstalled and restored from backup.

That also works :D

Just consider getting familiar with the recovery process. You have now learned the hard way and had downtime, recovery time and maybe data loss. You can play around inside a VM with all aspects of running Proxmox VE, including ZFS disks. You can even hotplug them online to "play around". It comes in really handy for "simulating" stuff.
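For practising inside such a VM, a throwaway pool on sparse files is one option that needs no extra virtual disks. A rough sketch, assuming a made-up pool name `labpool` and made-up file names; `lab_plan` is a hypothetical helper that creates the backing files and only prints the `zpool` commands, so they can be run one at a time as root:

```shell
#!/bin/sh
# Create six sparse backing files and print (not run) the zpool commands
# for a throwaway pool that mimics the layout from the first post.
# Pass an absolute directory path; zpool wants absolute paths for file vdevs.
lab_plan() {
    dir=$1
    for n in 1 2 3 4 5 6; do
        # Sparse file: takes almost no real disk space until written to.
        truncate -s 256M "$dir/disk$n.img"
    done
    # Two mirrors, like the original 4-disk RAID 10.
    echo "zpool create labpool mirror $dir/disk1.img $dir/disk2.img mirror $dir/disk3.img $dir/disk4.img"
    # -n is a dry run: it shows the resulting layout without changing the pool.
    echo "zpool add -n labpool mirror $dir/disk5.img $dir/disk6.img"
    echo "zpool add labpool mirror $dir/disk5.img $dir/disk6.img"
    # Export and re-import to rehearse what happens across a reboot.
    echo "zpool export labpool"
    echo "zpool import -d $dir labpool"
    echo "zpool destroy labpool"
}
```

Moving or renaming one of the files between the export and the import is a cheap way to see how a missing device is reported, without touching a real pool.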