ZFS (RAID1) primary drive fail - rpool boot recovery

raymov

Dec 17, 2015
To simulate a RAID1 mirror failure, we zero-filled /dev/sda.
However, the system now fails to boot from the mirror, with the following error:
error: no such device: 6290ba42d829d473.
Entering rescue mode...
grub rescue> ls (hd0,gpt2)
(hd0,gpt2): Filesystem is unknown

Where is the actual device number stored?

Does this need to be updated somewhere to successfully boot from the second HDD in the mirror?

[Attachment: proxmox no such device.png]

We can see the following:

Code:
$ fdisk -l /dev/sda
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DD079FEE-3627-495A-8AE0-232E4C1E98B2

Device          Start        End    Sectors   Size Type
/dev/sda1          34       2047       2014  1007K BIOS boot
/dev/sda2        2048 1953508749 1953506702 931.5G Solaris /usr & Apple ZFS
/dev/sda9  1953508750 1953525134      16385     8M Solaris reserved 1

$ fdisk -l /dev/sdb
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

$ sudo zpool status
  pool: rpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon Jan 18 04:10:23 2016
config:

        NAME                      STATE     READ WRITE CKSUM
        rpool                     DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            17690898003920750604  FAULTED      0     0     0  was /dev/sda2
            sda2                  ONLINE       0     0     0
        logs
          nvme1n1p3               ONLINE       0     0     0
        cache
          nvme1n1p4               ONLINE       0     0     0
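
The long number in the FAULTED row is the GUID ZFS shows in place of the missing device, and it is the name 'zpool replace' needs later. A minimal sketch for pulling it out of 'zpool status' output; faulted_vdevs is a made-up helper name, and the awk test assumes the NAME/STATE column layout shown above:

```shell
#!/bin/sh
# faulted_vdevs: read `zpool status` text on stdin and print the name
# (first column) of every row whose STATE column is FAULTED or UNAVAIL.
# Hypothetical helper; assumes the NAME/STATE column layout shown above.
faulted_vdevs() {
    awk '$2 == "FAULTED" || $2 == "UNAVAIL" { print $1 }'
}

# Usage on a live system (needs the ZFS tools, so left commented out):
#   zpool status rpool | faulted_vdevs
```

Fed the output above, this should print only 17690898003920750604, since the healthy sda2 row has ONLINE in the second column.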

We tried the steps from the guide "Grub2 recovery on ZFS Proxmox VE 3.4".

However, we get an error at the following step:
$ update-grub2
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.2.6-1-pve
Found initrd image: /boot/initrd.img-4.2.6-1-pve

/usr/sbin/grub-probe: error: unknown filesystem.
/usr/sbin/grub-probe: error: unknown filesystem.
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
done


How do we successfully boot in a degraded state from the new /dev/sda (previously recognised as /dev/sdb) before the original /dev/sda is "replaced" and re-mirrored?
 
Replaced the hard disk and started in debug mode:

zpool replace rpool /dev/sdb
/dev/sdb does not contain an EFI label but it may contain partition information in the MBR.

This was easily resolved by copying the GPT (GUID Partition Table):
http://askubuntu.com/questions/5790...rtition-scheme-from-one-hard-drive-to-another

In summary, these were the steps used:
sgdisk -p /dev/sdb (make sure the disk isn't in use before overwriting its GPT)
sgdisk --backup=table /dev/sda
sgdisk --load-backup=table /dev/sdb
sgdisk -G /dev/sdb (randomize the disk and partition GUIDs so they don't clash with /dev/sda)
sgdisk -p /dev/sdb

Code:
Disk /dev/sdb: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): F32173B6-3154-403D-9C05-2EF1A2319B7B
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34            2047   1007.0 KiB  EF02
   2            2048      1953508749   931.5 GiB   BF01  zfs
   9      1953508750      1953525134   8.0 MiB     BF07
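
The sgdisk steps above can be wrapped in a small helper. This is only a sketch: clone_gpt, DRY_RUN and TABLE_FILE are names made up for illustration, and by default it just prints the commands so the source and target can be double-checked before anything is overwritten. It uses 'sgdisk -p' to print the resulting table.

```shell
#!/bin/sh
# clone_gpt SRC DST: copy the GPT from SRC to DST and give DST fresh GUIDs.
# Hypothetical wrapper; by default it only prints the commands (DRY_RUN=1).
# Set DRY_RUN=0 to really run them, which overwrites DST's partition table.
clone_gpt() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_gpt SRC DST (two different disks)" >&2
        return 1
    fi
    # Where the backup of SRC's table is stored (assumed default path).
    table=${TABLE_FILE:-/tmp/gpt-table.bak}
    for cmd in \
        "sgdisk --backup=$table $src" \
        "sgdisk --load-backup=$table $dst" \
        "sgdisk -G $dst" \
        "sgdisk -p $dst"
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "+ $cmd"
        else
            # Unquoted on purpose so the string splits into command words.
            $cmd || return 1
        fi
    done
}

# Dry run: prints the four sgdisk commands without touching any disk.
clone_gpt /dev/sda /dev/sdb
```

Refusing identical source and target is a cheap guard against the most obvious typo; it does not check whether the target is in use.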

zpool replace rpool 17690898003920750604 /dev/sdb2


However, the system still didn't boot. I found this helpful:
http://askubuntu.com/questions/1714...er-restore-from-another-machine/171486#171486

In summary, these were the steps used:
zpool import rpool
(reports: cannot mount '/': directory is not empty)
zfs set mountpoint=/mnt rpool/ROOT/pve-1
zfs mount rpool/ROOT/pve-1
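
A possible alternative to changing the mountpoint property, not what was run above: 'zpool import -R /mnt rpool' imports the pool with /mnt as an alternate root, so a dataset whose mountpoint is / gets mounted at /mnt and the property itself stays untouched. A minimal sketch, where import_altroot and DRY_RUN are made-up names and the pool is assumed not to be imported yet:

```shell
#!/bin/sh
# import_altroot POOL [ROOT]: import POOL with ROOT (default /mnt) as its
# alternate root, so a dataset with mountpoint=/ lands at ROOT instead of
# on top of the rescue system's own /.
# Hypothetical helper; prints the command unless DRY_RUN=0 is set.
import_altroot() {
    pool=$1
    altroot=${2:-/mnt}
    if [ -z "$pool" ]; then
        echo "usage: import_altroot POOL [ROOT]" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ zpool import -R $altroot $pool"
    else
        zpool import -R "$altroot" "$pool"
    fi
}

# Dry run: only shows the command that would be executed.
import_altroot rpool
```

With this approach there is no mountpoint to revert afterwards; a plain 'zpool export rpool' before rebooting is enough.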


lsblk -f | grep -E 'sd|nvme1'
Code:
sda
├─sda1
├─sda2         zfs_member  rpool 7102381408620500083
└─sda9
sdb
├─sdb1
├─sdb2         zfs_member  rpool 7102381408620500083
└─sdb9
nvme1n1        zfs_member
├─nvme1n1p2    zfs_member  ssd   6845348448452068965
├─nvme1n1p3    zfs_member  rpool 7102381408620500083
└─nvme1n1p4

cp -p /mnt/boot/grub/grub.cfg /mnt/boot/grub/grub.cfg.orig (create backup)
vi /mnt/boot/grub/grub.cfg
(updated the UUID entries with the rpool values reported by lsblk -f, shown above)
grub-install --root-directory=/mnt /dev/sda
grub-install --root-directory=/mnt /dev/sdb
(both drives in mirror)
zfs unmount rpool/ROOT/pve-1
zfs set mountpoint=/ rpool/ROOT/pve-1
(revert mountpoint back to /)
zpool export rpool
exit
(restart)
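
Before editing grub.cfg by hand, it can help to see which identifiers it actually searches for. A minimal sketch, assuming the usual grub-mkconfig layout where the identifier is the last word of each 'search ... --fs-uuid' line; grub_search_ids is a made-up helper name:

```shell
#!/bin/sh
# grub_search_ids FILE: list the distinct identifiers that grub.cfg passes
# to "search ... --fs-uuid".
# Hypothetical helper; assumes the identifier is the last word on each such
# line, as grub-mkconfig usually writes it.
grub_search_ids() {
    awk '$1 == "search" && /--fs-uuid/ { print $NF }' "$1" | sort -u
}

# Example (path as used in the steps above):
#   grub_search_ids /mnt/boot/grub/grub.cfg
```

Running it on both grub.cfg and grub.cfg.orig shows at a glance what the manual edit changed.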

Got Proxmox booting successfully. However, when I run the following:
grub-install /dev/sda
grub-install /dev/sdb

The system fails to reboot and goes back to looking for the UUID shown in the original post above. Any ideas where it's getting this from?

Couldn't find anything obvious searching the filesystem with the UUID:
time sudo grep -Hr 6290ba42d829d473 / 2> error.txt
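
A possible lead: the identifier in the GRUB error looks like the pool GUID itself, written in hexadecimal instead of the decimal form that lsblk -f prints. A quick sketch to check this; pool_guid_hex is a made-up helper name, and it relies on the shell's printf handling 64-bit integers:

```shell
#!/bin/sh
# pool_guid_hex GUID: print a decimal ZFS pool GUID as 16 lowercase hex
# digits, the form GRUB uses in "no such device: ..." messages.
# Hypothetical helper; GUIDs of 2^63 or more may overflow in some shells.
pool_guid_hex() {
    printf '%016x\n' "$1"
}

# The rpool GUID from the lsblk output above:
pool_guid_hex 7102381408620500083   # prints 6290ba42d829d473

# On a live system the GUID can be read directly (left commented out):
#   pool_guid_hex "$(zpool get -H -o value guid rpool)"
```

So the ID is not stale: it is the current pool's GUID, and a recursive grep for the hex string will not match places that store it in decimal or binary form.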
 
Hi!

How can I boot the degraded pool so I can work on it? All I have is the grub rescue prompt. Can I start from there? I don't think ZFS RAID is really RAID. IMHO a RAID1 should keep working with one disk, but the ZFS system is dead immediately. Why use this system? And why was I misled by the "RAID" name?
 
