ZFS RAID Recovery Issue while installing grub

kpneema

New Member
Jun 25, 2020
Hello,

I have configured RAID-1 (a ZFS mirror). After replacing the old disk with a new one, the OS does not boot when the new disk is selected as the first boot device. Below is the process we tried on our end:

root@host250:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 0 days 00:00:37 with 0 errors on Thu Jun 25 11:57:25 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    rpool                                           DEGRADED     0     0     0
      mirror-0                                      DEGRADED     0     0     0
        scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3  ONLINE       0     0     0
        scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part3  FAULTED      0     0     0  external device fault


root@host250:~# zpool offline -f -t rpool scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part3

root@host250:~# sgdisk --replicate=/dev/sda /dev/sdb
Creating new GPT entries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

root@host250:~# sgdisk --randomize-guids /dev/sdb
Creating new GPT entries.
The operation has completed successfully.

root@host250:~# grub-install /dev/sdb
Installing for i386-pc platform.
grub-install: error: cannot find a GRUB drive for /dev/sda3. Check your device.map.

root@host250:~# zpool replace -f rpool scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part3 /dev/sdb
Make sure to wait until resilver is done before rebooting.

root@host250:~# watch -n1 zpool status
root@host250:~#

root@host250:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 970M in 0 days 00:00:24 with 0 errors on Thu Jun 25 12:06:12 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    rpool                                           ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3  ONLINE       0     0     0
        sdb                                         ONLINE       0     0     0

errors: No known data errors
 
Hi,

You also need the data from partitions 1 and 2.

Use dd to copy the boot and EFI partitions, then install GRUB on sdb.
 
Hello,

We did that with the sgdisk command and then installed GRUB on sdb.
 
Can you please give me an example of the dd command? How can I copy those partitions with dd?
 
Assuming sda is your working disk and sdb is the new disk:

Code:
dd if=/dev/sda1 of=/dev/sdb1 bs=64k
dd if=/dev/sda2 of=/dev/sdb2 bs=64k
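
If you want to confirm that the copy really landed, a small wrapper like the one below may help. It is only a sketch: the function name `copy_and_verify` is made up for illustration, and it assumes the source and destination partitions have the same size (which should be the case once the partition table has been replicated). It runs the same kind of dd copy as above and then compares both sides byte by byte with cmp.

```shell
#!/bin/sh
# copy_and_verify SRC DST (hypothetical helper, not part of any package)
# Raw-copies SRC onto DST with dd, then checks that both read back identical.
# SRC and DST may be partitions (e.g. /dev/sda1 and /dev/sdb1) or plain files.
# Assumption: DST has the same size as SRC; otherwise cmp reports a mismatch.
copy_and_verify() {
    src=$1
    dst=$2
    # dd prints its transfer statistics on stderr; hide them here.
    dd if="$src" of="$dst" bs=64k 2>/dev/null || return 1
    # cmp -s is silent and returns 0 only if the contents are identical.
    cmp -s "$src" "$dst"
}
```

Usage would be something like `copy_and_verify /dev/sda1 /dev/sdb1 && copy_and_verify /dev/sda2 /dev/sdb2`, run as root and only after double-checking which disk is which, since dd overwrites the destination without asking.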
 
Use dd to copy the boot and EFI partitions, then install GRUB on sdb.

I was under the impression that grub-install on a disk installed everything related to the boot partition. Not sure about the EFI. I have replaced a bunch of disks over the years and never did any dd of the boot partition. Never had an issue so far, and I'm pretty sure I tested it back in the day. How can I confirm that my boot partition is not empty? If I dd the boot partition to a file, I can clearly see stuff in there.
 
I was under the impression that grub-install on a disk installed everything related to the boot partition.

GRUB is a multi-stage bootloader that needs its additional stages on a separate GRUB partition or an ordinary filesystem.
You did not create one, so there is no filesystem. Therefore, you need to copy it: the GRUB stages are already on the filesystem on the other disk, so copying will just copy everything.

I have replaced a bunch of disks over the years and never did any dd of the boot partition. Never had an issue so far, and I'm pretty sure I tested it back in the day.

With non-ZFS setups there is no problem, but GRUB cannot be installed directly on ZFS, so you need a GRUB partition or filesystem. As for EFI, it is the default on any computer built this decade (2011-2020) unless explicitly deactivated. If EFI is enabled, you will have a FAT32 boot partition that also holds the GRUB stages.
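
To find out which case applies to a running system, one common check is whether the kernel exposes /sys/firmware/efi, which Linux only provides when it was booted through UEFI. The sketch below wraps that check; the function name `boot_mode` and its optional argument are made up for illustration. For what it's worth, the "Installing for i386-pc platform" line in the grub-install output above suggests that machine was using the legacy BIOS target.

```shell
#!/bin/sh
# boot_mode [EFI_DIR] (hypothetical helper)
# Prints "efi" if the EFI firmware directory exists, otherwise "bios".
# EFI_DIR defaults to /sys/firmware/efi; the argument exists only so the
# check can be pointed at another path.
boot_mode() {
    efi_dir=${1:-/sys/firmware/efi}
    if [ -d "$efi_dir" ]; then
        echo efi
    else
        echo bios
    fi
}
```

Calling `boot_mode` with no argument reports how the currently running system was started.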

How can I confirm that my boot partition is not empty? If I dd the boot partition to a file, I can clearly see stuff in there.

If it's a GRUB partition, you cannot "see" its contents directly, except perhaps in a hex editor. Otherwise, it is a filesystem that can be mounted.
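
As a rough alternative to a hex editor, you can at least test whether a raw partition holds anything other than zero bytes. This is a minimal sketch with a made-up function name, `has_data`. It only tells you the partition is not blank; it does not prove that what is in there is a valid GRUB image.

```shell
#!/bin/sh
# has_data PATH (hypothetical helper)
# Succeeds if PATH (a partition such as /dev/sdb1, or a plain file)
# contains at least one non-zero byte.
has_data() {
    # Delete all NUL bytes; if even one byte survives, there is data.
    # LC_ALL=C makes tr treat the input as raw bytes.
    n=$(LC_ALL=C tr -d '\000' < "$1" | head -c 1 | wc -c)
    [ "$((n))" -gt 0 ]
}
```

For example, `has_data /dev/sdb1 && echo "sdb1 is not blank"` (run as root) would show whether the BIOS boot partition on the new disk has been written to.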
 
Thanks for the clarification!
 
Alright, I just did the following:
  1. Fresh PVE with a single zfs raid 0 disk
  2. Plugged new disk
  3. Copy partition scheme using sgdisk with -R
  4. Generate new UUID also using sgdisk and --randomize-guids
  5. Grub-install on the new disk
  6. Then I attach the new disk to the pool to convert it to a mirror.
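
For reference, steps 3 to 6 can be written down roughly as below. This is a dry-run sketch only: the function name `plan_mirror_disk` is made up, it just prints the commands instead of running them, and it assumes partition 3 is the ZFS partition as in the listings in this thread. One thing worth double-checking is the sgdisk argument order: according to the sgdisk man page, the device given to -R/--replicate is the one that gets written to, and the other device is the source. The invocation in the first post appears to have them the other way round.

```shell
#!/bin/sh
# plan_mirror_disk HEALTHY NEW POOL (hypothetical helper)
# Prints, without executing, the commands for steps 3 to 6.
# Assumption: partition 3 on both disks is the ZFS partition.
plan_mirror_disk() {
    healthy=$1
    new=$2
    pool=$3
    # Step 3: copy the partition table FROM the healthy disk TO the new disk.
    # The --replicate argument is the target; the other device is the source.
    printf 'sgdisk %s --replicate=%s\n' "$healthy" "$new"
    # Step 4: give the new disk its own GUIDs.
    printf 'sgdisk --randomize-guids %s\n' "$new"
    # Step 5: install GRUB on the new disk.
    printf 'grub-install %s\n' "$new"
    # Step 6: attach the new ZFS partition to turn the single disk into a mirror.
    printf 'zpool attach %s %s3 %s3\n' "$pool" "$healthy" "$new"
}
```

Running `plan_mirror_disk /dev/sda /dev/sdb rpool` prints the four commands so they can be reviewed before anything is typed for real. In practice you would probably name the existing device in `zpool attach` the way `zpool status` shows it, for example by its /dev/disk/by-id name.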
At this point I had not copied the EFI and BIOS boot partitions with dd.
Code:
Device       Start       End   Sectors  Size Type
/dev/sdb1       34      2047      2014 1007K BIOS boot
/dev/sdb2     2048   1050623   1048576  512M EFI System
/dev/sdb3  1050624 117231374 116180751 55.4G Solaris /usr & Apple ZFS

Code:
    NAME                                            STATE     READ WRITE CKSUM
    rpool                                           ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        ata-MK0060EAVDR_DCF3X00942SE942B4737-part3  ONLINE       0     0     0
        ata-MK0060EAVDR_DCF3Q00942SE942B4400-part3  ONLINE       0     0     0


Then I shut down the hypervisor and removed sda, the original disk that Proxmox was installed on. My understanding is that with only sdb left, and since we didn't dd the BIOS boot and EFI partitions, it shouldn't be able to boot. But it does, without any issue except the degraded pool.

Code:
    NAME                                            STATE     READ WRITE CKSUM
    rpool                                           DEGRADED     0     0     0
      mirror-0                                      DEGRADED     0     0     0
        80828583320054935                           UNAVAIL      0     0     0  was /dev/disk/by-id/ata-MK0060EAVDR_DCF3X00942SE942B4737-part3
        ata-MK0060EAVDR_DCF3Q00942SE942B4400-part3  ONLINE       0     0     0

Any pointers here?
 
