zpool replace doesn't replace bios and efi

jerrymoushu

New Member
Mar 29, 2021
I built a server with 4 disks:
the first 2 disks are a ZFS RAID 1 (mirror) for the Proxmox OS,
the other 2 disks are a ZFS RAID 1 (mirror) for VMs.

I have no problem replacing a disk in the ZFS pool for the VMs. However, when I do the same for the Proxmox OS pool, both the BIOS boot and EFI partitions are missing on the new disk.
Is there an easy way of doing this natively? Thanks
 
You need to manually partition the new system disk, copy over the ESP/grub and only do a zpool replace on partition 3. See here:

Changing a failed bootable device

Depending on how Proxmox VE was installed it is either using proxmox-boot-tool [1] or plain grub as bootloader (see Host Bootloader). You can check by running:
# proxmox-boot-tool status
The first steps of copying the partition table, reissuing GUIDs and replacing the ZFS partition are the same. To make the system bootable from the new disk, different steps are needed which depend on the bootloader in use.
# sgdisk <healthy bootable device> -R <new device>
# sgdisk -G <new device>
# zpool replace -f <pool> <old zfs partition> <new zfs partition>
Use the zpool status -v command to monitor how far the resilvering process of the new disk has progressed.
With proxmox-boot-tool:
# proxmox-boot-tool format <new disk's ESP>
# proxmox-boot-tool init <new disk's ESP>
ESP stands for EFI System Partition, which is set up as partition #2 on bootable disks set up by the Proxmox VE installer since version 5.4. For details, see Setting up a new partition for use as synced ESP.
With grub:
# grub-install <new disk>
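
The quoted steps can be collected into a small dry-run sketch that only prints the commands for review and runs nothing. The function name plan_boot_disk_replace, the device names and the pool name are placeholders for illustration, and the sketch assumes sdX-style naming with the ESP on partition 2 and ZFS on partition 3 (NVMe disks use a "p" before the partition number). It covers the proxmox-boot-tool path only; with plain grub the last two lines would be grub-install on the new disk instead.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from the quoted procedure, runs nothing.
# Assumptions: sdX-style names (partition = disk name + number),
# ESP on partition 2 and ZFS on partition 3.
plan_boot_disk_replace() {
    healthy=$1    # healthy bootable disk, e.g. /dev/sda
    new=$2        # blank replacement disk, e.g. /dev/sdb
    pool=$3       # pool name, e.g. rpool
    old_part=$4   # old ZFS partition as listed by `zpool status`
    echo "sgdisk $healthy -R $new"
    echo "sgdisk -G $new"
    echo "zpool replace -f $pool $old_part ${new}3"
    echo "proxmox-boot-tool format ${new}2"
    echo "proxmox-boot-tool init ${new}2"
}

# Placeholder arguments; substitute the values from your own system.
plan_boot_disk_replace /dev/sda /dev/sdb rpool OLD_ZFS_PARTITION
```

Printing first makes it easier to double-check that the healthy and new disks are not swapped, since sgdisk -R overwrites the partition table of the target.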
 
In my case, if my RAID 1 is just degraded, I only have to use sgdisk to copy the partition layout over to the new disk and then zpool replace partition 3 (ZFS), and that will do?
I assume there is no RAID, and so no resilvering, for the BIOS boot and EFI partitions.
 
Yes, there will be no RAID for ESP/grub, but you still want to copy that over from the healthy disk. Otherwise, if you only resilver the ZFS partition and the healthy disk (the only one with a bootloader) then fails, your host won't boot anymore.
 
Almost there, but it failed to boot up.

proxmox-boot-tool format /dev/sdb2
UUID="52E5-2DB5" SIZE="536870912" FSTYPE="vfat" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdb" MOUNTPOINT=""
E: '/dev/sdb2' contains a filesystem ('vfat') - exiting (use --force to override)

So I continued with --force:

root@pve20:~# proxmox-boot-tool init /dev/sdb2 --force
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
UUID="04C1-EB05" SIZE="536870912" FSTYPE="vfat" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdb" MOUNTPOINT=""
Mounting '/dev/sdb2' on '/var/tmp/espmounts/04C1-EB05'.
Installing grub i386-pc target..
Installing for i386-pc platform.
Installation finished. No error reported.
Unmounting '/dev/sdb2'.
Adding '/dev/sdb2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
WARN: /dev/disk/by-uuid/021C-3CC9 does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
Copying and configuring kernels on /dev/disk/by-uuid/04C1-EB05
Copying kernel 5.11.22-4-pve
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.11.22-4-pve
Found initrd image: /boot/initrd.img-5.11.22-4-pve
done
Copying and configuring kernels on /dev/disk/by-uuid/8DE4-E75E
Copying kernel 5.11.22-4-pve
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.11.22-4-pve
Found initrd image: /boot/initrd.img-5.11.22-4-pve
done
WARN: /dev/disk/by-uuid/8DE5-854A does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
WARN: /dev/disk/by-uuid/FEC2-58C1 does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping

for reference

Disk /dev/sda: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Samsung SSD 850
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4F3578CD-3197-4143-AF39-1041C86FB2C6

Device Start End Sectors Size Type
/dev/sda1 34 2047 2014 1007K BIOS boot
/dev/sda2 2048 1050623 1048576 512M EFI System
/dev/sda3 1050624 976773134 975722511 465.3G Solaris /usr & Apple ZFS


Disk /dev/sdb: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 44C6B86D-63AB-4F6D-981B-48FF2C340943

Device Start End Sectors Size Type
/dev/sdb1 34 2047 2014 1007K BIOS boot
/dev/sdb2 2048 1050623 1048576 512M EFI System
/dev/sdb3 1050624 976773134 975722511 465.3G Solaris /usr & Apple ZFS

Everything looks fine, so I'm not sure which part I missed. Could you lend a hand? Thanks
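
The WARN lines in the output above point at UUIDs listed in /etc/kernel/proxmox-boot-uuids that no longer exist as partitions. A minimal sketch for pulling those UUIDs out of a saved copy of the log is shown below; the function name stale_esp_uuids is made up for illustration, and it only matches the WARN format seen in this thread. Cleaning these entries is housekeeping and is not necessarily the reason the boot failed.

```shell
#!/bin/sh
# Prints the ESP UUIDs that a saved proxmox-boot-tool log reports as missing.
# Reads the log on stdin; matches the WARN line format shown in this thread.
stale_esp_uuids() {
    sed -n 's|^WARN: /dev/disk/by-uuid/\([^ ]*\) does not exist.*|\1|p'
}

# Sample input copied from the output above.
stale_esp_uuids <<'EOF'
WARN: /dev/disk/by-uuid/021C-3CC9 does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
Copying and configuring kernels on /dev/disk/by-uuid/04C1-EB05
WARN: /dev/disk/by-uuid/8DE5-854A does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
EOF
```

The proxmox-boot-tool documentation describes a clean subcommand for removing such entries; check proxmox-boot-tool help on your version before relying on it.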
 
@Dunuin It might come out of nowhere for you, but a year later your post just made my late night and the rest of the weekend a heck of a lot less stressful.

One of my mirrored boot drives failed, luckily not the one with the boot-loader (NVMe so no hot swap). I learned tonight that ZFS really only copies the data partition which is disappointing but your list of commands enabled me to copy and configure the EFI partitions from the original drive over to a pair of replacements by swapping them out one at a time.

Words cannot express my gratitude. I was really preparing for the worst as soon as I discovered this ZFS caveat.
 
Apologies for bumping this thread, but I wanted to verify something. I recently swapped out a disk, overlooking the fact that merely replacing it doesn't also update the boot loader.

So, I've got a new disk installed and it's functioning smoothly, but only 11 of the 12 disks have both the boot loader and EFI partitions. The system is set up with proxmox-boot-tool.

I've since taken the disk offline with the command "zpool offline rpool <device-id>" so I can correct the disk itself.

Now for my perhaps naive question: the disk has been resilvered and was operational before I took it offline. I want to ensure it's set up correctly.

Is it safe for me to run "proxmox-boot-tool format /dev/sdl" in this setup without impacting the other 11 drives? (sdl is the new disk.) Following that, can I initialize it with "proxmox-boot-tool init /dev/sdl" and then reintegrate it into my "rpool" with "zpool replace"?

By the way, this is the output of proxmox-boot-tool status:
[screenshot: 1712570183901.png]
 
Never run proxmox-boot-tool on the whole disk (/dev/sdl). Create the GRUB/bios and ESP and ZFS partitions first (using gdisk /dev/sdl) and then run it on /dev/sdl2 (probably it will be partition number 2) and attach the third partition to your ZFS pool.
See also section "Changing a failed bootable device" in the ZFS chapter of the Proxmox manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_zfs
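
As a rough guard against passing a whole disk by mistake, a name-based check like the following could be run first. This is only a heuristic sketch: the function name looks_like_partition is hypothetical, and it assumes standard Linux device naming (sdX2, vdX2, nvme0n1p2, mmcblk0p2). On a live system, lsblk -no TYPE <device> reports "disk" or "part" and is the more reliable check.

```shell
#!/bin/sh
# Heuristic name check (assumption: standard Linux device naming).
# Returns 0 if the path looks like a partition, 1 if it looks like a whole disk.
looks_like_partition() {
    case "$1" in
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*) return 0 ;;  # e.g. /dev/nvme0n1p2
        /dev/nvme*|/dev/mmcblk*)                 return 1 ;;  # e.g. /dev/nvme0n1
        /dev/sd*[0-9]|/dev/vd*[0-9])             return 0 ;;  # e.g. /dev/sdl2
        *)                                       return 1 ;;  # e.g. /dev/sdl
    esac
}

for dev in /dev/sdl /dev/sdl2; do
    if looks_like_partition "$dev"; then
        echo "$dev: looks like a partition"
    else
        echo "$dev: looks like a whole disk, do not pass it to proxmox-boot-tool"
    fi
done
```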
 
Thanks for your reply! The current partitions are:

[screenshot: 1712570379675.png]

These were created by the zpool replace command. So I suppose I have to wipe the disk entirely first, and then create the GRUB, ESP and ZFS partitions?

The reason I am unsure about the right procedure is that I use proxmox-boot-tool, and the manual you refer to says that different steps are needed depending on the bootloader in use. It then refers to "With proxmox-boot-tool", which is confusing to me in this matter, and I don't want to mess up this pool.

[screenshot: 1712570620126.png]
 
The crucial part is the sgdisk-based copy of the partition layout. Everything you have to do next depends on it, and your output shows that you didn't run the partition table copy ...
 
So, a short summary of what to do now:

1) Bring the device offline (already done)
2) Select a healthy device, e.g. /dev/sdc, and run sgdisk /dev/sdc -R /dev/sdl (where /dev/sdl is the new device). A format is not needed?
3) Randomize the GUIDs on the new disk with sgdisk -G /dev/sdl
4) Add it to the pool again using "zpool replace -f rpool /dev/sdl1 /dev/sdl3"

/dev/sdl1 is the ZFS partition from before I took the disk offline; /dev/sdl3 will be the ZFS partition after copying the layout from /dev/sdc to /dev/sdl with sgdisk.

So in my book, that is the reason for 4), which replaces partition 1 with partition 3.


So, I ran steps 1, 2 and 3. Now, after refreshing the "Disks" page:
[screenshot: 1712574335611.png]

I have 2 ZFS partitions there. (In rpool the disk is still offline/removed because I haven't run the replace command yet.)

Will this be corrected once the replace has taken place? Or should I use "Wipe disk"? It seems the partition naming is not carried over from the sdc drive.
 
Okay, so now my disk is initialized. The renaming also happened after resilvering the disk, and it's online and healthy.

SDL1 => BOOT
SDL2 => EFI
SDL3 => ZFS

But proxmox-boot-tool still does not list this 12th drive as a Proxmox boot disk, and proxmox-boot-tool format /dev/sdl2 fails with an error message:
UUID="18254553461174802073" SIZE="1073741824" FSTYPE="zfs_member" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdl" MOUNTPOINT=""
E: '/dev/sdl2' contains a filesystem ('zfs_member') - exiting (use --force to override)

According to the latest information I have found, running proxmox-boot-tool format on the EFI partition is mandatory ( https://forum.proxmox.com/threads/guide-change-disk.141203/ )

Unfortunately, "init /dev/sdl2" is not working either, because it reports a wrong filesystem:
UUID="18254553461174802073" SIZE="1073741824" FSTYPE="zfs_member" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdl" MOUNTPOINT=""
E: '/dev/sdl2' has wrong filesystem (!= vfat)

Is it just a matter of running proxmox-boot-tool refresh to apply the configuration to the UUID list? Or do I need to "force" the format?
 
That format error is probably because of your previous attempt. Use proxmox-boot-tool format --force /dev/sdl2 to override, as the message suggests.
The init error (wrong filesystem, != vfat) appears because format failed. Fix that first by forcing it.

PS: Please use inline-code instead of italics for commands and output, as it is much easier to read.
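
To illustrate the two errors above: the partition has the ESP type GUID but still carries a zfs_member signature, so format refuses without --force and init refuses because the filesystem is not vfat. A small sketch that classifies one of the KEY="value" lines printed before the error could look like this. The function name esp_state and its messages are made up for illustration and only mirror the behavior visible in this thread, not the tool's internal logic.

```shell
#!/bin/sh
# GPT type GUID for an EFI System Partition, as shown in PARTTYPE in this thread.
ESP_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

# Reads one KEY="value" line (the format proxmox-boot-tool prints before its
# error) on stdin and reports what the partition currently holds.
esp_state() {
    IFS= read -r line
    case "$line" in
        *"PARTTYPE=\"$ESP_GUID\""*) ;;
        *) echo "not an ESP-typed partition"; return ;;
    esac
    case "$line" in
        *'FSTYPE="vfat"'*) echo "ESP with vfat: init should accept it" ;;
        *'FSTYPE=""'*)     echo "ESP without filesystem: run format" ;;
        *)                 echo "ESP with leftover filesystem: format needs --force" ;;
    esac
}

# Line copied from the error output in this thread.
echo 'UUID="18254553461174802073" SIZE="1073741824" FSTYPE="zfs_member" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdl" MOUNTPOINT=""' | esp_state
```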
 
Thanks for the assistance :)
 
