Replacing boot disk problem

simonv92

New Member
Mar 14, 2020
Hi all! I have been running my Proxmox server without problems for months.
One day, one of the two drives Proxmox was installed on failed. I replaced the broken one, copied all files from the good disk, and now I have a green check on the rpool health.
But I have a strange problem on boot: if I leave both disks connected, I get the error shown in the image below. If I disconnect the new drive, Proxmox boots normally.
So I think I have to rebuild the boot mirror configuration, but I don't know how to do that.
I hope you can help me!
Best regards,

Simone
 

Attachment: shell.jpg (163.4 KB)
You used two disks, so I guess you used ZFS and not hardware RAID? If so, did you follow the instructions from the wiki/documentation?

Changing a failed bootable device​

Depending on how Proxmox VE was installed it is either using proxmox-boot-tool [1] or plain grub as bootloader (see Host Bootloader). You can check by running:
# proxmox-boot-tool status
The first steps of copying the partition table, reissuing GUIDs and replacing the ZFS partition are the same. To make the system bootable from the new disk, different steps are needed which depend on the bootloader in use.
# sgdisk <healthy bootable device> -R <new device>
# sgdisk -G <new device>
# zpool replace -f <pool> <old zfs partition> <new zfs partition>
Use the zpool status -v command to monitor how far the resilvering process of the new disk has progressed.
With proxmox-boot-tool:
# proxmox-boot-tool format <new disk's ESP>
# proxmox-boot-tool init <new disk's ESP>
ESP stands for EFI System Partition, which is set up as partition #2 on bootable disks set up by the Proxmox VE installer since version 5.4. For details, see Setting up a new partition for use as synced ESP.
With grub:
# grub-install <new disk>
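The steps quoted above can be collected into a small dry-run helper. This is only a sketch: the function name `print_replace_plan` and all argument values are hypothetical placeholders, and it prints the commands instead of running them so the device names can be reviewed first.

```shell
# Print (do not execute) the replacement steps for a failed bootable ZFS disk.
# Every argument is a placeholder the caller must fill in for their own system.
print_replace_plan() {
    healthy=$1     # healthy bootable disk, e.g. /dev/sda
    new=$2         # new, empty disk, e.g. /dev/sdb
    pool=$3        # pool name, e.g. rpool
    old_part=$4    # old ZFS partition as listed by zpool status
    new_part=$5    # ZFS partition on the new disk
    new_esp=$6     # EFI System Partition on the new disk
    bootloader=$7  # "grub" or "proxmox-boot-tool"

    # Steps shared by both bootloaders: copy the partition table,
    # randomize the GUIDs on the copy, then replace the ZFS partition.
    printf '%s\n' "sgdisk $healthy -R $new"
    printf '%s\n' "sgdisk -G $new"
    printf '%s\n' "zpool replace -f $pool $old_part $new_part"

    # Bootloader-specific step.
    case $bootloader in
        grub)
            printf '%s\n' "grub-install $new"
            ;;
        *)
            printf '%s\n' "proxmox-boot-tool format $new_esp"
            printf '%s\n' "proxmox-boot-tool init $new_esp"
            ;;
    esac
}
```

For example, `print_replace_plan /dev/sda /dev/sdb rpool old-part3 /dev/sdb3 /dev/sdb2 proxmox-boot-tool` prints five commands to copy and paste once the names have been double-checked.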
 
Hi, thank you for your reply!
Yes, I used ZFS and not hardware RAID, but I did not follow all of the instructions...
If I run # proxmox-boot-tool status I get the following:
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
WARN: /dev/disk/by-uuid/0E38-4D77 does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
0E38-A8FD is configured with: uefi (versions: 5.11.22-5-pve, 5.4.140-1-pve

What can I do? thank you!
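The warning in that output suggests the list of synced ESPs still names a partition from the failed disk. As a rough sketch, a helper like the one below could pull those stale UUIDs out of the status text. The function name `missing_esp_uuids` is made up for illustration, and the pattern assumes the warning wording shown above.

```shell
# Read `proxmox-boot-tool status` output on stdin and print every ESP UUID
# that the tool reports as no longer existing (one UUID per line).
# Assumes the warning wording: "/dev/disk/by-uuid/<UUID> does not exist".
missing_esp_uuids() {
    sed -n 's|.*/dev/disk/by-uuid/\([^ ]*\) does not exist.*|\1|p'
}
```

Usage would be `proxmox-boot-tool status 2>&1 | missing_esp_uuids`. Any UUID printed is a candidate for removal from /etc/kernel/proxmox-boot-uuids, as the warning itself asks. The new disk's ESP still has to be formatted and initialized as described in the wiki steps.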
 
Hi, did you ever get a reply?
I'm currently in the same situation and I have no idea how to fix this.
I didn't use proxmox-boot-tool and just ran zpool replace XXX XXX.
Is there a way to rectify this situation?
 
Then your replaced disk is partitioned wrong. When you replace the whole disk with just "zpool replace", the disk ends up with only 2 partitions: partition 1 for ZFS and partition 9 for something else (not sure what it is used for).
When doing it correctly according to "Changing a failed bootable device", as described here (https://pve.proxmox.com/wiki/ZFS_on_Linux#_zfs_administration), the disk ends up with 3 partitions: partition 1 for the grub bootloader, partition 2 for the ESP (systemd bootloader) and partition 3 for ZFS. So your pool will work fine, but the 2 partitions needed for booting are missing and the replaced disk can't boot anymore. That wouldn't be a problem with a striped mirror or raidz1/2/3, where other disks still carry a bootloader, but it is bad with a plain two-disk mirror because there is no redundancy left for the bootloader.

I haven't tested it, but I guess you would need to remove the replaced disk from the pool, wipe it and then follow the instructions linked above to replace the disk the correct way.
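To tell which case a disk is in, the partition table printed by `sgdisk -p` can be checked for the three partition types. The helper below is a minimal sketch. The name `has_boot_layout` is hypothetical, and it assumes the usual type codes for this layout (EF02 for the BIOS boot partition, EF00 for the ESP, BF01 for ZFS), which may differ on other setups.

```shell
# Read `sgdisk -p <disk>` output on stdin.
# Exit status 0 if a BIOS boot (EF02), an ESP (EF00) and a ZFS (BF01)
# partition are all present, non-zero otherwise.
has_boot_layout() {
    awk '
        # Partition rows start with the partition number.
        $1 ~ /^[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(EF02|EF00|BF01)$/) seen[$i] = 1
        }
        END {
            ok = ("EF02" in seen) && ("EF00" in seen) && ("BF01" in seen)
            exit !ok
        }
    '
}
```

A check such as `sgdisk -p /dev/sdb | has_boot_layout || echo "boot partitions missing"` (with `/dev/sdb` as a placeholder) would flag a disk that was added with a plain whole-disk `zpool replace`.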
 
I found the following on reddit and it solved my issue:

Check the IDs of your disks / partitions with `ls -lah /dev/disk/by-id`. My original SSD was /dev/nvme0n1 and my new one (the old FreeNAS one I want to add to my array) is /dev/nvme1n1 !! Your drive names may be different !!

# Copy partition table and randomize GUIDs
sgdisk /dev/nvme0n1 -R /dev/nvme1n1
sgdisk --randomize-guids /dev/nvme1n1

# Setup the UEFI bootloader
pve-efiboot-tool format /dev/nvme1n1p2 --force
pve-efiboot-tool init /dev/nvme1n1p2

# Attach to rpool and wait for the resilver (should be pretty fast with 128G NVMe SSDs) !! Your IDs will be different !!
zpool attach rpool nvme-eui.6479a7304169387a-part3 nvme-eui.6479a7113026c42c-part3

# zpool status -v
  pool: rpool
 state: ONLINE
  scan: resilvered 3.40G in 0 days 00:00:16 with 0 errors on Wed Jul 1 20:54:48 2020
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.6479a7304169387a-part3  ONLINE       0     0     0
            nvme-eui.6479a7113026c42c-part3  ONLINE       0     0     0

16 seconds for resilvering. Told you it'd be fast enough.

Well, thank you all. I hope this helps someone else in the future.
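If you want to script the "wait for the resilver" part rather than watch the output by hand, a check along these lines may help. It is only a sketch: the name `resilver_finished` is invented here, and the pattern assumes the `scan:` line has the same wording as in the status output above.

```shell
# Read `zpool status` output on stdin.
# Exit status 0 if the scan line reports a completed resilver with 0 errors,
# non-zero otherwise (still in progress, finished with errors, or no scan line).
resilver_finished() {
    grep -q 'scan: resilvered .* with 0 errors'
}
```

For example, `zpool status rpool | resilver_finished && echo "resilver done"`. A resilver that ends with errors never matches, so check the full status output in that case.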

Credit goes to AirborneArie on reddit!
 
I was indeed making that mistake, thank you for your guidance!
I forgot to check back on this thread and found the fix on reddit; I posted it in my previous post so it is available here, since my initial search led me to this thread.
 
