[SOLVED] Proxmox node does not load after reboot following shutdown

wayfarerdev

Feb 9, 2023
Hello,

One node in my three-node cluster failed to boot into Proxmox when I powered it back on after a shutdown.

When I run
Code:
zpool import
from a bootable Proxmox flash drive, I get the following output:

error-01.jpg
error-02jpg.jpg

Here are my questions:
1. Since the pools show a state of ONLINE, am I right that they are likely intact and the data is OK?
2. How can I troubleshoot the problem with the node booting and get it back into my cluster?

Thank you!
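For anyone landing here with the same first question: an ONLINE state in the zpool import listing indicates that the pool's devices were found and the pool can likely be imported, but it does not by itself verify the data. A cautious next step from live media is a read-only import followed by zpool status -v, which shows per-device READ/WRITE/CKSUM error counters and lists files with known errors. The script below is a hypothetical sketch of that, not something from this thread; it assumes the pool is named rpool and that /mnt is free, and by default it only prints the commands (set DRY_RUN=0 to run them). If ZFS reports that the pool was last accessed by another system, the import may also need -f; only use that if the pool is definitely not imported anywhere else.

```sh
#!/bin/sh
# Hypothetical helper: inspect a pool from live/installer media without writing to it.
# Assumptions: pool name "rpool" (as in this thread) and /mnt is free to use as altroot.
set -eu

POOL="${POOL:-rpool}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands; set DRY_RUN=0 to run them

run() {
  printf '+ %s\n' "$*"              # always show the command
  [ "$DRY_RUN" = "0" ] || return 0  # dry run (default): stop after printing
  "$@"                              # DRY_RUN=0: actually execute it
}

inspect_pool() {
  # List pools found on attached disks and their device states; imports nothing.
  run zpool import
  # Import read-only (-o readonly=on), without mounting datasets (-N), under /mnt (-R).
  run zpool import -N -R /mnt -o readonly=on "$POOL"
  # Per-device state, error counters, and files with known errors (-v).
  run zpool status -v "$POOL"
}

# Default run: prints the plan; executes only with DRY_RUN=0.
inspect_pool
```

Because the import is read-only, no resilver or other write will start; to let ZFS repair the mirror, export the pool and import it normally afterwards.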
 
The pool called rpool seems to have one broken vdev that might need to be replaced ASAP. What does the system show when you try to boot? Any concrete errors?
 
@fabian The system does not boot into Proxmox. I was able to get this output from the debug console in the installer.

I imported the pool using the command:
Code:
zpool import rpool

Then I waited overnight for the resilver to complete.
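Instead of checking back manually, OpenZFS 2.0 and later can block until a resilver finishes with zpool wait. A minimal hypothetical sketch, assuming a pool named rpool and an OpenZFS version that has zpool wait; afterwards, zpool status -x prints a short "healthy" message, or the full status if the pool still has a problem. By default it only prints the commands; set DRY_RUN=0 to execute them.

```sh
#!/bin/sh
# Hypothetical helper: wait for a running resilver to finish, then report pool health.
# Assumptions: OpenZFS 2.0+ (provides "zpool wait") and a pool named rpool.
set -eu

POOL="${POOL:-rpool}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands; set DRY_RUN=0 to run them

run() {
  printf '+ %s\n' "$*"              # always show the command
  [ "$DRY_RUN" = "0" ] || return 0  # dry run (default): stop after printing
  "$@"                              # DRY_RUN=0: actually execute it
}

wait_for_resilver() {
  # Returns once no resilver is in progress on the pool.
  run zpool wait -t resilver "$POOL"
  # -x: one-line "healthy" message, or full status if the pool has problems.
  run zpool status -x "$POOL"
}

# Default run: prints the plan; executes only with DRY_RUN=0.
wait_for_resilver
```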

After that, I got the following output from
Code:
zpool status
IMG_0750.jpg

I then tried to bring the removed drive back online:
Code:
zpool online rpool ata-ST3000DM001-1CH166_Z1F41BV0-part3

Doing so resulted in the following:
IMG_0752.jpg

Based on this, I think I need to replace the failed device. Is that correct?
 
An update:
I removed the failed drive. After that, I could boot Proxmox normally from the good drive in the mirror and install the replacement drive.

From there, I used the following commands (based on the Proxmox documentation) to restore the boot pool:
1. fdisk -l | less
2. sgdisk /dev/sda -R /dev/sdg
3. sgdisk -G /dev/sdg
4. zpool list
5. zpool replace rpool ata-ST3000DM001-1CH166_Z1F41BV0-part3 ata-ST4000VN006-3CW104_ZW634FN9-part3
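Steps 1 to 5 can be collected into a small script. This is a hypothetical sketch, not the exact sequence as run in this post: it defaults to the device names from this post, uses lsblk in place of fdisk -l | less so the disk list can be checked without a pager, refuses to continue if the healthy and new device are the same, and only prints the commands unless DRY_RUN=0. Note that sgdisk -R overwrites the partition table on the target disk, and /dev/sdX letters can change between boots, so confirm both devices before running it for real.

```sh
#!/bin/sh
# Hypothetical sketch of steps 1-5: replace a failed disk in the rpool mirror.
# Defaults are the device names from this thread; override them via the environment.
set -eu

POOL="${POOL:-rpool}"
HEALTHY="${HEALTHY:-/dev/sda}"   # surviving, bootable mirror member
NEW="${NEW:-/dev/sdg}"           # replacement disk; its partition table gets overwritten
OLD_PART="${OLD_PART:-ata-ST3000DM001-1CH166_Z1F41BV0-part3}"   # failed ZFS partition
NEW_PART="${NEW_PART:-ata-ST4000VN006-3CW104_ZW634FN9-part3}"   # new ZFS partition
DRY_RUN="${DRY_RUN:-1}"          # default: only print the commands; set DRY_RUN=0 to run them

run() {
  printf '+ %s\n' "$*"              # always show the command
  [ "$DRY_RUN" = "0" ] || return 0  # dry run (default): stop after printing
  "$@"                              # DRY_RUN=0: actually execute it
}

replace_mirror_disk() {
  # Guard against copying a partition table onto the disk it came from.
  if [ "$HEALTHY" = "$NEW" ]; then
    echo "refusing: HEALTHY and NEW are the same device ($NEW)" >&2
    return 1
  fi
  run lsblk -o NAME,SIZE,MODEL,SERIAL                # confirm which disk is which
  run sgdisk "$HEALTHY" -R "$NEW"                    # copy HEALTHY's partition table onto NEW
  run sgdisk -G "$NEW"                               # give NEW fresh disk/partition GUIDs
  run zpool replace "$POOL" "$OLD_PART" "$NEW_PART"  # swap the vdev; starts the resilver
  run zpool status "$POOL"                           # check that the resilver is running
}

# Default run: prints the plan; executes only with DRY_RUN=0.
replace_mirror_disk
```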

That started the resilver. Once it completed, I fixed the EFI partition:
6. proxmox-boot-tool status
7. proxmox-boot-tool format /dev/disk/by-id/ata-ST4000VN006-3CW104_ZW634FN9-part2
8. proxmox-boot-tool init /dev/disk/by-id/ata-ST4000VN006-3CW104_ZW634FN9-part2
9. proxmox-boot-tool clean

For command 8 to succeed, I first had to install systemd-boot, since this node boots via EFI with systemd-boot rather than GRUB.
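Steps 6 to 9, together with the systemd-boot install mentioned above, can be sketched the same way. This is a hypothetical script, assuming a systemd-boot (UEFI) setup like the one in this post; it defaults to the replacement disk's ESP from the commands above, and it installs systemd-boot before init because that is the order the post reports as necessary. proxmox-boot-tool format erases the given partition, so double-check the path. It only prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical sketch of steps 6-9: set up the replacement disk's EFI system partition.
# Assumes a systemd-boot (UEFI) node; ESP defaults to the partition used in this thread.
set -eu

ESP="${ESP:-/dev/disk/by-id/ata-ST4000VN006-3CW104_ZW634FN9-part2}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands; set DRY_RUN=0 to run them

run() {
  printf '+ %s\n' "$*"              # always show the command
  [ "$DRY_RUN" = "0" ] || return 0  # dry run (default): stop after printing
  "$@"                              # DRY_RUN=0: actually execute it
}

setup_esp() {
  run proxmox-boot-tool status            # which ESPs are currently configured
  run apt-get install -y systemd-boot     # the thread needed this before "init" worked
  run proxmox-boot-tool format "$ESP"     # format the partition as an ESP (erases it)
  run proxmox-boot-tool init "$ESP"       # set it up and add it to the synced ESPs
  run proxmox-boot-tool clean             # forget ESPs that no longer exist (old disk)
  run proxmox-boot-tool status            # verify the new ESP is listed
}

# Default run: prints the plan; executes only with DRY_RUN=0.
setup_esp
```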

This thread was a very helpful reference once I had isolated the failed drive. These pages from Oracle's ZFS documentation and elsewhere on the Proxmox forums were also helpful:
 