PBS fails to boot when simulating the failure of one disk of the ZFS OS mirror.

Hello,

I am testing PBS with different scenarios of hardware failure.
So far I have already managed to recover data from a ZFS pool/datastore of a failed PBS by reinstalling a fresh PBS, importing the pool, taking ownership of the snapshots, editing datastore.cfg to point to the datastore path, and adjusting the configuration of the new PBS in PVE. This worked fine.
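For reference, that recovery boiled down to roughly the following steps; the pool name "datapool", the datastore name and the path are only placeholders for my setup:

# import the existing data pool on the freshly installed PBS
zpool import -f datapool

# make sure the datastore contents belong to the backup user, otherwise PBS cannot read the snapshots
chown -R backup:backup /datapool/datastore

# then point the datastore entry in /etc/proxmox-backup/datastore.cfg at that path, e.g.:
#   datastore: store1
#           path /datapool/datastore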
As recommended in the docs and in various threads on this forum, I installed the PBS OS on a ZFS mirror of two SSDs. To simulate a disk failure I shut down the server, disconnected one SSD of the ZFS OS mirror, booted again, and got the following message:

[Screenshot pbs.jpg: boot error message]

I thought the root filesystem is mirrored by ZFS and the bootloaders are kept in sync by proxmox-boot-tool, so PBS should be able to boot from the remaining disk (which is the main reason for using a mirror).
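As far as I understand it, whether both ESPs are registered and kept in sync can be checked like this (just a sketch, the output of course differs per system):

proxmox-boot-tool status    # lists the ESPs known to proxmox-boot-tool and the kernels synced onto them
proxmox-boot-tool refresh   # re-copies kernels/initrds and regenerates the boot loader config on all of them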

Could someone please explain what exactly happened here? And what would be the next step towards a solution other than reinstalling PBS, since "Destroy and re-create the pool from a backup source" could be tricky when the pool is the PBS OS itself.
Thank you in advance.
Tobias
 
Yes, it is a clean install of PBS with just two identical SSDs for the OS as a ZFS mirror created during installation, and two HDDs for data, also as a ZFS mirror.
The rpool mentioned in the screenshot is the one for the OS.

EDIT: Btw, when I reconnect the SSD, PBS boots up normally again.
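For the record, if one mirror member had really died, my understanding of the admin guide is that the replacement would look roughly like this (device names and by-id paths are placeholders):

# copy the partition layout from the healthy disk (sdX) to the replacement (sdY) and randomize its GUIDs
sgdisk /dev/sdX -R /dev/sdY
sgdisk -G /dev/sdY

# resilver onto the new disk's ZFS partition
zpool replace -f rpool /dev/disk/by-id/<old-disk>-part3 /dev/disk/by-id/<new-disk>-part3

# make the replacement disk bootable again
proxmox-boot-tool format /dev/sdY2
proxmox-boot-tool init /dev/sdY2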
 
In the meantime, I have deleted the data pool (sdc and sdd) because I am testing a different scenario. However, the boot pool is still untouched:

root@pbs:~# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda        8:0    0 119.2G  0 disk
├─sda1     8:1    0  1007K  0 part
├─sda2     8:2    0     1G  0 part
└─sda3     8:3    0 118.2G  0 part
sdb        8:16   0 119.2G  0 disk
├─sdb1     8:17   0  1007K  0 part
├─sdb2     8:18   0     1G  0 part
└─sdb3     8:19   0 118.2G  0 part
sdc        8:32   0   1.8T  0 disk
sdd        8:48   0   1.8T  0 disk
sde        8:64   1     0B  0 disk
sdf        8:80   1     0B  0 disk
sdg        8:96   1     0B  0 disk

root@pbs:~# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             2.08G   227G    96K  /rpool
rpool/ROOT        2.07G   227G    96K  /rpool/ROOT
rpool/ROOT/pbs-1  2.07G   227G  2.07G  /
root@pbs:~#
 
root@pbs:~# zpool status
pool: rpool
state: ONLINE
config:

NAME                                STATE   READ WRITE CKSUM
rpool                               ONLINE     0     0     0
  ata-550_S3_493502504835027-part3  ONLINE     0     0     0
  ata-550_S3_493502504835020-part3  ONLINE     0     0     0

errors: No known data errors

root@pbs:~# zpool get cachefile
NAME   PROPERTY   VALUE  SOURCE
rpool  cachefile  -      default
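Since cachefile shows "-" here, one thing I am considering (my own assumption, not something from the docs) is regenerating the pool cache file and the initramfs, so that the import at boot does not stumble over the missing disk:

zpool set cachefile=/etc/zfs/zpool.cache rpool   # write a fresh cache file for rpool
update-initramfs -u -k all                       # rebuild the initramfs with that cache file
proxmox-boot-tool refresh                        # sync the new initramfs to the ESPs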
 
Via systemctl | grep zfs I did find an error, but it refers to the data pool (test2) that was deleted in the GUI:

root@pbs:~# systemctl | grep zfs
  zfs-import-cache.service      loaded active exited  Import ZFS pools by cache file
● zfs-import@test2.service      loaded failed failed  Import ZFS pool test2
  zfs-mount.service             loaded active exited  Mount ZFS filesystems
  zfs-share.service             loaded active exited  ZFS file system shares
  zfs-volume-wait.service       loaded active exited  Wait for ZFS Volume (zvol) links in /dev
  zfs-zed.service               loaded active running ZFS Event Daemon (zed)
  system-zfs\x2dimport.slice    loaded active active  Slice /system/zfs-import
  zfs-import.target             loaded active active  ZFS pool import target
  zfs-volumes.target            loaded active active  ZFS volumes are ready
  zfs.target                    loaded active active  ZFS startup target
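If that failed unit is only a leftover from the test2 pool I destroyed via the GUI, it can presumably be cleaned up like this (my assumption, since the pool no longer exists):

systemctl disable zfs-import@test2.service       # stop trying to import the deleted pool at boot
systemctl reset-failed zfs-import@test2.service  # clear the failed state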
 
What do "lsblk" and "zpool import" say in the initramfs shell? (The latter does not import anything, it only shows what would be importable.)
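Something like this at the (initramfs) prompt; rpool as the pool name is taken from your output above, so treat it as a sketch:

lsblk                   # do both disks and their partitions show up at all?
zpool import            # lists importable pools, does not import anything
zpool import -N rpool   # if rpool is listed: import it without mounting datasets
exit                    # leave the shell; boot should continue if the import worked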
 
