PBS fails to boot when simulating the failure of one disk of the ZFS OS mirror.

Hello,

I am testing PBS with different scenarios of hardware failure.
So far I have already managed to recover data from a ZFS pool/datastore of a failed PBS by reinstalling a fresh PBS, importing the pool, taking ownership of the snapshots, editing datastore.cfg to point to the datastore path, and adjusting the configuration of the new PBS in PVE. This worked fine.
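For reference, that recovery boiled down to roughly the following steps; the pool name "datapool", the datastore name and the path are only placeholders for my setup:

# import the existing data pool on the freshly installed PBS
zpool import -f datapool

# make sure the datastore contents belong to the backup user, otherwise PBS cannot read the snapshots
chown -R backup:backup /datapool/datastore

# then point the datastore entry in /etc/proxmox-backup/datastore.cfg at that path, e.g.:
#   datastore: store1
#           path /datapool/datastore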
As recommended in the docs and in various threads on this forum, I installed the PBS OS on a ZFS mirror of two SSDs. To simulate a disk failure I shut down the server, disconnected one SSD of the ZFS OS mirror, booted again, and got the following message:

[Screenshot pbs.jpg: boot error message]

I thought the root filesystem is mirrored by ZFS and the bootloaders are kept in sync by proxmox-boot-tool, so PBS should be able to boot from the remaining disk (which is the main reason for using a mirror).
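As far as I understand it, whether both ESPs are registered and kept in sync can be checked like this (just a sketch, the output of course differs per system):

proxmox-boot-tool status    # lists the ESPs known to proxmox-boot-tool and the kernels synced onto them
proxmox-boot-tool refresh   # re-copies kernels/initrds and regenerates the boot loader config on all of them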

Could someone please explain what exactly happened here? And what would be the next step towards a solution other than reinstalling PBS, since "Destroy and re-create the pool from a backup source" could be tricky when the pool is the PBS OS itself.
Thank you in advance.
Tobias
 
Yes, it is a clean install of PBS with just two identical SSDs for the OS as a ZFS mirror created during installation, and two HDDs for data, also as a ZFS mirror.
The rpool mentioned in the screenshot is the one for the OS.

EDIT: Btw, when I reconnect the SSD, PBS boots up normally again.
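For the record, if one mirror member had really died, my understanding of the admin guide is that the replacement would look roughly like this (device names and by-id paths are placeholders):

# copy the partition layout from the healthy disk (sdX) to the replacement (sdY) and randomize its GUIDs
sgdisk /dev/sdX -R /dev/sdY
sgdisk -G /dev/sdY

# resilver onto the new disk's ZFS partition
zpool replace -f rpool /dev/disk/by-id/<old-disk>-part3 /dev/disk/by-id/<new-disk>-part3

# make the replacement disk bootable again
proxmox-boot-tool format /dev/sdY2
proxmox-boot-tool init /dev/sdY2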
 
In the meantime, I have deleted the data pool (sdc and sdd) because I am testing a different scenario. However, the boot pool is still untouched:

root@pbs:~# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda        8:0    0 119.2G  0 disk
├─sda1     8:1    0  1007K  0 part
├─sda2     8:2    0     1G  0 part
└─sda3     8:3    0 118.2G  0 part
sdb        8:16   0 119.2G  0 disk
├─sdb1     8:17   0  1007K  0 part
├─sdb2     8:18   0     1G  0 part
└─sdb3     8:19   0 118.2G  0 part
sdc        8:32   0   1.8T  0 disk
sdd        8:48   0   1.8T  0 disk
sde        8:64   1     0B  0 disk
sdf        8:80   1     0B  0 disk
sdg        8:96   1     0B  0 disk

root@pbs:~# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             2.08G   227G    96K  /rpool
rpool/ROOT        2.07G   227G    96K  /rpool/ROOT
rpool/ROOT/pbs-1  2.07G   227G  2.07G  /
root@pbs:~#
 
root@pbs:~# zpool status
pool: rpool
state: ONLINE
config:

NAME                                STATE   READ WRITE CKSUM
rpool                               ONLINE     0     0     0
  ata-550_S3_493502504835027-part3  ONLINE     0     0     0
  ata-550_S3_493502504835020-part3  ONLINE     0     0     0

errors: No known data errors

root@pbs:~# zpool get cachefile
NAME   PROPERTY   VALUE  SOURCE
rpool  cachefile  -      default
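Since cachefile shows "-" here, one thing I am considering (my own assumption, not something from the docs) is regenerating the pool cache file and the initramfs, so that the import at boot does not stumble over the missing disk:

zpool set cachefile=/etc/zfs/zpool.cache rpool   # write a fresh cache file for rpool
update-initramfs -u -k all                       # rebuild the initramfs with that cache file
proxmox-boot-tool refresh                        # sync the new initramfs to the ESPs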
 
Via systemctl | grep zfs I did find an error, but it refers to the data pool (test2) that was deleted in the GUI:

root@pbs:~# systemctl | grep zfs
  zfs-import-cache.service      loaded active exited  Import ZFS pools by cache file
● zfs-import@test2.service      loaded failed failed  Import ZFS pool test2
  zfs-mount.service             loaded active exited  Mount ZFS filesystems
  zfs-share.service             loaded active exited  ZFS file system shares
  zfs-volume-wait.service       loaded active exited  Wait for ZFS Volume (zvol) links in /dev
  zfs-zed.service               loaded active running ZFS Event Daemon (zed)
  system-zfs\x2dimport.slice    loaded active active  Slice /system/zfs-import
  zfs-import.target             loaded active active  ZFS pool import target
  zfs-volumes.target            loaded active active  ZFS volumes are ready
  zfs.target                    loaded active active  ZFS startup target
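If that failed unit is only a leftover from the test2 pool I destroyed via the GUI, it can presumably be cleaned up like this (my assumption, since the pool no longer exists):

systemctl disable zfs-import@test2.service       # stop trying to import the deleted pool at boot
systemctl reset-failed zfs-import@test2.service  # clear the failed state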
 
What do "lsblk" and "zpool import" say in the initramfs shell? (The latter does not import anything, it only shows what would be importable.)
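Something like this at the (initramfs) prompt; rpool as the pool name is taken from your output above, so treat it as a sketch:

lsblk                   # do both disks and their partitions show up at all?
zpool import            # lists importable pools, does not import anything
zpool import -N rpool   # if rpool is listed: import it without mounting datasets
exit                    # leave the shell; boot should continue if the import worked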
 
