Morning Proxmoxers!
I have a serious problem and am at my wits' end as to how to resolve it:
I have an 18-node cluster. Each node is set up with 2 x SATA SSDs in a ZFS RAID1 mirror for the OS, and various storage appliances are presented over iSCSI with LVM thick provisioning for the cluster's shared storage.
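For context, the shared storage is defined along these lines in /etc/pve/storage.cfg (the storage names, portal, target and volume group below are placeholders, not my real values):

iscsi: san01
        portal 10.0.0.10
        target iqn.2001-04.com.example:storage.lun1
        content none

lvm: san01-lvm
        vgname vg_san01
        shared 1
        content images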
Nodes 1-7 were set up last year using the Proxmox 8.1 ISO; all of the problem nodes (8-18) were installed this year using the Proxmox 8.3 and then 8.4 ISOs.
Nodes 1-7 have no issues, but the newer nodes 8-18 all keep hitting this:
After a few reboots (a random number each time), multipathing seems to somehow corrupt the ZFS pool by relabelling the disks, resulting in the output below:
PRODUCTION [root@prox18-dc01 ~]# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:01:30 with 0 errors on Sun May 11 00:25:31 2025
config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            35111111012345679-part3  ONLINE       0     0     0
            5587448289148697325      UNAVAIL      0     0     0  was /dev/sdb3

errors: No known data errors
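For reference, these are the read-only checks I've been running after a reboot to see whether multipathd has claimed the local boot SSDs (device names match the output above; none of these modify anything):

multipath -ll                            # does either local SSD show up as a multipath map?
lsblk -o NAME,TYPE,SIZE,WWN,MOUNTPOINT   # is sdb hidden behind a dm-* device?
zdb -l /dev/sdb3                         # are ZFS labels still readable on the "missing" partition?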
I have tried 'zpool replace' on the disk, and I have tried force-removing the failed disk, wiping it, and then 'zpool attach'/'zpool add'-ing the refreshed disk back in, all with no success.
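Roughly the sequence I tried (GUIDs and device names as in the zpool output above; exact flags from memory, so treat this as a sketch rather than a transcript):

# attempt 1: straight replace onto the same partition
zpool replace -f rpool 5587448289148697325 /dev/sdb3

# attempt 2: detach the dead member, re-clone the partition layout, re-attach
zpool detach rpool 5587448289148697325
wipefs -a /dev/sdb
sgdisk /dev/sda -R /dev/sdb    # copy the partition table from the good disk
sgdisk -G /dev/sdb             # randomise the new disk's GUIDs
zpool attach rpool 35111111012345679-part3 /dev/sdb3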
I'm assuming the issue relates to a bug in the current Proxmox/ZFS integration, since it isn't affecting the nodes that were set up last year with an older version of Proxmox and, I assume, an older version of ZFS. (My understanding is that a pool is created at the feature level of the ZFS version in use at the time; even if you later upgrade ZFS, an existing pool stays at the level it was created with unless you specifically upgrade the pool.)
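For completeness, this is how I've been comparing the old and new nodes (all read-only; 'zpool upgrade' with no arguments only lists pools that aren't using every supported feature, it doesn't change anything):

zfs version                            # userland and kernel-module ZFS versions
zpool upgrade                          # which pools are running older feature sets
zpool get all rpool | grep feature@    # per-pool feature flag state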
Has anyone seen this issue before, and does anyone have any idea how to resolve it without needing to reinstall the host?
If I can't find a solution, I'm going to need to revert to using hardware RAID like it's 2001.
Looking forward to hearing from you all.