Installing Proxmox on ZFS with NVMe doesn't work

RafalO

New Member
Mar 18, 2024
Hello everyone,
I have been testing ZFS on NVMe drives.

I installed Proxmox on two NVMe drives with these serial numbers:
7VQ09JY9 (SLOT M2_1)
7VQ09H02 (SLOT M2_2)

Filesystem: ZFS RAID 1

Reboot into the installed Proxmox and check:

zpool status -L
pool: rpool
state: ONLINE
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
nvme1n1p3 ONLINE 0 0 0
nvme0n1p3 ONLINE 0 0 0

errors: No known data errors

To show which device corresponds to which serial number:
ls -l /dev/disk/by-id/ | grep nvme
- truncated----
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09H02 -> ../../nvme0n1
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09JY9 -> ../../nvme1n1
---------------

Now let's simulate the failure of one drive. Let's say
(7VQ09H02 -> ../../nvme0n1) is dead now!
Not a problem, I got a new one:
7VQ09HSX

I replaced the disk and booted Proxmox.
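(As a side note, a quick way to double-check which device node the new drive got, assuming the nvme-cli package is installed; the by-id listing further down gives the same information:

nvme list                            # lists each /dev/nvmeXnY with its model and serial
ls -l /dev/disk/by-id/ | grep nvme   # maps serials to device nodes
)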

Checking:
root@smvm:~# zpool status -L
pool: rpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
nvme0n1p3 ONLINE 0 0 0
4351272091168832311 UNAVAIL 0 0 0 was /dev/disk/by-id/nvme-eui.6479a7823f0042a1-part3

errors: No known data errors
root@smvm:~#

The new disk (serial 7VQ09HSX) shows up as:
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX -> ../../nvme1n1
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX_1 -> ../../nvme1n1
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX_1-part1 -> ../../nvme1n1p1
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX-part1 -> ../../nvme1n1p1

We can replace the UNAVAIL disk with:

zpool replace -f rpool 4351272091168832311 /dev/disk/by-id/nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX

and we get:
root@smvm:~# zpool status
pool: rpool
state: ONLINE
scan: resilvered 1.33G in 00:00:02 with 0 errors on Mon Mar 18 14:49:30 2024
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
nvme-eui.6479a7823f004368-part3 ONLINE 0 0 0
nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX ONLINE 0 0 0

errors: No known data errors
root@smvm:~#

Seems fine, but here is the problem:
By default the pool was using partition 3 of the disk (7VQ09H02). We replaced it with nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX,
which points to the whole physical disk (the new disk has no partitions, so we cannot point to partition 3).
The replacement disk will work and mirror the data, but it won't be the same as the other one, because
it is not bootable: it has no boot partitions.
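You can see the mismatch directly. A minimal check (device names as in my current layout, adjust to yours):

lsblk -o NAME,SIZE,PARTTYPENAME /dev/nvme0n1 /dev/nvme1n1   # the new disk is missing the BIOS boot and EFI partitions the installer created
proxmox-boot-tool status                                    # shows which ESPs Proxmox keeps in sync; the new disk is not among them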


How is this supposed to work?
Do we first need to dd if=oldremainingworkingdisk of=newblankdisk?
And then use partition 3 of the new disk in the replace, so the data gets resilvered and the new disk is bootable as well?
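For reference, this is the kind of sequence I would expect, based on the "Changing a failed bootable device" section of the Proxmox VE admin guide (device names taken from my layout above, treat it as a sketch rather than tested steps):

# copy the partition table from the healthy disk to the new one, then randomize its GUIDs
sgdisk /dev/nvme0n1 -R /dev/nvme1n1
sgdisk -G /dev/nvme1n1
# replace only the ZFS partition (partition 3), not the whole disk
zpool replace -f rpool 4351272091168832311 /dev/disk/by-id/nvme-Seagate_FireCuda_530_ZP1000GM30023_7VQ09HSX-part3
# make the new disk bootable by formatting and initializing its ESP (partition 2)
proxmox-boot-tool format /dev/nvme1n1p2
proxmox-boot-tool init /dev/nvme1n1p2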

What is the point of installing Proxmox on a ZFS RAID disk if it doesn't provide "mirroring" out of the box
and causes more problems for everyone who has never faced a dead drive?
I believe many users will be surprised when they get a dead NVMe disk.
 
