Proxmox ZFS Management

TSAN

For years I've used VMware ESXi + HW RAID or vSAN, typically with Dell OpenManage / iDRAC to manage HDDs. Well... Broadcom happened.

I'm trying to completely wrap my head around JBOD / HBA ZFS before I even consider using it.
That means fully understanding SCSI ID locations, disk serial numbers, /dev/sd#, /dev/disk/by-id, and all the IDs a ZFS pool assigns. I want 100% disk documentation when creating a host.
I installed the lsscsi tool to match SCSI IDs to /dev/sd#. (I'm also hoping to find a backplane HDD blink function - I haven't looked into it yet, if you know of one - does it have to integrate with the server's IPMI?)
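On the blink question: I have not tried it here, but the ledmon package's ledctl is supposed to drive the bay locate LEDs through the enclosure itself (SES-2/SGPIO via the HBA) rather than through IPMI. Rough sketch, assuming the backplane supports it:

Code:
# untested sketch - assumes the ledmon package and an SES-2/SGPIO-capable HBA/backplane
apt install ledmon
ledctl locate=/dev/sda       # blink the locate LED on the bay holding /dev/sda
ledctl locate_off=/dev/sda   # turn the LED back off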

From what I've read, ZFS /dev/disk/by-id is preferable to /dev/sd#.
Per the wiki - https://pve.proxmox.com/wiki/ZFS_on_Linux - it just says "device", which does work with /dev/sd#.

I want to create ZFS pools and remove/import disks with /dev/disk/by-id.
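Something along these lines is what I have in mind (illustrative only - "tank" and the ata-... names below are made-up placeholders; the real links would come from ls -l /dev/disk/by-id/):

Code:
# example only - substitute the actual by-id symlinks for your disks
zpool create tank mirror \
  /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL1 \
  /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL2
zpool status tank   # vdevs should then be listed by those stable names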

The only entry under /dev/disk/by-id is the IDE CD-ROM. No disks show up by-id. Why aren't the disks listed here? A udev rules / symbolic link issue?

Are the /dev/disk/by-id entries missing because this is a Proxmox VM nested on ESXi? The Proxmox VMs use the VMware Paravirtual disk controller.
The VM has 6 disks total. 2 are the default ZFS mirror rpool. With the other 4 disks I'm playing with ZFS raidz1 and a 2-vdev mirror.

---

Other thoughts as I'm learning ZFS: IMO there are too many fragmented tools needed to piece together the storage picture.
fdisk -l
lsblk
blkid
lsscsi -v
zpool status
zfs list
more and more...
Where is the simple Debian / Proxmox / ZFS function to get a disk's SCSI ID location + disk serial # + /dev/disk/by-id + /dev/sd# + /dev/sd# partitions, all in one command?
HW RAID management feels easier, but perhaps that's just because it's what I've known.
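For what it's worth, something like this gets close to a single-command view (a rough sketch; column availability depends on the lsblk version):

Code:
# one-shot overview: SCSI address, model, serial, WWN, partitions, partlabels
lsblk -o NAME,HCTL,SIZE,TYPE,MODEL,SERIAL,WWN,PARTLABEL,MOUNTPOINTS
ls -l /dev/disk/by-id/   # maps the stable ids back to /dev/sd#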

Thanks.
 
Bump. Anybody know why Proxmox / Debian is not populating /dev/disk/by-id?
 
Hello, please show me the output of:

Code:
$ lsblk
$ ls -lA /dev/disk/by-id/
$ ls -lA /dev/disk/by-path/

Code:
root@LAB-SMPM-GRUB:~# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   20G  0 disk
├─sda1   8:1    0 1007K  0 part
├─sda2   8:2    0  512M  0 part
└─sda3   8:3    0 19.5G  0 part
sdb      8:16   0   20G  0 disk
├─sdb1   8:17   0 1007K  0 part
├─sdb2   8:18   0  512M  0 part
└─sdb3   8:19   0 19.5G  0 part
sdc      8:32   0   10G  0 disk
├─sdc1   8:33   0   10G  0 part
└─sdc9   8:41   0    8M  0 part
sdd      8:48   0   10G  0 disk
├─sdd1   8:49   0   10G  0 part
└─sdd9   8:57   0    8M  0 part
sde      8:64   0   10G  0 disk
├─sde1   8:65   0   10G  0 part
└─sde9   8:73   0    8M  0 part
sdf      8:80   0   10G  0 disk
├─sdf1   8:81   0   10G  0 part
└─sdf9   8:89   0    8M  0 part
sdg      8:96   0   10G  0 disk
├─sdg1   8:97   0   10G  0 part
└─sdg9   8:105  0    8M  0 part
sr0     11:0    1 1024M  0 rom 
root@LAB-SMPM-GRUB:~#
root@LAB-SMPM-GRUB:~# ls -lA /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 9 Sep 23 13:11 ata-VMware_Virtual_IDE_CDROM_Drive_00000000000000000001 -> ../../sr0
root@LAB-SMPM-GRUB:~#
root@LAB-SMPM-GRUB:~# ls -lA /dev/disk/by-path/
total 0
lrwxrwxrwx 1 root root  9 Sep 23 13:11 pci-0000:00:07.1-ata-1 -> ../../sr0
lrwxrwxrwx 1 root root  9 Sep 23 13:11 pci-0000:00:07.1-ata-1.0 -> ../../sr0
lrwxrwxrwx 1 root root  9 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:0:0-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Sep 23 13:11 pci-0000:03:00.0-scsi-0:0:1:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:1:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:1:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:1:0-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 Sep 23 15:07 pci-0000:03:00.0-scsi-0:0:2:0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:2:0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:2:0-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 Sep 23 14:54 pci-0000:03:00.0-scsi-0:0:3:0 -> ../../sdd
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:3:0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:3:0-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 Sep 23 15:56 pci-0000:03:00.0-scsi-0:0:4:0 -> ../../sde
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:4:0-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Sep 23 17:27 pci-0000:03:00.0-scsi-0:0:4:0-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 Sep 24 09:14 pci-0000:03:00.0-scsi-0:0:5:0 -> ../../sdf
lrwxrwxrwx 1 root root 10 Sep 24 09:14 pci-0000:03:00.0-scsi-0:0:5:0-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Sep 24 09:14 pci-0000:03:00.0-scsi-0:0:5:0-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 Sep 24 08:59 pci-0000:03:00.0-scsi-0:0:6:0 -> ../../sdg
lrwxrwxrwx 1 root root 10 Sep 24 09:08 pci-0000:03:00.0-scsi-0:0:6:0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Sep 24 08:59 pci-0000:03:00.0-scsi-0:0:6:0-part9 -> ../../sdg9
root@LAB-SMPM-GRUB:~#
root@LAB-SMPM-GRUB:~# lsscsi --scsi_id
[0:0:0:0]    cd/dvd  NECVMWar VMware IDE CDR00 1.00  /dev/sr0   -
[2:0:0:0]    disk    VMware   Virtual disk     2.0   /dev/sda   -
[2:0:1:0]    disk    VMware   Virtual disk     2.0   /dev/sdb   -
[2:0:2:0]    disk    VMware   Virtual disk     2.0   /dev/sdc   -
[2:0:3:0]    disk    VMware   Virtual disk     2.0   /dev/sdd   -
[2:0:4:0]    disk    VMware   Virtual disk     2.0   /dev/sde   -
[2:0:5:0]    disk    VMware   Virtual disk     2.0   /dev/sdf   -
[2:0:6:0]    disk    VMware   Virtual disk     2.0   /dev/sdg   -
 
I've played with moving the HDDs around to different SCSI paths. I don't like what I see.
Existing disks always change /dev/sd{letter} to match the SCSI port: SCSI port 2 = /dev/sdc, SCSI port 3 = /dev/sdd, etc.

No good, IMO.
I want ZFS pools to continue working if I play musical HDDs on the backplane, which is why I want ZFS pools created via a disk ID that never changes.
 
It's better to use by-partlabel anyhow.



Depends how they are named, see: /lib/udev/rules.d/60-persistent-storage.rules
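A quick way to see which udev properties those by-id links are built from (ID_SERIAL, ID_WWN, etc.), just as a pointer:

Code:
# show the rules that create the /dev/disk/by-id symlinks
grep -n 'disk/by-id' /lib/udev/rules.d/60-persistent-storage.rules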

See, this is where I perceive software RAID as confusing.
Identify by by-partlabel? Chicken and egg to me. A fresh HDD does not have any partitions.
Once I add a block device to a ZFS pool, it creates partitions 1 + 9 on the disk.
So how does one reference by-partlabel when there isn't one yet?

zpool status references the block device /dev/sd{letter}, not a ZFS partition name.

Thanks for pointing out /lib/udev/rules.d/60-persistent-storage.rules
I'll need to review what is happening here. Perhaps it's that the VMware virtual HDDs have no serial number? They do have a disk identifier.

Code:
root@LAB-SMPM-GRUB:/lib/udev/rules.d# udevadm info --query=all --name=/dev/sdc
P: /devices/pci0000:00/0000:00:15.0/0000:03:00.0/host2/target2:0:2/2:0:2:0/block/sdc
M: sdc
U: block
T: disk
D: b 8:32
N: sdc
L: 0
S: disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0
S: disk/by-diskseq/13
Q: 13
E: DEVPATH=/devices/pci0000:00/0000:00:15.0/0000:03:00.0/host2/target2:0:2/2:0:2:0/block/sdc
E: DEVNAME=/dev/sdc
E: DEVTYPE=disk
E: DISKSEQ=13
E: MAJOR=8
E: MINOR=32
E: SUBSYSTEM=block
E: USEC_INITIALIZED=2897512
E: ID_SCSI=1
E: ID_VENDOR=VMware
E: ID_VENDOR_ENC=VMware\x20\x20
E: ID_MODEL=Virtual_disk
E: ID_MODEL_ENC=Virtual\x20disk\x20\x20\x20\x20
E: ID_REVISION=2.0
E: ID_TYPE=disk
E: ID_BUS=scsi
E: ID_PATH=pci-0000:03:00.0-scsi-0:0:2:0
E: ID_PATH_TAG=pci-0000_03_00_0-scsi-0_0_2_0
E: ID_PART_TABLE_UUID=cf6f1b28-6492-f24e-a098-f5d5a6ddcb69
E: ID_PART_TABLE_TYPE=gpt
E: DEVLINKS=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 /dev/disk/by-diskseq/13
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

root@LAB-SMPM-GRUB:/lib/udev/rules.d# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
root@LAB-SMPM-GRUB:/lib/udev/rules.d#

root@LAB-SMPM-GRUB:/lib/udev/rules.d# fdisk /dev/sdc -l
Disk /dev/sdc: 10 GiB, 10737418240 bytes, 20971520 sectors
Disk model: Virtual disk   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CF6F1B28-6492-F24E-A098-F5D5A6DDCB69

Device        Start      End  Sectors Size Type
/dev/sdc1      2048 20953087 20951040  10G Solaris /usr & Apple ZFS
/dev/sdc9  20953088 20969471    16384   8M Solaris reserved 1
 
See, this is where I perceive software RAID as confusing.
Identify by by-partlabel? Chicken and egg to me. A fresh HDD does not have any partitions.

Indeed, but ZFS is not really any sort of software RAID. Even the way PVE installs it, it creates GPT partitions and some fluff around them first. So no, it is not really handing the raw block devices to ZFS either.

Once I add a block device to a ZFS pool, it creates partitions 1 + 9 on the disk.

This is PVE's hocus pocus, but it's commonly done like that.

So how does one reference by-partlabel when there isn't one yet?

You can absolutely partlabel them, e.g. with gdisk, and then refer to the ZFS vdevs by that. You can even do this after the fact and export & re-import the pool with the new nomenclature.
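As a rough sketch of what I mean, using sgdisk (the scriptable counterpart of gdisk) - the label name and partition number here are just examples, adjust to your layout:

Code:
# example only - label partition 1 of /dev/sdc, then re-import the pool by partlabel
sgdisk --change-name=1:zfs-disk1 /dev/sdc
udevadm trigger && udevadm settle            # refresh the /dev/disk/by-partlabel symlinks
zpool export <pool>
zpool import -d /dev/disk/by-partlabel <pool>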

zpool status references the block device /dev/sd{letter}, not a ZFS partition name.

The block devices are GPT partitions; they simply have not been labeled by PVE. It was just a suggestion - I've simply had enough experience of udev throwing temper tantrums over time.

Thanks for pointing out /lib/udev/rules.d/60-persistent-storage.rules
I'll need to review what is happening here. Perhaps it's that the VMware virtual HDDs have no serial number? They do have a disk identifier.

I really can't tell, which is why I did not add any speculation to my reply.

Code:
root@LAB-SMPM-GRUB:/lib/udev/rules.d# udevadm info --query=all --name=/dev/sdc
P: /devices/pci0000:00/0000:00:15.0/0000:03:00.0/host2/target2:0:2/2:0:2:0/block/sdc
M: sdc
U: block
T: disk
D: b 8:32
N: sdc
L: 0
S: disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0
S: disk/by-diskseq/13
Q: 13
E: DEVPATH=/devices/pci0000:00/0000:00:15.0/0000:03:00.0/host2/target2:0:2/2:0:2:0/block/sdc
E: DEVNAME=/dev/sdc
E: DEVTYPE=disk
E: DISKSEQ=13
E: MAJOR=8
E: MINOR=32
E: SUBSYSTEM=block
E: USEC_INITIALIZED=2897512
E: ID_SCSI=1
E: ID_VENDOR=VMware
E: ID_VENDOR_ENC=VMware\x20\x20
E: ID_MODEL=Virtual_disk
E: ID_MODEL_ENC=Virtual\x20disk\x20\x20\x20\x20
E: ID_REVISION=2.0
E: ID_TYPE=disk
E: ID_BUS=scsi
E: ID_PATH=pci-0000:03:00.0-scsi-0:0:2:0
E: ID_PATH_TAG=pci-0000_03_00_0-scsi-0_0_2_0
E: ID_PART_TABLE_UUID=cf6f1b28-6492-f24e-a098-f5d5a6ddcb69
E: ID_PART_TABLE_TYPE=gpt
E: DEVLINKS=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 /dev/disk/by-diskseq/13
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

root@LAB-SMPM-GRUB:/lib/udev/rules.d# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
root@LAB-SMPM-GRUB:/lib/udev/rules.d#

Yes, that's a bummer - I definitely get ID_SERIAL, ID_WWN and, if labeled, the label there as well.

Code:
root@LAB-SMPM-GRUB:/lib/udev/rules.d# fdisk /dev/sdc -l
Disk /dev/sdc: 10 GiB, 10737418240 bytes, 20971520 sectors
Disk model: Virtual disk 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CF6F1B28-6492-F24E-A098-F5D5A6DDCB69

Device        Start      End  Sectors Size Type
/dev/sdc1      2048 20953087 20951040  10G Solaris /usr & Apple ZFS
/dev/sdc9  20953088 20969471    16384   8M Solaris reserved 1

There you go: your 8M of fluff, with the actual partition sdc1 that was passed on to ZFS as the vdev. You can label it in fdisk as well: 'x' in the main menu for expert mode, then 'n' for the partition name, then check with lsblk -o +PARTLABEL.
 
There you go: your 8M of fluff, with the actual partition sdc1 that was passed on to ZFS as the vdev. You can label it in fdisk as well: 'x' in the main menu for expert mode, then 'n' for the partition name, then check with lsblk -o +PARTLABEL.

Code:
root@LAB-SMPM-GRUB:/lib/udev/rules.d# lsblk -o +PARTLABEL
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS PARTLABEL
sda      8:0    0   20G  0 disk             
├─sda1   8:1    0 1007K  0 part             
├─sda2   8:2    0  512M  0 part             
└─sda3   8:3    0 19.5G  0 part             
sdb      8:16   0   20G  0 disk             
├─sdb1   8:17   0 1007K  0 part             
├─sdb2   8:18   0  512M  0 part             
└─sdb3   8:19   0 19.5G  0 part             
sdc      8:32   0   10G  0 disk             
├─sdc1   8:33   0   10G  0 part             zfs-c411e52d0b0f60da
└─sdc9   8:41   0    8M  0 part             
sdd      8:48   0   10G  0 disk             
├─sdd1   8:49   0   10G  0 part             zfs-0f3d61d899f09831
└─sdd9   8:57   0    8M  0 part             
sde      8:64   0   10G  0 disk             
├─sde1   8:65   0   10G  0 part             zfs-eea7e33513d301bb
└─sde9   8:73   0    8M  0 part             
sdf      8:80   0   10G  0 disk             
├─sdf1   8:81   0   10G  0 part             zfs-88ead41719280c09
└─sdf9   8:89   0    8M  0 part             
sdg      8:96   0   10G  0 disk             
├─sdg1   8:97   0   10G  0 part             zfs-93d34496ed7060e0
└─sdg9   8:105  0    8M  0 part             
sr0     11:0    1 1024M  0 rom

I changed the sdc1 partition label to see if zpool status changed behavior. It still referenced /dev/sdc (no partition), so I changed it back to the original label.

Is there any way to re-configure the pool so that zpool status references the partition label rather than /dev/sd{letter}?
And it doesn't make sense to me why zpool status doesn't include the partition number /dev/sdc1 - it just shows the disk, /dev/sdc?

What I want to see is the ability to change every disk's SCSI port path while the ZFS pool remains intact with zero issues.
This is why I originally thought I wanted /dev/disk/by-id.
 
Code:
root@LAB-SMPM-GRUB:/lib/udev/rules.d# lsblk -o +PARTLABEL
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS PARTLABEL
sda      8:0    0   20G  0 disk            
├─sda1   8:1    0 1007K  0 part            
├─sda2   8:2    0  512M  0 part            
└─sda3   8:3    0 19.5G  0 part            
sdb      8:16   0   20G  0 disk            
├─sdb1   8:17   0 1007K  0 part            
├─sdb2   8:18   0  512M  0 part            
└─sdb3   8:19   0 19.5G  0 part            
sdc      8:32   0   10G  0 disk            
├─sdc1   8:33   0   10G  0 part             zfs-c411e52d0b0f60da
└─sdc9   8:41   0    8M  0 part            
sdd      8:48   0   10G  0 disk            
├─sdd1   8:49   0   10G  0 part             zfs-0f3d61d899f09831
└─sdd9   8:57   0    8M  0 part            
sde      8:64   0   10G  0 disk            
├─sde1   8:65   0   10G  0 part             zfs-eea7e33513d301bb
└─sde9   8:73   0    8M  0 part            
sdf      8:80   0   10G  0 disk            
├─sdf1   8:81   0   10G  0 part             zfs-88ead41719280c09
└─sdf9   8:89   0    8M  0 part            
sdg      8:96   0   10G  0 disk            
├─sdg1   8:97   0   10G  0 part             zfs-93d34496ed7060e0
└─sdg9   8:105  0    8M  0 part            
sr0     11:0    1 1024M  0 rom

I changed the sdc1 partition label to see if zpool status changed behavior. It still referenced /dev/sdc (no partition), so I changed it back to the original label.

It will not change behaviour just because you added names. You literally define the pool by devices, and it will keep using them; it's a problem if they get renamed later on, which is why I just prefer partlabels.

Is there any way to re-configure the pool so that zpool status references the partition label rather than /dev/sd{letter}?

You can export and import the pool using the new names; it will pick them up just fine.

And it doesn't make sense to me why zpool status doesn't include the partition number /dev/sdc1 - it just shows the disk, /dev/sdc?

I don't remember this and can't test it now, but I believe it has to do with the default behaviour around the fluff. What I know for sure is that if you create your own pool with a command on those GPT partitions, it will refer to them. It would even work with /dev/mapper/...

The reason I mention this is that ZFS encryption is pretty ... experimental ... and PVE does not cater for it anyway, so should you need that and want to go with e.g. LUKS, you have to create those devs yourself first, then create the pool over the crypts. It will work just fine too. Anything goes, as long as it is a blockdev.
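Rough sketch of that order of operations (the device and crypt names are just examples, and this of course wipes the disks):

Code:
# example only - create the LUKS crypts first, then build the pool on the mapper devices
cryptsetup luksFormat /dev/sdc
cryptsetup luksFormat /dev/sdd
cryptsetup open /dev/sdc crypt1
cryptsetup open /dev/sdd crypt2
zpool create securepool mirror /dev/mapper/crypt1 /dev/mapper/crypt2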

What I want to see is the ability to change every disk's SCSI port path while the ZFS pool remains intact with zero issues.

I believe partition labels are foolproof, but maybe someone will tell me they use WWNs for some good reason.

This is why I originally thought I wanted /dev/disk/by-id.

It's a common way to do it; it's just that your virtualised drives do not seem to generate any.
 
You can export and import the pool using the new names; it will pick them up just fine.

Thank you. I renamed partition 1 on each disk to something easier.

zpool export zfs-raid10

zpool import zfs-raid10 -d /dev/disk/by-partlabel/zfs-disk1 -d /dev/disk/by-partlabel/zfs-disk2 -d /dev/disk/by-partlabel/zfs-disk3 -d /dev/disk/by-partlabel/zfs-disk4 -d /dev/disk/by-partlabel/zfs-disk5

All looks good with 2 datasets intact.

Code:
root@LAB-SMPM-GRUB:/dev/disk/by-partlabel# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 6.18M in 00:00:00 with 0 errors on Sun Jul 21 01:26:19 2024
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors

  pool: zfs-raid10
 state: ONLINE
  scan: resilvered 5.18G in 00:00:30 with 0 errors on Tue Sep 24 10:30:05 2024
remove: Removal of vdev 3 copied 76K in 0h0m, completed on Mon Sep 23 15:50:37 2024
        792 memory used for removed device mappings
config:

        NAME           STATE     READ WRITE CKSUM
        zfs-raid10     ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            zfs-disk1  ONLINE       0     0     0
            zfs-disk2  ONLINE       0     0     0
          mirror-4     ONLINE       0     0     0
            zfs-disk3  ONLINE       0     0     0
            zfs-disk4  ONLINE       0     0     0
        spares
          zfs-disk5    AVAIL  

errors: No known data errors

root@LAB-SMPM-GRUB:/dev/disk/by-partlabel# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rpool                3.58G  14.8G   104K  /rpool
rpool/ROOT           3.56G  14.8G    96K  /rpool/ROOT
rpool/ROOT/pve-1     3.56G  14.8G  3.56G  /
rpool/data             96K  14.8G    96K  /rpool/data
rpool/var-lib-vz       96K  14.8G    96K  /var/lib/vz
zfs-raid10           10.4G  8.05G   112K  /zfs-raid10
zfs-raid10/ct-data     96K  8.05G    96K  /zfs-raid10/ct-data
zfs-raid10/iso-data  10.4G  8.05G  10.4G  /zfs-raid10/iso-data
zfs-raid10/vm-data     96K  8.05G    96K  /zfs-raid10/vm-data

Mind sharing what partition #9 is about?
If you're using by-partlabel like this, do you always first create the pool via /dev/sd{letter} so it auto-partitions the disk, or do you do the partitioning manually?

Actually, I haven't read much about zpool import / export yet.
I take it Proxmox would not allow the pool to export (unmount) if a VM or container were running, without the force switch? Which would be bad mojo if done...
I presently just have 2 ISOs on this pool that are not attached to any VMs. The pool dismounted without the force switch.

Thanks for help.

Edit: I missed that you shared links about the reserved partition.
 
Maybe not good.
I detached the disk containing the zfs-disk4 partition. ZFS acts like the pool is OK? Uhh... the disk is definitely not online.
I expected zfs-disk4 to go offline and the spare to go from available to online. Hmm.

Code:
root@LAB-SMPM-GRUB:/dev/disk/by-partlabel# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 6.18M in 00:00:00 with 0 errors on Sun Jul 21 01:26:19 2024
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors

  pool: zfs-raid10
 state: ONLINE
  scan: resilvered 5.18G in 00:00:30 with 0 errors on Tue Sep 24 10:30:05 2024
remove: Removal of vdev 3 copied 76K in 0h0m, completed on Mon Sep 23 15:50:37 2024
        792 memory used for removed device mappings
config:

        NAME           STATE     READ WRITE CKSUM
        zfs-raid10     ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            zfs-disk1  ONLINE       0     0     0
            zfs-disk2  ONLINE       0     0     0
          mirror-4     ONLINE       0     0     0
            zfs-disk3  ONLINE       0     0     0
            zfs-disk4  ONLINE       0     0     0
        spares
          zfs-disk5    AVAIL   

errors: No known data errors

root@LAB-SMPM-GRUB:/dev/disk/by-partlabel# ls
zfs-disk1  zfs-disk2  zfs-disk3  zfs-disk5
 
If you're using by-partlabel like this, do you always first create the pool via /dev/sd{letter} so it auto-partitions the disk, or do you do the partitioning manually?

For full disclosure, I do not run into this that often, as I do not prefer ZFS with a hypervisor, but I tried to answer your questions, mostly about udev. :)

Also, my ZFS experience pre-dates PVE, so out of habit I really would have everything scripted and then present it to whatever (PVE) needs it. I am not saying you have to do this; it's just that I am one of those people who do not like GUIs. (At the end of the day, you can actually check the commands that the GUI produces - it's just a wrapper.) Second, I really do not run a spinning disk without LUKS encryption (if it's an SSD, I would want it to be a SED drive), so with that I do not really have a choice: I create the crypts first, then create the pool over them, and I can also skip the weird 8M partitions.

As you discovered, you can add labels later if that's convenient. I would bet most people just use by-id's like that. Some might use by-uuid's; I do not like those either.

Actually, I haven't read much about zpool import / export yet.

It's mostly meant to "prepare" the pool to be put somewhere else. Also, nothing with ZFS is like a normal filesystem, not even mounting. You have /etc/zfs/zpool.cache (you can run strings on it), and that remembers to auto-import the pool on boot. The pool itself has a marking of whether it was exported, so if you try to import it on another machine, it would make a fuss about it (it was mostly put there by Sun so that you do not accidentally import the same pool from more than one system). I think it stores /etc/machine-id somewhere, but I am not sure about this one now (it might be another identifier generated on its own).
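If you want to poke at it, just a sketch:

Code:
# the cache file is binary, but pool and device names are visible
strings /etc/zfs/zpool.cache | less
zpool get cachefile zfs-raid10   # shows which cache file, if any, the pool is registered in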

I take it Proxmox would not allow the pool to export (unmount) if a VM or container were running, without the force switch? Which would be bad mojo if done...

Not sure what the GUI allows, but I would expect the command to fail if run while the pool is being accessed. Imported and mounted are, yet again, different concepts in ZFS. It's really worth going through a good intro on datasets, zvols, etc. It's not just another normal filesystem.

I presently just have 2 ISOs on this pool that are not attached to any VMs. The pool dismounted without the force switch.

So this one does NOT surprise me. It was not in use at the time, i.e. nothing was accessing it. Think of a mounted regular filesystem: if you are not doing reads/writes and are not even sitting in /mnt/<mountpoint>, it will umount successfully too.

Maybe not good.
I detached the disk containing the zfs-disk4 partition. ZFS acts like the pool is OK? Uhh... the disk is definitely not online.
I expected zfs-disk4 to go offline and the spare to go from available to online. Hmm.

I would too, but I don't have enough experience with spares myself. Spares are something I would use if I had a pool of 10+ disks. Did it eventually notice it went UNAVAIL, e.g. on further access by VMs?

NB: One thing I do remember is that spares do not stand in permanently; you have to exert some manual effort later on to keep them as replacements! But again, I forget exactly how that worked.
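Going by the zpool man pages, the usual flow once a spare has actually kicked in looks roughly like this (a sketch, not tested here; I believe automatic activation is handled by the ZFS event daemon):

Code:
systemctl status zfs-zed                      # zed should be running for spares to kick in automatically
zpool status zfs-raid10                       # an active spare shows as INUSE
zpool detach zfs-raid10 zfs-disk4             # detach the failed dev -> the spare becomes permanent, or...
zpool replace zfs-raid10 zfs-disk4 <newdev>   # ...replace the failed dev and the spare returns to AVAIL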
 
it's just that I am one of those people who do not like GUIs. (At the end of the day, you can actually check the commands that the GUI produces - it's just a wrapper.)

I just want to add, before someone else takes me apart, you do have to be careful. ZFS to this day does a lot of things without asking and PVE makes you root by default. So zpool destroy ... no problem, no questions asked. Yeah, it's stupid. :)
 
I would too, but I don't have enough experience with spares myself. Spares are something I would use if I had a pool of 10+ disks. Did it eventually notice it went UNAVAIL, e.g. on further access by VMs?

NB: One thing I do remember is that spares do not stand in permanently; you have to exert some manual effort later on to keep them as replacements! But again, I forget exactly how that worked.

Agreed about needing to learn more ZFS. I'm just beginning. :)

No, partition "disk-4" never went offline. Something ain't right with the renamed partitions / pool import / by-partlabel.

Upon rebooting, it did go offline, yet the spare still didn't kick into gear.

I had tested detaching a disk when the pool was built via /dev/sd, and the drive would immediately go offline and the spare would immediately resilver.
And yes, the spare would always intend to remain a spare. I would use zpool replace after reattaching the disk.

I don't think my by-partlabel setup is working correctly.
I might spend a moment with udev by-id on this VMware Proxmox VM. Or I do have a few old PE R710s not doing anything that I could install Proxmox on to see udev by-id behavior on a real host.

Code:
root@LAB-SMPM-GRUB:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 6.18M in 00:00:00 with 0 errors on Sun Jul 21 01:26:19 2024
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors

  pool: zfs-raid10
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: resilvered 5.18G in 00:00:30 with 0 errors on Tue Sep 24 10:30:05 2024
remove: Removal of vdev 3 copied 76K in 0h0m, completed on Mon Sep 23 15:50:37 2024
        792 memory used for removed device mappings
config:

        NAME                     STATE     READ WRITE CKSUM
        zfs-raid10               DEGRADED     0     0     0
          mirror-0               ONLINE       0     0     0
            zfs-disk1            ONLINE       0     0     0
            zfs-disk2            ONLINE       0     0     0
          mirror-4               DEGRADED     0     0     0
            zfs-disk3            ONLINE       0     0     0
            2753624725953710428  UNAVAIL      0     0     0  was /dev/disk/by-partlabel/zfs-disk4
        spares
          zfs-disk5              AVAIL
 
Upon rebooting, it did go offline, yet the spare still didn't kick into gear.

I had tested detaching a disk when the pool was built via /dev/sd, and the drive would immediately go offline and the spare would immediately resilver.

That's odd; whether by-id or by-partlabel, those are symlinks after all. I don't remember this behaviour specifically with symlinks from before - either the device is there or it is not. I will admit I never use /dev/sd* because those can change; in fact, sometimes you see someone asking here how to "fix" that after the fact, when they find out.

I don't think my by-partlabel setup is working correctly.

But I just don't see how you could have caused anything; the symlink is generated just like the others.

I might spend a moment with udev by-id on this VMware Proxmox VM. Or I do have a few old PE R710s not doing anything that I could install Proxmox on to see udev by-id behavior on a real host.

I think for things like this, you really should try real hardware, but just for figuring out the basics, you are fine virtualised.

Code:
        NAME                     STATE     READ WRITE CKSUM
        zfs-raid10               DEGRADED     0     0     0
          mirror-0               ONLINE       0     0     0
            zfs-disk1            ONLINE       0     0     0
            zfs-disk2            ONLINE       0     0     0
          mirror-4               DEGRADED     0     0     0
            zfs-disk3            ONLINE       0     0     0
            2753624725953710428  UNAVAIL      0     0     0  was /dev/disk/by-partlabel/zfs-disk4
        spares
          zfs-disk5              AVAIL

Maybe check the symlinks, to see if you labeled what you meant to label after all. It's definitely strange, this one especially.
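E.g. (just a sketch):

Code:
ls -l /dev/disk/by-partlabel/     # confirm each label points at the partition you expect
zpool status -P zfs-raid10        # -P prints full vdev paths instead of the short names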
 
The pool itself has a marking of whether it was exported, so if you try to import it on another machine, it would make a fuss about it (it was mostly put there by Sun so that you do not accidentally import the same pool from more than one system). I think it stores /etc/machine-id somewhere, but I am not sure about this one now (it might be another identifier generated on its own).

Just an erratum: the piece of information it remembers in the pool is /etc/hostid. If you copied that over from one machine to another, it would think it's being imported on the same machine as before.
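For checking, a quick sketch (zgenhostid is the OpenZFS helper for writing /etc/hostid):

Code:
hostid        # current host identifier, taken from /etc/hostid when that file exists
zgenhostid    # create /etc/hostid if it is missing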
 
