ZFS Metadata Special Device

Thanks - the first link is the one I have been reading; there is still a lot to understand about this, e.g.:

The redundancy of the special device should match the one of the pool, since the special device is a point of failure for the whole pool.
So if my pool is a RAID10 consisting of 3 x mirrors, can my special volume be a single mirror or does it need to be 3 x mirrors?

There is also no info in the Proxmox doco about what happens if the special_small_blocks fill up the SSD. If I set it to 4K, how do I monitor usage? How do I even decide on a value?
 
There is also no info in the Proxmox doco about what happens if the special_small_blocks fill up the SSD.
Why should there be something in the PVE documentation? PVE is a layer on top of the operating system (Ubuntu kernel, Debian userland, plus ZFS), so you may have to look in the documentation of the building blocks beneath it, e.g. the ZFS documentation.

If the special device vdev is full, the data is written to the regular data disks of the pool, just as it would be if you did not have a special device.


If I set to 4K how do I monitor usage?
zpool list -v will list the usage.
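As a small sketch (the pool name is taken from the output further down in this thread), the special vdev appears as its own section of that output:

Code:
# a sketch; "rpool" is the pool name used later in this thread
zpool list -v rpool | grep -A 3 special
# the lines after "special" are the special mirror vdev and its disks;
# compare their ALLOC column against SIZE to see how full the special vdev is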


How do I even decide on a value?
It depends on what you want to achieve. In my pools with a special device, I also have a dataset named SSD with a special_small_blocks value of 0, so that everything is going to SSD. This additional dataset is configured as a storage in PVE, so that I have both hard disk and SSD storage available to VMs from one pool. That is at least one way to decide on a value, and IMHO one of the best. I just use this for a small boot device for VMs, and I also set it on the PVE root dataset and send/receive it once so that everything is on the SSD. This will not eat a lot of space, yet it'll reduce boot times. I can also recommend using a dedicated dataset for /var/lib/vz if not already present on newer installs.
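For the /var/lib/vz part, a minimal sketch (the dataset name matches the zfs list output further down; on an existing install you would have to move the current contents of /var/lib/vz aside first):

Code:
# create a dedicated dataset mounted at /var/lib/vz
zfs create -o mountpoint=/var/lib/vz rpool/var-lib-vz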
 
So if my pool is a RAID10 consisting of 3 x mirrors, can my special volume be a single mirror or does it need to be 3 x mirrors?
You mean 3-disk mirrors (so 3 disks per mirror) or three 2-disk mirrors? If it is the latter, a single 2-disk mirror would do the job.
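For reference, adding such a 2-disk mirror as the special vdev to an existing pool looks roughly like this (a sketch; the by-id paths are placeholders for your two SSDs):

Code:
# add the two SSDs as a mirrored special vdev to the existing pool "rpool"
zpool add rpool special mirror \
    /dev/disk/by-id/ata-SSD_SERIAL_A \
    /dev/disk/by-id/ata-SSD_SERIAL_B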
 
It is the latter, 3 x 2-disk mirrors. I am stuck at the moment trying to understand how to lay out the disks. I have 6 x 1.2TB SAS drives (spinning rust) and 2 x 800GB SATA SSDs (enterprise drives):

Code:
root@pve01:~# zpool list -v
NAME                               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool                             3.98T  42.8G  3.94T        -         -     0%     1%  1.00x    ONLINE  -
  mirror-0                        1.09T  14.1G  1.07T        -         -     0%  1.26%      -    ONLINE
    scsi-35000cca02d9c48d0-part3  1.09T      -      -        -         -      -      -      -    ONLINE
    scsi-35000cca02d9c4af8-part3  1.09T      -      -        -         -      -      -      -    ONLINE
  mirror-1                        1.09T  14.3G  1.07T        -         -     0%  1.28%      -    ONLINE
    scsi-35000cca02d9c4b0c-part3  1.09T      -      -        -         -      -      -      -    ONLINE
    scsi-35000cca02d9cd3c8-part3  1.09T      -      -        -         -      -      -      -    ONLINE
  mirror-2                        1.09T  14.1G  1.07T        -         -     0%  1.26%      -    ONLINE
    scsi-35000cca02d9cc264-part3  1.09T      -      -        -         -      -      -      -    ONLINE
    scsi-35000cca02d9cd9c4-part3  1.09T      -      -        -         -      -      -      -    ONLINE
special                               -      -      -        -         -      -      -      -         -
  mirror-3                         744G   307M   744G        -         -     0%  0.04%      -    ONLINE
    sdg                            745G      -      -        -         -      -      -      -    ONLINE
    sdh                            745G      -      -        -         -      -      -      -    ONLINE


And:

Code:
root@pve01:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     42.8G  3.11T   104K  /rpool
rpool/ROOT                2.07G  3.11T    96K  /rpool/ROOT
rpool/ROOT/pve-1          2.07G  3.11T  2.07G  /
rpool/data                34.1G  3.11T    96K  /rpool/data
rpool/data/vm-100-disk-0  1.32G  3.11T  1.32G  -
rpool/data/vm-101-disk-0   104K  3.11T   104K  -
rpool/data/vm-101-disk-1  32.7G  3.11T  32.7G  -
rpool/data/vm-101-disk-2    72K  3.11T    72K  -
rpool/var-lib-vz          6.65G  3.11T  6.65G  /var/lib/vz

My 800GB SATA drives are being used as the special device for the RAID10 array, but is this now their only purpose? I can't use them for virtual disk storage? Also, should I be creating ZFS datasets for virtual drives on /rpool and applying special_small_blocks to those datasets only?

The way I had originally laid out the disks was: 2 x 800GB drives as a mirror with Proxmox installed there, and 6 x 1.2TB SAS drives as a RAID10 array with no special device.
 
I can't use them for virtual disk storage? Also, should I be creating ZFS datasets for virtual drives on /rpool and applying special_small_blocks to those datasets only?
Yes, if you want to use them to store virtual disks you would have to create a new dataset via the CLI, set special_small_blocks bigger than your volblocksize, add it as a new ZFS storage in PVE, and move your vdisks there.
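A sketch of those steps, with a hypothetical dataset name, storage ID and a 64K threshold as an example of "bigger than your volblocksize":

Code:
# names and the 64K threshold are examples only
zfs create rpool/ssd-vms
zfs set special_small_blocks=64K rpool/ssd-vms
# register the dataset as a ZFS storage in PVE for VM images and containers
pvesm add zfspool ssd-vms --pool rpool/ssd-vms --content images,rootdir

The vdisks can then be moved onto the new storage from the VM's Hardware tab in the GUI, or with qm disk move on recent PVE versions.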
 
It depends on what you want to achieve. In my pools with a special device, I also have a dataset named SSD with a special_small_blocks value of 0, so that everything is going to SSD. This additional dataset is configured as a storage in PVE, so that I have both hard disk and SSD storage available to VMs from one pool. That is at least one way to decide on a value, and IMHO one of the best. I just use this for a small boot device for VMs, and I also set it on the PVE root dataset and send/receive it once so that everything is on the SSD. This will not eat a lot of space, yet it'll reduce boot times. I can also recommend using a dedicated dataset for /var/lib/vz if not already present on newer installs.
How does this work exactly? I created a RAID10 array on spinning SAS drives and added the special device (on enterprise SSDs) with special_small_blocks=0 on a child dataset (pool0/SSD). I created a VM on pool0/SSD, but zpool list -v did not show the special device grow by the size of the VM disk.

Am I reading your post correctly?
I also have a dataset named SSD with a special_small_blocks value of 0, so that everything is going to SSD
Posts I am reading suggest setting special_small_blocks to a larger value to ensure any blocks equal to or smaller than that value go to the special device. I can't see how setting it to 0 will force all data to the special device SSDs.

Is zpool list -v showing me the right data?

Documented here: https://forum.proxmox.com/threads/dell-r730-with-ssds-and-sas-drives.153737/post-699712

For now I have rebuilt my Proxmox server on a single RAID10 array across 6 x spinning SAS drives and left the SSD special device out of the pool until I know what to do with it. I would like to use the special device, but I don't want to waste 800GB of SSD storage.

From https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops.7.html (the doco you referred me to):

The default size is 0 which means no small file blocks will be allocated in the special class.
 
So I have now verified that zfs set special_small_blocks=0 rpool/SSD does not send VM HDD blocks to the SSD. By setting special_small_blocks=1M on rpool/SSD I was able to send any data copied to /rpool/SSD to the SSDs.

Unfortunately, adding the rpool/SSD ZFS storage to Proxmox under Datacenter > Storage does not let me send VM disks placed on that storage to the rpool/SSD special device - it only works when copying files to /rpool/SSD/, i.e. the child datasets that Proxmox creates for the VM disks do not inherit the rpool/SSD special_small_blocks setting.

In fact, when I attempted to set the property on the vm-100-disk-1 dataset (as a test), it was not allowed:


Code:
root@pve01:~# zfs get special_small_blocks
NAME                     PROPERTY              VALUE                 SOURCE
rpool                    special_small_blocks  4K                    local
rpool/ROOT               special_small_blocks  4K                    inherited from rpool
rpool/ROOT/pve-1         special_small_blocks  4K                    inherited from rpool
rpool/SSD                special_small_blocks  0                     local
rpool/SSD/vm-100-disk-0  special_small_blocks  -                     -
rpool/SSD/vm-100-disk-1  special_small_blocks  -                     -
rpool/SSD/vm-100-disk-2  special_small_blocks  -                     -
rpool/data               special_small_blocks  4K                    inherited from rpool
rpool/var-lib-vz         special_small_blocks  4K                    inherited from rpool
root@pve01:~#

root@pve01:~# zfs set special_small_blocks=0 rpool/SSD/vm-100-disk-1
cannot set property for 'rpool/SSD/vm-100-disk-1': 'special_small_blocks' does not apply to datasets of this type
root@pve01:~#

Either way... setting special_small_blocks=0 does not do what this thread says it does in relation to sending all blocks to the SSD; it must be set to a higher value, e.g. in my testing I set it to 1M to send any copied blocks to the special device. VM disks pointed at the SSD child dataset were not sent to the special device.
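To double-check that behaviour, a sketch of the kind of test described above (file name and size are arbitrary):

Code:
# allow blocks up to 1M into the special class for this dataset
zfs set special_small_blocks=1M rpool/SSD
# write ~1 GiB of file data into the dataset and flush it to disk
dd if=/dev/urandom of=/rpool/SSD/testfile bs=1M count=1024
zpool sync rpool
# the ALLOC column of the special mirror should have grown by roughly the amount written
zpool list -v rpool | grep -A 3 special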
 