ZFS and over provisioning?

Dunuin
Jun 30, 2020
Hi,

When creating the boot disks I was able to set the size, so I could keep 20% of the SSD unused for over-provisioning.
How can I do that with other ZFS pools?

I tried to find hints on how to partition the disks myself but wasn't able to find anything.
I found out that I could use "zpool create POOLNAME mirror /dev/sda1 /dev/sdb1" to create a ZFS pool on partitions instead of whole drives, but found no information on how to format and partition the SSDs for ZFS.
I see that ZoL creates 2 partitions: partition 1 for ZFS and a small partition 9 for Solaris compatibility, which isn't really needed otherwise. And "zpool create" doesn't seem to offer any optional parameter to define the desired size.
 
able to keep 20% of the SSD unused for over-provisioning.
How does some free space at the end of the disk help you with over-provisioning?

What is usually meant by over-provisioning is that you can have VM disks which, when summed up, are larger than the underlying storage (for example, three 500 GiB virtual disks on a 1 TiB datastore). This works as long as they don't all use all their space. Once you hit the size of the underlying storage, you will be in trouble.

Whether that is possible is a property of the underlying storage. ZFS, LVM-Thin, Ceph and qcow2 files can do it.


Regarding the whole partitioning thing with ZFS: if you give ZFS the full disk, e.g. /dev/sdc, it will usually create 2 partitions on it, as you saw. Partition 9 is there to have some wiggle room should you need to replace a disk: disks of the same nominal size might not be 100% the same size if they are a different model or from a different vendor.

If you only want to use a smaller part of the disk for ZFS, you will have to partition it yourself and then, when creating the zpool, tell it which partition to use, e.g. /dev/sdc2. You can partition disks quite easily with the parted CLI utility.
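For example (a rough sketch; the device names and the 80% mark are placeholders, adapt them to your disks):

Code:
# GPT label plus one partition covering the first 80% of each disk
parted /dev/sdb --script mklabel gpt mkpart zfs 1MiB 80%
parted /dev/sdc --script mklabel gpt mkpart zfs 1MiB 80%

# create the pool on the partitions instead of the whole disks
zpool create POOLNAME mirror /dev/sdb1 /dev/sdc1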
 
How does some free space at the end of the disk help you with over-provisioning?

After doing a secure erase the whole SSD should be free of user data. If I only partition 80% of the space, the remaining 20% will stay unused and the controller of the SSD will use this free space for internal operations like garbage collection, bad block mapping and wear-leveling. This way I should be able to increase the flash capacity reserve for replacing bad blocks, and the SSD should wear down less.
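Roughly what I have in mind (just a sketch; /dev/nvme0n1 stands in for the actual SSD):

Code:
# let the controller know that all blocks are free
# (a vendor secure erase, e.g. "nvme format --ses=1", would be more thorough)
blkdiscard /dev/nvme0n1

# only partition the first 80%; the rest stays unmapped for the controller
parted /dev/nvme0n1 --script mklabel gpt mkpart zfs 1MiB 80%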


I also thought about setting up a host protected area, but hdparm, which could do that, doesn't work with NVMe drives.
In Windows I could use the vendors' tools to set up over-provisioning, but they need to alter the last partition for that, so I think they are just shrinking the last partition to keep some unused space on the drive.
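For SATA drives the HPA approach would look roughly like this (sketch only; the sector count is a made-up example and depends on the drive):

Code:
# show the current and the native max sector count
hdparm -N /dev/sda

# limit the visible size, e.g. to ~80% of the native sectors
# (the "p" prefix makes the new limit permanent across power cycles)
hdparm -N p1562824368 /dev/sda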
 
Ah okay, that's what you mean. Well, as I said, you can create partitions yourself and use those to create the zpool. But honestly, if you fear that your SSDs won't last, check the TBW or DWPD values in the specs of your SSDs to get an idea how long they will probably last. Additionally, you can keep an eye on the 'Wear Out' or 'Wear Leveling Count' in the SMART values.
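To check that from the CLI, something like this should do (attribute names vary between vendors):

Code:
# SATA SSDs: look for attributes like "Wear_Leveling_Count"
# or "Media_Wearout_Indicator"
smartctl -A /dev/sda

# NVMe SSDs: check "Percentage Used" in the health log
smartctl -a /dev/nvme0
nvme smart-log /dev/nvme0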
 
But honestly, if you fear that your SSDs won't last, check the TBW or DWPD values in the specs of your SSDs to get an idea how long they will probably last. Additionally, you can keep an eye on the 'Wear Out' or 'Wear Leveling Count' in the SMART values.
I will do that.

Well, as I said, you can create partitions yourself and use those to create the zpool.
Yes, but I wasn't sure which partition types to choose and which options to use. There is no explanation of how to partition them yourself, with fdisk for example, so that they can be used with "zpool create".

Edit:
I looked at the source code to see how the Proxmox VE installer partitions the disks with space left over, but I'm not sure what exactly it is doing.
It is using sgdisk to partition the disk, but I don't know which partition type was used:

Code:
sub partition_bootable_disk {
    my ($target_dev, $maxhdsizegb, $ptype) = @_;

    die "too dangerous" if $opt_testmode;

    die "unknown partition type '$ptype'"
        if !($ptype eq '8E00' || $ptype eq '8300' || $ptype eq 'BF01');

    my $hdsize = hd_size($target_dev); # size in KB (1024 bytes)

    my $restricted_hdsize_mb = 0; # 0 ==> end of partition
    if ($maxhdsizegb) {
        my $maxhdsize = $maxhdsizegb * 1024 * 1024;
        if ($maxhdsize < $hdsize) {
            $hdsize = $maxhdsize;
            $restricted_hdsize_mb = int($hdsize/1024) . 'M';
        }
    }

    my $hdgb = int($hdsize/(1024*1024));
    die "hardisk '$target_dev' too small (${hdgb}GB)\n" if $hdgb < 8;

    syscmd("sgdisk -Z ${target_dev}");

    # 1 - BIOS boot partition (Grub Stage2): first free 1M
    # 2 - EFI ESP: next free 512M
    # 3 - OS/Data partition: rest, up to $maxhdsize in MB

    my $grubbootdev = get_partition_dev($target_dev, 1);
    my $efibootdev = get_partition_dev($target_dev, 2);
    my $osdev = get_partition_dev($target_dev, 3);

    my $pcmd = ['sgdisk'];

    my $pnum = 2;
    push @$pcmd, "-n${pnum}:1M:+512M", "-t$pnum:EF00";

    $pnum = 3;
    push @$pcmd, "-n${pnum}:513M:${restricted_hdsize_mb}", "-t$pnum:$ptype";

    push @$pcmd, $target_dev;

    my $os_size = $hdsize - 513*1024; # 512M efi + 1M bios_boot + 1M alignment

    syscmd($pcmd) == 0 ||
        die "unable to partition harddisk '${target_dev}'\n";

    my $blocksize = logical_blocksize($target_dev);

    if ($blocksize != 4096) {
        $pnum = 1;
        $pcmd = ['sgdisk', '-a1', "-n$pnum:34:2047", "-t$pnum:EF02", $target_dev];

        syscmd($pcmd) == 0 ||
            die "unable to create bios_boot partition '${target_dev}'\n";
    }

    &$udevadm_trigger_block();

    foreach my $part ($efibootdev, $osdev) {
        syscmd("dd if=/dev/zero of=$part bs=1M count=256") if -b $part;
    }

    return ($os_size, $osdev, $efibootdev);
}
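Judging from the die check at the top, $ptype can only be '8E00' (Linux LVM), '8300' (Linux filesystem) or 'BF01' (Solaris /usr, which is commonly used for ZFS), so for a ZFS install it is probably 'BF01'. Replicating that on a pure data disk with ~20% left unpartitioned might then look like this (sketch only; the 800G is an example for a 1 TB SSD):

Code:
sgdisk -Z /dev/sdc
# one ZFS partition of 800G; the rest of the disk stays unpartitioned
sgdisk -n1:1M:+800G -t1:BF01 /dev/sdc
zpool create POOLNAME /dev/sdc1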
 
Yes, but I wasn't sure which partition types to choose and which options to use. There is no explanation of how to partition them yourself, with fdisk for example, so that they can be used with "zpool create".

That is because you normally don't partition for ZFS. ZFS works best if it uses and manages the whole disk itself. If you want to partition it yourself, just run the create not on the whole disk, but on the partition. Partition types are for Windows users; in Linux you don't technically need them.
 
