ZFS Block Size recommendation for IO optimisation?

Mar 7, 2022
What is the recommended "Block Size" for ZFS when running Windows Server 2019/2022 clients (KVM)?

The standard NTFS block size is 4K; the guidance for Exchange and SQL Server is 64K. I don't know what the guidance for file servers is. These are 2019 and maybe 2022 servers.


I have 8 NVMe drives running in a ZFS-based RAID 10.

Bash:
zpool status
  pool: PRX01-ZFSRaid10
 state: ONLINE
config:

        NAME                               STATE     READ WRITE CKSUM
        PRX01-ZFSRaid10                    ONLINE       0     0     0
          mirror-0                         ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A05746A1  ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A05746C9  ONLINE       0     0     0
          mirror-1                         ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A067AA2C  ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A057469C  ONLINE       0     0     0
          mirror-2                         ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A05746C7  ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A0557E82  ONLINE       0     0     0
          mirror-3                         ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A05746B0  ONLINE       0     0     0
            nvme-WUS4BB038D7P3E3_A05746D6  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          mirror-0                                            ONLINE       0     0     0
            ata-INTEL_SSDSC2KG240G8_BTYG026209JP240AGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KG240G8_PHYG0064006A240AGN-part3  ONLINE       0     0     0

The NVMe drives' sector size is 4K.


1. Question: If I set the NTFS block size to 64K, should I set the ZFS block size to
  • 16K? (4K sector size × 4 RAID-0 mirrors)
  • 64K? (same as the NTFS block size, which gets written as 4× 16K chunks to the NVMe drives and split into 4 sectors on each NVMe)
My inkling is that the 64K option is correct.

2. Is there any guidance for Windows file servers (think a couple of megs per Excel file and the odd ISO) and log servers like e.g. Grafana?
 
Yeah, bump the volblocksize to 64k to match NTFS; that will also give you better compression vs. the stock 8k.

Make sure ashift is 4k (ashift=12).
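A minimal sketch of what these two settings look like with plain ZFS commands — the pool, device and zvol names are placeholders, and the creation commands are commented out because ashift can only be set at pool creation (on PVE, the volblocksize corresponds to the storage's "Block Size" field):

```shell
# ashift is a power-of-two exponent: ashift=12 means 2^12 = 4096-byte sectors.
# Derive it from the drive's reported sector size:
sector_size=4096
ashift=0
n=$sector_size
while [ "$n" -gt 1 ]; do
  n=$((n / 2))
  ashift=$((ashift + 1))
done
echo "ashift=$ashift"   # ashift=12 for 4K sectors

# Pool and zvol creation would then look roughly like this
# (pool name, device names and zvol name are placeholders):
# zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1
# zfs create -V 100G -o volblocksize=64k tank/vm-100-disk-0
```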
 
Last edited:
Yeah, bump the volblocksize to 64k to match NTFS; that will also give you better compression vs. the stock 8k.
Just keep in mind that PVE will use the same volblocksize for all zvols you create or restore from a backup. If you are sure that all workloads use a 64K blocksize or higher, that should be fine. But performance will be horrible as soon as some workload wants to read/write 4K to 32K blocks. So something like Postgres with its 8K blocksize, MySQL with its 16K blocksize, or most Linux filesystems with their 4K blocksize will be terribly slow.
By the way, I tried to install Win10/Win11 with a 64K cluster size for the system partition. That is really a pain, because the Windows installer always uses a 4K cluster size, with no GUI to change it. But maybe that is easier with Windows Server 2019/2022 and custom install ISOs.
Make sure ashift is 4k (ashift=9).
Ashift=12 would be 4k. Make sure not to use "ashift=9" unless you use HDDs with a physical/logical sector size of 512B/512B.
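Before settling on an ashift, the sector sizes the kernel actually sees can be checked with lsblk (real columns; whether a 4K physical size shows up depends on the drive's firmware):

```shell
# Physical vs. logical sector size as reported by the kernel:
lsblk -o NAME,PHY-SEC,LOG-SEC 2>/dev/null || true

# The ashift values discussed above, expanded to byte sizes:
echo "ashift=9  -> $((1 << 9)) byte sectors"    # 512
echo "ashift=12 -> $((1 << 12)) byte sectors"   # 4096
```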
 
2. Is there any guidance for Windows file servers (think a couple of megs per Excel file and the odd ISO) and log servers like e.g. Grafana?
That depends on the guest settings. There is no universal "best" as a rule of thumb. Determine your guest's blocksize, create the zvol with the matching volblocksize, align the partitions in your guest on that boundary, and use the blocksize you determined. Then test whether the performance is good.
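The "align the partitions" part can be sanity-checked with simple arithmetic; a sketch with example numbers (the usual 1 MiB partition offset, i.e. sector 2048 on a 512B-sector disk, against a 64K volblocksize):

```shell
start_sector=2048         # typical default partition start
sector_bytes=512
volblock=$((64 * 1024))   # 64K volblocksize

start_bytes=$((start_sector * sector_bytes))
if [ $((start_bytes % volblock)) -eq 0 ]; then
  echo "partition start ($start_bytes B) is aligned to ${volblock} B"
else
  echo "partition start ($start_bytes B) is misaligned"
fi
```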
 
Just keep in mind that PVE will use the same volblocksize for all zvols you create or restore from a backup. If you are sure that all workloads use a 64K blocksize or higher, that should be fine. But performance will be horrible as soon as some workload wants to read/write 4K to 32K blocks. So something like Postgres with its 8K blocksize, MySQL with its 16K blocksize, or most Linux filesystems with their 4K blocksize will be terribly slow.
By the way, I tried to install Win10/Win11 with a 64K cluster size for the system partition. That is really a pain, because the Windows installer always uses a 4K cluster size, with no GUI to change it. But maybe that is easier with Windows Server 2019/2022 and custom install ISOs.

Ashift=12 would be 4k. Make sure not to use "ashift=9" unless you use HDDs with a physical/logical sector size of 512B/512B.
Edited the post; thanks for correcting my mistake.
 
So something like Postgres with its 8K blocksize, MySQL with its 16K blocksize, or most Linux filesystems with their 4K blocksize will be terribly slow. [...]
That depends on the guest settings. There is no universal "best" as a rule of thumb. Determine your guest's blocksize, create the zvol with the matching volblocksize, align the partitions in your guest on that boundary, and use the blocksize you determined. Then test whether the performance is good.

That leaves the following question: as far as performance is concerned, would you rather

Option 1: use 4 disks per ZFS RAID 10 pool (1x 8K pool and 1x 64K pool)
Option 2: use nvme-cli to create separate namespaces for the 16K pool and the 64K pool (and use them for separate ZFS RAID 10s that each span 8 namespaces)
Option 3: stick all 8 NVMe drives into a single ZFS RAID 10 pool and create separate storage objects under the Datacenter > Storage view with 16K, 32K and 64K block sizes

The way I understand it: blocksize / #(striped mirrors) >= 4K

So if I use 4 disks (that leaves a stripe of 2 mirrors, which would accommodate >=8K).
If I use 8 disks (that leaves a stripe of 4 mirrors, which would accommodate a >=16K blocksize).

My workloads (as far as space requirements are concerned) are 30% Exchange, 50% SQL and large files, 10% general office file server, 5% "Drive C" and TMP files, and 5% random Linux software (mostly monitoring, time series and log aggregation).
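The rule of thumb above, worked through for both layouts — a sketch that assumes writes stripe evenly across all mirror vdevs (real ZFS allocation is more dynamic than that):

```shell
sector=4096   # 4K NVMe sectors
# 4 disks -> 2 mirror vdevs; 8 disks -> 4 mirror vdevs
for mirrors in 2 4; do
  min_volblock=$((mirrors * sector))
  echo "$((mirrors * 2)) disks ($mirrors mirrors) -> volblocksize >= $((min_volblock / 1024))K"
done
```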


By the way, I tried to install Win10/Win11 with a 64K cluster size for the system partition. That is really a pain, because the Windows installer always uses a 4K cluster size, with no GUI to change it. But maybe that is easier with Windows Server 2019/2022 and custom install ISOs.
[...]
You need to do it on install using Diskpart.

Basically, open the command prompt in the installer (AFAIK it is Shift+F10):

Code:
diskpart
list disk
select disk #
list partition
select partition #
format fs=ntfs unit=<ClusterSize>
 
Last edited:
That leaves the following question: as far as performance is concerned, would you rather

Option 1: use 4 disks per ZFS RAID 10 pool (1x 8K pool and 1x 64K pool)
Option 2: use nvme-cli to create separate namespaces for the 16K pool and the 64K pool (and use them for separate ZFS RAID 10s that each span 8 namespaces)
Option 3: stick all 8 NVMe drives into a single ZFS RAID 10 pool and create separate storage objects under the Datacenter > Storage view with 16K, 32K and 64K block sizes

The way I understand it: blocksize / #(striped mirrors) >= 4K

So if I use 4 disks (that leaves a stripe of 2 mirrors, which would accommodate >=8K).
If I use 8 disks (that leaves a stripe of 4 mirrors, which would accommodate a >=16K blocksize).

My workloads (as far as space requirements are concerned) are 30% Exchange, 50% SQL and large files, 10% general office file server, 5% "Drive C" and TMP files, and 5% random Linux software (mostly monitoring, time series and log aggregation).



You need to do it on install using Diskpart.

Basically, open the command prompt in the installer (AFAIK it is Shift+F10):

Code:
diskpart
list disk
select disk #
list partition
select partition #
format fs=ntfs unit=<ClusterSize>
Hey @Wolff, what did you end up doing?
 
