[SOLVED] ZFS: Is "Block Size" Actually "Record Size"?

May 18, 2019
A commonly cited reference for ZFS states that "The block size is set by the ashift value at time of vdev creation, and is immutable. The recordsize, on the other hand, is individual to each dataset (although it can be inherited from parent datasets), and can be changed at any time you like."

In PVE, Datacenter>Storage lets me change the block size (I just did so on an online zfs pool)

[screenshot: the ZFS storage edit dialog showing its Block Size field]


I'm confused here. Did I just change recordsize and the UI calls it block size or ...?
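For my own sanity, this is roughly how I ended up checking the two properties from the shell (pool and dataset names are just examples from my setup; the ashift property may read 0 if it was auto-detected):

# ashift is a vdev-level setting, fixed when the vdev is created
zpool get ashift rpool
# recordsize is per dataset and can be changed at any time
zfs get recordsize rpool/data
zfs set recordsize=64K rpool/data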
 
You know ashift is immutable, so Proxmox has not invented anything new there.

edit: The name "block size" here is closer to the VM filesystem's block naming.
 
I think the setting is for the volblocksize, which is in fact a blocksize for a volume (in contrast to the recordsize, which is for a filesystem).
 
I think the setting is for the volblocksize, which is in fact a blocksize for a volume (in contrast to the recordsize, which is for a filesystem).

Slide 11 in this presentation shows this:

[screenshot of slide 11 from the linked presentation]


Last line is "cannot change after creation". This throws me off. But I will go with your assumption that the UI's Block Size is actually volblocksize.

Confirming what you said, here you see that "Zvols have a volblocksize property that is analogous to record size."

After some research, I see that messing with the recordsize is hardly ever worth it unless your workload is a DB (set it to 16k, for example) or lives in a qcow file (set it to 64k), so I will leave it at 128k.
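For reference, if I ever did want per-dataset overrides for those cases, my understanding is it would just be something like this (dataset names invented for the example; recordsize only affects data written after the change):

zfs set recordsize=16K rpool/data/mysql   # dataset holding the DB
zfs set recordsize=64K rpool/data/qcow    # dataset holding qcow files
zfs get recordsize rpool/data/mysql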

What should I use for volblocksize? Most of the data and read/write operations inside the container are on files that range from 2MB to 8MB. Should I set volblocksize to 2048k? Or should it be 2000k? Or something else?
 
What should I use for volblocksize? Most of the data and read/write operations inside the container are on files that range from 2MB to 8MB. Should I set volblocksize to 2048k? Or should it be 2000k? Or something else?


Containers do not have a volblocksize; only VMs use it.
 
But maybe you were not paying attention and actually meant a VM. The first thing to think about before choosing a volblocksize is the zpool design. This block of size x will be written to the pool, so it must eventually be split according to the vdevs (for a mirror there is no split).
For a 3x HDD raidz, the volblocksize will be split into 2 equal parts for the 2 data HDDs (the 3rd is for parity). The size x is then divided by the ashift sector size (4k = ashift 12, for example) = y. If y is not a whole number of sectors, you have to write more data (rounded up to whole sectors). More problems appear when you have to rewrite a block, because then you need to read - modify - write (RMW) --> bad IOPS, bad performance.
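A back-of-the-envelope check of that split (my own sketch following the reasoning above; real raidz allocation has more subtleties), assuming a 3-disk raidz1 with ashift=12, i.e. 4k sectors:

sector=4096      # ashift=12
data_disks=2     # 3-disk raidz1 = 2 data + 1 parity
# a 16k volblocksize splits into 8k per data disk: an exact multiple of 4k, no padding
echo $(( 16384 / data_disks )) $(( 16384 / data_disks % sector ))
# a 4k volblocksize splits into 2k per data disk: less than a full sector, so each
# write still consumes a whole 4k sector per disk (plus parity) -> waste and RMW
echo $(( 4096 / data_disks )) $(( 4096 / data_disks % sector ))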

So all of this must be taken into account before deciding which value to use.

The second criterion is the page size used by the application inside the VM. Databases use a fixed value, like 16k for MySQL, and so on.

But in real life you will have many applications, possibly using different page sizes, so it is hard to find the optimum value. As a rule of thumb, bigger values are better... Sometimes a solution is to use different zvol vdisks with different volblocksize values.
As you can see, there are many variables to take into account.

Good luck!
 
Confirming what you said, here you see that "Zvols have a volblocksize property that is analogous to record size."

Yes, I got my information from the manpage; regardless of what is written "on the internet", the manpage is always your go-to documentation on any Unix-based OS.

As a rule of thumb, bigger values are better

Yes, especially with a big ashift of 12. If an eight-kilobyte block is compressible to 4200 bytes, you will still use two 4K blocks on disk due to the ashift. If you had an ashift of 9 (preferably with native 512-byte disk support), it would take only nine blocks, 4608 bytes in total.
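Just to make that rounding explicit (my own arithmetic, nothing ZFS-specific):

compressed=4200
# ashift=12 -> 4096-byte sectors: 4200 bytes occupy 2 sectors = 8192 bytes on disk
echo $(( (compressed + 4095) / 4096 * 4096 ))
# ashift=9 -> 512-byte sectors: 4200 bytes occupy 9 sectors = 4608 bytes on disk
echo $(( (compressed + 511) / 512 * 512 ))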
 
yes, blocksize in the storage.cfg is volblocksize for newly created zvols on that storage. it simply adds '-b <blocksize>' to the 'zfs create' command, which is equivalent to '-o volblocksize=<blocksize>'..
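So, spelled out, a Block Size of 64k in the GUI means a newly created disk on that storage ends up roughly like this (disk name and size are placeholders, and PVE adds a few more options of its own):

zfs create -V 32G -b 64k rpool/data/vm-100-disk-0
# which is the same as
zfs create -V 32G -o volblocksize=64k rpool/data/vm-100-disk-0
# verify afterwards
zfs get volblocksize rpool/data/vm-100-disk-0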
 
Will this work: if I create a new 64K ZFS pool, move my existing VM disks to the new pool, change the block size for the old pool, and then move them back? I want to keep the same ZFS pool name and structure.

I am still trying to wrap my head around everything.
Thanks.

Apparently, my default block size was 8k on SSDs and it was resulting in high writes on the drives.
 
Will this work: if I create a new 64K ZFS pool, move my existing VM disks to the new pool, change the block size for the old pool, and then move them back? I want to keep the same ZFS pool name and structure.

I am still trying to wrap my head around everything.
Thanks.

Apparently, my default block size was 8k on SSDs and it was resulting in high writes on the drives.

yes, that should work for changing volblocksize
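A possibly simpler variant (sketch only; VM ID, disk and storage names are placeholders, and option/command spelling can differ between PVE versions): since the blocksize setting only applies to newly created zvols, you can change it on the existing storage and then move each disk away and back so its zvol gets recreated:

# only newly created zvols pick this up; existing disks keep their volblocksize
pvesm set local-zfs --blocksize 64k
# (equivalent to adding "blocksize 64k" to the entry in /etc/pve/storage.cfg)
# move the disk to another storage and back so it is recreated with the new value
# (the old copy stays around as an "unused" disk unless you pass --delete)
qm move_disk 100 scsi0 some-other-storage
qm move_disk 100 scsi0 local-zfs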
 
yes, that should work for changing volblocksize

I was informed that the volblocksize of the ZFS pool needs to match the block size of the OS in the VM; is that correct?
What would happen if they don't? For example, a 64k block size ZFS pool and something like 4k or 8k NTFS or XFS?
Thanks.
 
I was informed that the volblocksize of the ZFS pool needs to match the block size of the OS in the VM; is that correct?


No, that is not entirely correct. The OS block size can be any multiple of the volblocksize. As a general rule, going from the higher layers down towards the HDD, each layer can have a block size that is a multiple of the next layer down.

My English is very bad... so, for example, if your current layer (the OS level) writes a block of size 4x and the underlying layer (ZFS) uses 2x, that is not a problem, because ZFS can simply write 2 blocks. ZFS will then write 2 blocks (2 x 2x) to its own underlying layer (the disk itself). If this ZFS block is in turn a multiple of the HDD block (like 1x), then all is fine (4 blocks of 1x land on the HDD).

But there are many other factors that can change this: ZFS compression, ZFS geometry (mirror, raidz, etc.).

So thinking that only the ZFS volblocksize and the OS block size matter is a mistake!

Good luck / Bafta
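A quick way to look at the layers side by side (illustrative commands; device, pool and zvol names are from an example setup):

# inside a Linux guest: filesystem block size of the root disk
blockdev --getbsz /dev/sda1
# on the PVE host: volblocksize of that VM's zvol
zfs get volblocksize rpool/data/vm-100-disk-0
# on the PVE host: physical sector size of the disk underneath
cat /sys/block/sda/queue/physical_block_size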
 
I was informed that the volblocksize of the ZFS pool needs to match the block size of the OS in the VM; is that correct?
What would happen if they don't? For example, a 64k block size ZFS pool and something like 4k or 8k NTFS or XFS?
Thanks.

you will read - modify - write 64k for every (small) block you want to modify, just like when doing small writes on physical disks where the physical block size is bigger than your I/O size.
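One way to actually see this (my own test sketch, assuming fio is installed inside the guest and a 64k volblocksize underneath): run small random writes, then writes that match the volblocksize, and watch on the host how much really hits the pool:

# inside the VM: 4k random writes (worst case against a 64k volblocksize)
fio --name=rw4k --rw=randwrite --bs=4k --size=1G --direct=1 --runtime=60 --time_based
# inside the VM: 64k random writes (aligned with the volblocksize)
fio --name=rw64k --rw=randwrite --bs=64k --size=1G --direct=1 --runtime=60 --time_based
# on the host, in parallel: compare what the pool actually writes
zpool iostat -v rpool 5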
 
you will read - modify - write 64k for every (small) block you want to modify, just like when doing small writes on physical disks where the physical block size is bigger than your I/O size.

Is this a good thing? I guess if the files are small, then it would still require 64k to be written.
My goal is to reduce SSD wear. Right now at 4k, it seems like my endurance is decreasing quite quickly with the ZFS setup I have for Proxmox.

I'm trying to find out what is the best practice or most beneficial setting for Proxmox running a Windows Server 2019 VM with an SQL Server Express 2008/2012...

Thanks.
 
Is this a good thing? I guess if the files are small, then it would still require 64k to be written.
My goal is to reduce SSD wear. Right now at 4k, it seems like my endurance is decreasing quite quickly with the ZFS setup I have for Proxmox.

no, usually it's not a good thing (it's called read or write amplification). if you increase the volblocksize, you usually also tune the OS/FS/.. inside the VM to match that blocksize.

I'm trying to find out what is the best practice or most beneficial setting for Proxmox running a Windows Server 2019 VM with an SQL Server Express 2008/2012...

databases usually want to write many small records. you can try tuning your DB to use 8k blocks and keep volblocksize at 8k. you'll probably find a lot of articles with more specific tuning guidance for your use case.
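If it helps, one way to do that here would be to give the SQL Server VM a separate data disk whose zvol uses an 8k volblocksize (dataset name and size below are made up, and you could just as well create it through a storage entry with blocksize 8k):

# a dedicated zvol for the DB data disk, created with volblocksize=8k
zfs create -V 100G -o volblocksize=8k rpool/data/vm-101-disk-1
zfs get volblocksize rpool/data/vm-101-disk-1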
 
then you should see an improvement with 64k volblocksize, if you tune the FS inside the VM for that as well.
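Inside the Windows guest, the matching knob is (as far as I know) the NTFS allocation unit size of the data volume; from an elevated prompt, something like the following, where D: is a hypothetical, empty data volume (formatting erases it):

rem check the current cluster size ("Bytes Per Cluster")
fsutil fsinfo ntfsinfo D:
rem reformat the data volume with 64K clusters to line up with a 64k volblocksize
format D: /FS:NTFS /A:64K /Q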
 
