Proxmox VE 8.1 released!

Is there an explanation somewhere of why this was changed and what the implications are?

(I think the default was 128K before?)
No, PVE uses the ZFS defaults. The default volblocksize was 8K and is now 16K (volblocksize affects zvols, so VMs). 128K is the default recordsize (which only affects datasets, so LXCs).

Blocksizes are always a difficult topic and you have to benchmark your specific workload+hardware+pool layout yourself to get the most optimized results. The benefit of a 16K volblocksize would be better compression ratios and less metadata overhead. The downside is that IO smaller than 16K will be worse, so it's not great when running something like a PostgreSQL DB.

Biggest benefit for me personally is that all those people running 3-disk raidz1 or 4-disk raidz2 will stop asking why their disks are always full ;)
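
If you want to check what a given guest is currently using, something like this works (the pool/dataset names are just examples, adjust them to your setup):

# volblocksize of an existing VM disk (zvol) - fixed when the zvol is created
zfs get volblocksize rpool/data/vm-100-disk-0
# recordsize of a dataset (LXC subvols, file-based storage)
zfs get recordsize rpool/data/subvol-101-disk-0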
 
Biggest benefit for me personally is that all those people running 3-disk raidz1 or 4-disk raidz2 will stop asking why their disks are always full ;)
Not really, because it will not change the volblocksize retroactively ;) Just kidding.

Is there a reason why Linux VMs like Ubuntu use 512 instead of 4k? Or is that also adjusted with this update?
 
512 is what spinning rust used historically (as sector size), so it's baked into a lot of things.
 
I get that, but is that due to Ubuntu deciding that or Proxmox?
Or to ask differently, what is the sector size on a bare metal Ubuntu?
My guess is that it is Ubuntu's decision and not Proxmox's, because Windows is 4k on both bare metal and VM.

Sorry for that off-topic question, I could not find any good resource online on why Linux does not use 4k as default.
 
this is starting to get off topic ;)

anyhow, you have to differentiate between
- physical block size (512/4k for spinning rust, a lot more for flash)
- logical block size
- "block size" used by the file system (usually 1k for Linux, but can be higher if adapted via parameters)

having a smaller size higher up in the stack means write amplification (you think you are writing a small chunk, but the disk has to read and write a big chunk). having a bigger size higher up in the stack can improve this, but potentially wastes space (something needs to do smart aggregation, else writing stuff smaller than the block size means a lot of "dead" space). but usually there are lots of layers and factors at play - especially once things like redundancy/parity/compression/.. are involved.
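
if you want to see these layers on a concrete machine, a few standard commands show them (device/partition names are just examples):

# logical vs. physical sector size of all block devices
lsblk -o NAME,LOG-SEC,PHY-SEC
# the same per device via sysfs
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size
# the block size the filesystem itself uses (ext4 example)
tune2fs -l /dev/sda1 | grep 'Block size'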
 
this is starting to get off topic ;)

anyhow, you have to differentiate between
- physical block size (512/4k for spinning rust, a lot more for flash)
- logical block size
- "block size" used by the file system (usually 1k for Linux, but can be higher if adapted via parameters)

having a smaller size higher up in the stack means write amplification (you think you are writing a small chunk, but the disk has to read and write a big chunk). having a bigger size higher up in the stack can improve this, but potentially wastes space (something needs to do smart aggregation, else writing stuff smaller than the block size means a lot of "dead" space). but usually there are lots of layers and factors at play - especially once things like redundancy/parity/compression/.. are involved.
And what about the sector size of the QEMU disk? This defaulted to 512B/512B logical/physical if I'm not wrong, unless you force it to use 4K, which is only possible by directly editing the VM config files and setting something like args: -global scsi-hd.logical_block_size=4096,scsi-hd.physical_block_size=4096.

Does this actually matter when using 4K filesystems writing to that 512B virtual disk? In theory it should, but I once benchmarked it and wasn't seeing a noticeable difference.
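
For reference, a sketch of how such an override could look (VMID 100 and the scsi-hd driver are just examples; double-check the exact syntax before relying on it):

# in /etc/pve/qemu-server/100.conf - present a 4K logical/physical disk to the guest
args: -global scsi-hd.logical_block_size=4096 -global scsi-hd.physical_block_size=4096

# inside the guest, check what the virtual disk reports (logical, then physical)
blockdev --getss --getpbsz /dev/sda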
 
Thanks, these are the questions I was looking for :)
This defaulted to 512B/512B logical/physical if I'm not wrong, unless you force it to use 4K which is only possible by directly editing the VM config files and setting something like
For Windows VMs it is 4k, so I assume it is not QEMU but the VM setting the size? But that is just a wild guess.
 
Thanks, these are the questions I was looking for :)

For Windows VMs it is 4k, so I assume it is not QEMU but the VM setting the size? But that is just a wild guess.
If it's the VM (guest OS) setting the block size it wants to work with, my thought is that it shouldn't be messed with. Optimizing Proxmox not to get in the way (e.g., by avoiding write amplification) is one thing, but I don't think we should be expected to tinker that deeply with the guest OS without a specific use case, particularly if that's how it would behave on bare metal.

Or am I missing something?
 
Thank you for the volblock clarifications.

One last question: now that we have 16k as a default, what happens if my old 8k host goes belly up and I need to restore backups from an NFS share?
Will it use the new 16k default?
 
Hi,
Thank you for the volblock clarifications.

One last question: now that we have 16k as a default, what happens if my old 8k host goes belly up and I need to restore backups from an NFS share?
Will it use the new 16k default?
yes, or whatever value you have set in the storage configuration, because during restore, new zvols are created.
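
a rough sketch of what that looks like in practice (storage name, VMID and backup file name are made up for the example):

# set the zvol block size that new disks on this ZFS storage will get ("Block Size" in the GUI)
pvesm set local-zfs --blocksize 16k
# restore from the NFS backup storage; new zvols are created with that blocksize
qmrestore /mnt/pve/nfs-backup/dump/vzdump-qemu-100-2023_11_30-00_00_00.vma.zst 100 --storage local-zfs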
 
Excellent! I was worried that I would be unable to restore the VMs due to a mismatch!
Guess it is time to recreate my Proxmox host and clear out ZFS fragmentation :)
 
Excellent! I was worried that I would be unable to restore the VMs due to a mismatch!
Guess it is time to recreate my Proxmox host and clear out ZFS fragmentation :)
During restore, you can select any target storage you like ;) The backup does not contain the zvol structure/metadata, just the actual data.
 
No, PVE uses the ZFS defaults. The default volblocksize was 8K and is now 16K (volblocksize affects zvols, so VMs). 128K is the default recordsize (which only affects datasets, so LXCs).

Blocksizes are always a difficult topic and you have to benchmark your specific workload+hardware+pool layout yourself to get the most optimized results. The benefit of a 16K volblocksize would be better compression ratios and less metadata overhead. The downside is that IO smaller than 16K will be worse, so it's not great when running something like a PostgreSQL DB.

Biggest benefit for me personally is that all those people running 3-disk raidz1 or 4-disk raidz2 will stop asking why their disks are always full ;)
Is there a guide somewhere re: when we should override these defaults, aimed at newbies and/or practical applications specifically?

I know database storage in MariaDB, for example, wants a 16K dataset, and that's easy enough to do.

I realize that the correct answer is "tune according to your specific workflow," but for those of us just starting out, we don't always have the experience to do that, and end up tuning based on whatever random ZFS optimization guides we find that may or may not be out of date. ;)

For VMs specifically, are there any good rules of thumb for when to override the default volblocksize? I'm not as worried about recordsize, as my understanding is that it is a maximum value, and Proxmox (or QEMU? or ZFS itself?) tunes it somehow? I'm still learning about that.

At some point I got it in my head that on SSD storage with ashift 12, Linux VMs should live on a volblocksize of 64k, but now I can't remember where I actually picked that up or what the justification was. :p
 
Is there a guide somewhere re: when we should override these defaults, aimed at newbies and/or practical applications specifically?
I don't think this is needed in most cases, since 16k is a pretty good default. Still, I agree that a refreshed or more noob-friendly doc would help.

I know database storage in MariaDB, for example, wants a 16K dataset, and that's easy enough to do.
Isn't lz4 compression enabled by default? MariaDB probably performs better without compression, but will use way more storage.

I'm not as worried about recordsize, as my understanding is that it is a maximum value
That is definitely true. Recordsize applies to datasets, so stuff like ISOs, and is not fixed. Volblocksize applies to zvols, so stuff like VM disks.

At some point I got it in my head that on SSD storage with ashift 12, Linux VMs should live on a volblocksize of 64k, but now I can't remember where I actually picked that up or what the justification was. :p
Probably because most disks work best with ashift 12 and 64k works great for RAIDZ storage efficiency and compression. But I would argue that this is only a good setting if you know you don't have a lot of small writes. A DB on 64k will fragment like hell and have read and write amplification. 16k is a safer bet; the worst that can happen is you lose a little bit of performance and compression.
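
To make the MariaDB example concrete, a dedicated dataset for the database files could look roughly like this (the pool/dataset name is made up; 16K matches InnoDB's default page size):

# dataset for InnoDB data files with a matching recordsize
zfs create -o recordsize=16k rpool/data/mariadb
# check whether lz4 actually costs or gains you anything on that data
zfs get compression,compressratio rpool/data/mariadb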
 
Isn't lz4 compression enabled by default? MariaDB probably performs better without compression, but will use way more storage.
It is. I doubt the performance hit is that meaningful in a home server environment, but if I have problems I'll try turning it off. Once I move the dbStore to a bigger storage (it's on a shared 1 TB pool), I'll definitely turn it off.

Probably because most disks work best with ashift 12 and 64k works great for RAIDZ storage efficiency and compression. But I would argue that this is only a good setting if you know you don't have a lot of small writes. A DB on 64k will fragment like hell and have read and write amplification. 16k is a safer bet; the worst that can happen is you lose a little bit of performance and compression.

My VM boot disks live on a ZFS mirror (two disks). Is 64k still a benefit there?

My database VM has the OS installed on a 64k virtual disk (Debian server), and another virtual disk mounted that lives on 16k storage where the actual database files live. I got to learn how to incorporate storages with different block sizes into the same VM. :p
 
I am no expert, so take everything I say with a huge grain of salt :)

I doubt the performance hit is that meaningful
I could even imagine it being a performance benefit. Maybe thanks to compression, more data fits into ARC?

My VM boot disks live on a ZFS mirror (two disks). Is 64k still a benefit there?
Yes and no. Depends on what factor we are looking at.
- Compression gains are there.
- I don't really get the IO gains, because I don't understand how ZFS and the ext4 inside the VM play together. But my guess is, if the VM wants to read a 64k file, on a 64k volblocksize it has to read only one block, while on a 16k volblocksize it has to read 4 blocks. Does that save some IO?
- Wasted storage due to pool geometry is not a problem with mirrors. A RAIDZ1 with 4 drives can jump from 66% storage efficiency under 16k to 72% under 64k (see the rough math below). You don't get that reduced disadvantage with a mirror, because the problem never existed for mirrors :)
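
Rough math behind those numbers, assuming the usual RAIDZ allocation rules on a 4-disk RAIDZ1 with ashift=12 (4K sectors): each block needs its data sectors plus one parity sector per started group of 3 data sectors, padded to a multiple of 2 sectors (no padding needed in these two cases).
- 16k volblocksize: 4 data + 2 parity = 6 sectors, so 4/6 ≈ 67% usable
- 64k volblocksize: 16 data + 6 parity = 22 sectors, so 16/22 ≈ 73% usable
On a mirror every block is simply written twice regardless of volblocksize, so there is no geometry loss to win back in the first place.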
I got to learn how to incorporate storages with different block sizes into the same VM.
The biggest problem I currently see is that you can't restore such a config. At least that was the case last time I checked.
 
For VMs specifically, are there any good rules of thumb for when to override the default volblocksize?
With the old 8K default, if you used a raidz1/2/3 to store VMs, you always had to increase the volblocksize.
With the new 16K default, on a raidz1/2/3 you only have to increase it once you use more than 3 disks. ;)

The great blog article is gone, but there is still Matt Ahrens' table breaking down capacity loss based on volblocksize, raidz type and number of disks: https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT689wTjHv6CGVElrPqTA0w_ZY/

Keep in mind that the table is using "block size in sectors" and not "volblocksize". So you have to multiply the "block size in sectors" by 512B for ashift=9, by 4K for ashift=12, by 8K for ashift=13 and so on to get the corresponding volblocksize.
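
A quick worked example of that conversion: with ashift=12 (4K sectors), a row with "block size in sectors" = 4 corresponds to a 16K volblocksize and a row with 16 corresponds to 64K; with ashift=13 (8K sectors), those same rows would mean 32K and 128K.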
 
I do not want to be annoying, but is this still planned or not anymore?
Yes, it got a bit more delayed than we initially expected, I'm afraid. Partially due to holidays and partially due to a few bugs here and there, from upstream projects and our side (like some kernel HW issues, e.g. the aacraid one, so nothing all too big), for which some liked to have the fix included in the ISO refresh.

But we now finally made a cut (as there's always something coming up) and, after QA reported no regressions, an updated ISO is now available to download via our CDN or as torrent; see the respective links over at our website.
 
