Proxmox VE 8.1 released!

Is there an explanation somewhere of why this was changed and what the implications are?

(I think the default was 128K before?)
No, PVE uses the ZFS defaults. The default volblocksize was 8K and is now 16K (volblocksize affects zvols, so VMs). 128K is the default recordsize (which only affects datasets, so LXCs).

Blocksizes are always a difficult topic and you have to benchmark your specific workload+hardware+pool layout yourself to get the most optimized results. The benefit of a 16K volblocksize would be better compression ratios and less metadata overhead. The downside is that IO smaller than 16K will be worse, so it's not great when running something like a PostgreSQL DB.

Biggest benefit for me personally is that all those people running 3-disk raidz1 or 4-disk raidz2 will stop asking why their disks are always full ;)
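
If you want to check what a given guest is currently using, something like this works (the pool/dataset names are just examples, adjust them to your setup):

# volblocksize of an existing VM disk (zvol) - fixed when the zvol is created
zfs get volblocksize rpool/data/vm-100-disk-0
# recordsize of a dataset (LXC subvols, file-based storage)
zfs get recordsize rpool/data/subvol-101-disk-0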
 
Biggest benefit for me personally is that all those people running 3-disk raidz1 or 4-disk raidz2 will stop asking why their disks are always full ;)
Not really, because it will not change the volblocksize retroactively ;) Just kidding.

Is there a reason why Linux VMs like Ubuntu use 512 instead of 4k? Or is that also adjusted with this update?
 
512 is what spinning rust used historically (as sector size), so it's baked into a lot of things.
 
I get that, but is that due to Ubuntu deciding that or Proxmox?
Or to ask differently, what is the sector size on a bare metal Ubuntu?
My guess is that it is Ubuntu's decision and not Proxmox's, because Windows is 4k on both bare metal and VM.

Sorry for that off-topic question, I could not find any good resource online on why Linux does not use 4k as default.
 
this is starting to get off topic ;)

anyhow, you have to differentiate between
- physical block size (512/4k for spinning rust, a lot more for flash)
- logical block size
- "block size" used by the file system (usually 1k for Linux, but can be higher if adapted via parameters)

having a smaller size higher up in the stack means write amplification (you think you are writing a small chunk, but the disk has to read and write a big chunk). having a bigger size higher up in the stack can improve this, but potentially wastes space (something needs to do smart aggregation, else writing stuff smaller than the block size means a lot of "dead" space). but usually there are lots of layers and factors at play - especially once things like redundancy/parity/compression/.. are involved.
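
if you want to see these layers on a concrete machine, a few standard commands show them (device/partition names are just examples):

# logical vs. physical sector size of all block devices
lsblk -o NAME,LOG-SEC,PHY-SEC
# the same per device via sysfs
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size
# the block size the filesystem itself uses (ext4 example)
tune2fs -l /dev/sda1 | grep 'Block size'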
 
this is starting to get off topic ;)

anyhow, you have to differentiate between
- physical block size (512/4k for spinning rust, a lot more for flash)
- logical block size
- "block size" used by the file system (usually 1k for Linux, but can be higher if adapted via parameters)

having a smaller size higher up in the stack means write amplification (you think you are writing a small chunk, but the disk has to read and write a big chunk). having a bigger size higher up in the stack can improve this, but potentially wastes space (something needs to do smart aggregation, else writing stuff smaller than the block size means a lot of "dead" space). but usually there are lots of layers and factors at play - especially once things like redundancy/parity/compression/.. are involved.
And what about the sector size of the QEMU disk? This defaulted to 512B/512B logical/physical if I'm not wrong, unless you force it to use 4K, which is only possible by directly editing the VM config files and setting something like args: -global scsi-hd.logical_block_size=4096,scsi-hd.physical_block_size=4096.

Does this actually matter when using 4K filesystems writing to that 512B virtual disk? In theory it should, but I once benchmarked it and wasn't seeing a noticeable difference.
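
For reference, a sketch of how such an override could look (VMID 100 and the scsi-hd driver are just examples; double-check the exact syntax before relying on it):

# in /etc/pve/qemu-server/100.conf - present a 4K logical/physical disk to the guest
args: -global scsi-hd.logical_block_size=4096 -global scsi-hd.physical_block_size=4096

# inside the guest, check what the virtual disk reports (logical, then physical)
blockdev --getss --getpbsz /dev/sda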
 
Thanks, these are the questions I was looking for :)
This defaulted to 512B/512B logical/physical if I'm not wrong, unless you force it to use 4K which is only possible by directly editing the VM config files and setting something like
For Windows VMs it is 4k, so I assume it is not QEMU but the VM setting the size? But that is just a wild guess.
 
Thanks, these are the questions I was looking for :)

For Windows VMs it is 4k, so I assume it is not QEMU but the VM setting the size? But that is just a wild guess.
If it's the VM (guest OS) setting the block size it wants to work with, my thought is that it shouldn't be messed with. Optimizing Proxmox not to get in the way (e.g., by avoiding write amplification) is one thing, but I don't think we should be expected to tinker that deeply with the guest OS without a specific use case, particularly if that's how it would behave on bare metal.

Or am I missing something?
 
Thank you for the volblock clarifications.

One last question: now that we have 16k as a default, what happens if my old 8k host goes belly up and I need to restore backups from an NFS share?
Will it use the new 16k default?
 
Hi,
Thank you for the volblock clarifications.

One last question: now that we have 16k as a default, what happens if my old 8k host goes belly up and I need to restore backups from an NFS share?
Will it use the new 16k default?
yes, or whatever value you have set in the storage configuration, because during restore, new zvols are created.
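
a rough sketch of what that looks like in practice (storage name, VMID and backup file name are made up for the example):

# set the zvol block size that new disks on this ZFS storage will get ("Block Size" in the GUI)
pvesm set local-zfs --blocksize 16k
# restore from the NFS backup storage; new zvols are created with that blocksize
qmrestore /mnt/pve/nfs-backup/dump/vzdump-qemu-100-2023_11_30-00_00_00.vma.zst 100 --storage local-zfs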
 
Excellent! I was worried that I would be unable to restore the VMs due to a mismatch!
Guess it is time to recreate my Proxmox host and clear out ZFS fragmentation :)
 
Excellent! I was worried that I would be unable to restore the VMs due to a mismatch!
Guess it is time to recreate my Proxmox host and clear out ZFS fragmentation :)
During restore, you can select any target storage you like ;) The backup does not contain the zvol structure/metadata, just the actual data.
 
No, PVE uses the ZFS defaults. The default volblocksize was 8K and is now 16K (volblocksize affects zvols, so VMs). 128K is the default recordsize (which only affects datasets, so LXCs).

Blocksizes are always a difficult topic and you have to benchmark your specific workload+hardware+pool layout yourself to get the most optimized results. The benefit of a 16K volblocksize would be better compression ratios and less metadata overhead. The downside is that IO smaller than 16K will be worse, so it's not great when running something like a PostgreSQL DB.

Biggest benefit for me personally is that all those people running 3-disk raidz1 or 4-disk raidz2 will stop asking why their disks are always full ;)
Is there a guide somewhere re: when we should override these defaults, aimed at newbies and/or practical applications specifically?

I know database storage in MariaDB, for example, wants a 16K dataset, and that's easy enough to do.

I realize that the correct answer is "tune according to your specific workflow," but for those of us just starting out, we don't always have the experience to do that, and end up tuning based on whatever random ZFS optimization guides we find that may or may not be out of date. ;)

For VMs specifically, are there any good rules of thumb for when to override the default volblocksize? I'm not as worried about recordsize, as my understanding is that it is a maximum value, and Proxmox (or QEMU? or ZFS itself?) tunes it somehow? I'm still learning about that.

At some point I got it in my head that on SSD storage with ashift 12, Linux VMs should live on a volblocksize of 64k, but now I can't remember where I actually picked that up or what the justification was. :p
 
Is there a guide somewhere re: when we should override these defaults, aimed at newbies and/or practical applications specifically?
I don't think this is needed in most cases, since 16k is a pretty good default. Still, I agree that a refreshed or more noob-friendly doc would help.

I know database storage in MariaDB, for example, wants a 16K dataset, and that's easy enough to do.
Isn't lz4 compression enabled by default? MariaDB probably performs better without compression, but will use way more storage.

I'm not as worried about recordsize, as my understanding is that it is a maximum value
That is definitely true. Recordsize applies to datasets, so stuff like ISOs, and is not fixed. Volblocksize applies to zvols, so stuff like VM disks.

At some point I got it in my head that on SSD storage with ashift 12, Linux VMs should live on a volblocksize of 64k, but now I can't remember where I actually picked that up or what the justification was. :p
Probably because most disks work best with ashift 12 and 64k works great for RAIDZ storage efficiency and compression. But I would argue that this is only a good setting if you know you don't have a lot of small writes. A DB on 64k will fragment like hell and have read and write amplification. 16k is a safer bet; the worst that can happen is you lose a little bit of performance and compression.
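
To make the MariaDB example concrete, a dedicated dataset for the database files could look roughly like this (the pool/dataset name is made up; 16K matches InnoDB's default page size):

# dataset for InnoDB data files with a matching recordsize
zfs create -o recordsize=16k rpool/data/mariadb
# check whether lz4 actually costs or gains you anything on that data
zfs get compression,compressratio rpool/data/mariadb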
 
Isn't lz4 compression enabled by default? MariaDB probably performs better without compression, but will use way more storage.
It is. I doubt the performance hit is that meaningful in a home server environment, but if I have problems I'll try turning it off. Once I move the dbStore to a bigger storage (it's on a shared 1 TB pool), I'll definitely turn it off.

Probably because most disks work best with ashift 12 and 64k works great for RAIDZ storage efficiency and compression. But I would argue that this is only a good setting if you know you don't have a lot of small writes. A DB on 64k will fragment like hell and have read and write amplification. 16k is a safer bet; the worst that can happen is you lose a little bit of performance and compression.

My VM boot disks live on a ZFS mirror (two disks). Is 64k still a benefit there?

My database VM has the OS installed on a 64k virtual disk (Debian server), and another virtual disk mounted that lives on 16k storage where the actual database files live. I got to learn how to incorporate storages with different block sizes into the same VM. :p
 
I am no expert, so take everything I say with a huge grain of salt :)

I doubt the performance hit is that meaningful
I could even imagine it being a performance benefit. Maybe thanks to compression, more data fits into ARC?

My VM boot disks live on a ZFS mirror (two disks). Is 64k still a benefit there?
Yes and no. Depends on what factor we are looking at.
- Compression gains are there.
- I don't really get the IO gains, because I don't understand how ZFS and the ext4 inside the VM play together. But my guess is, if the VM wants to read a 64k file, on a 64k volblocksize it has to read only one block, while on a 16k volblocksize it has to read 4 blocks. Does that save some IO?
- Wasted storage due to pool geometry is not a problem with mirrors. A RAIDZ1 with 4 drives can jump from 66% storage efficiency under 16k to 72% under 64k (see the rough math below). You don't get that reduced disadvantage with a mirror, because the problem never existed for mirrors :)
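
Rough math behind those numbers, assuming the usual RAIDZ allocation rules on a 4-disk RAIDZ1 with ashift=12 (4K sectors): each block needs its data sectors plus one parity sector per started group of 3 data sectors, padded to a multiple of 2 sectors (no padding needed in these two cases).
- 16k volblocksize: 4 data + 2 parity = 6 sectors, so 4/6 ≈ 67% usable
- 64k volblocksize: 16 data + 6 parity = 22 sectors, so 16/22 ≈ 73% usable
On a mirror every block is simply written twice regardless of volblocksize, so there is no geometry loss to win back in the first place.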
I got to learn how to incorporate storages with different block sizes into the same VM.
The biggest problem I currently see is that you can't restore such a config. At least that was the case last time I checked.
 
For VMs specifically, are there any good rules of thumb for when to override the default volblocksize?
With the old 8K default, if you used a raidz1/2/3 to store VMs, you always had to increase the volblocksize.
With the new 16K default, on a raidz1/2/3 you only have to increase it once you use more than 3 disks. ;)

The great blog article is gone, but there is still Matt Ahrens' table breaking down capacity loss based on volblocksize, raidz type and number of disks: https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT689wTjHv6CGVElrPqTA0w_ZY/

Keep in mind that the table is using "block size in sectors" and not "volblocksize". So you have to multiply the "block size in sectors" by 512B for ashift=9, by 4K for ashift=12, by 8K for ashift=13 and so on to get the corresponding volblocksize.
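
A quick worked example of that conversion: with ashift=12 (4K sectors), a row with "block size in sectors" = 4 corresponds to a 16K volblocksize and a row with 16 corresponds to 64K; with ashift=13 (8K sectors), those same rows would mean 32K and 128K.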
 
I do not want to be annoying, but is this still planned or not anymore?
Yes, it got a bit more delayed than we initially expected, I'm afraid. Partially due to holidays and partially due to a few bugs here and there, from upstream projects and our side (like some kernel HW issues, e.g. the aacraid one, so nothing all too big), for which some liked to have the fix included in the ISO refresh.

But we now finally made a cut (as there's always something coming up) and, after QA reported no regressions, an updated ISO is now available to download via our CDN or as torrent; see the respective links over at our website.
 
