VM virtual disks and ZFS's "80-90% rule"

F.F.

New Member
Jun 20, 2025
Hi everyone!
I have a Proxmox VE setup with two zpools, a smaller one (1 TB) and a bigger one (8 TB). I have to define a Windows 11 VM with two virtual disks, one for the OS and one for storage. Can I allocate the whole of the bigger zpool, or should I leave some free space (the "80-90% rule" of ZFS)?
Thanks for your time!
 
Do you have any indicator that this is not recommended anymore?

If performance is completely irrelevant you can go higher. ZFS will not stop hard when it reaches that level, but it might slow down. But please make really, really sure never to reach 99+%, because you may have trouble cleaning up that mess. (There are ways to keep some headroom, from setting a "refquota" on a dummy dataset to other "tricks".)
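One variant of such a headroom trick, sketched as a shell snippet: park a reservation on an empty dataset so that space stays out of everyone else's reach. The pool name "tank" and the 8 TiB size are assumptions; substitute your own values.

```shell
# Keep ~10% of an 8 TiB pool unreachable by reserving it for an
# empty dummy dataset. Pool name "tank" is a placeholder.
POOL="tank"
POOL_SIZE_GIB=8192                    # 8 TiB expressed in GiB
HEADROOM_GIB=$((POOL_SIZE_GIB / 10))  # ~10% headroom
# The real commands need root and an existing pool, so this sketch
# only prints them:
echo "zfs create ${POOL}/headroom"
echo "zfs set reservation=${HEADROOM_GIB}G ${POOL}/headroom"
```

Destroying (or shrinking the reservation of) that dummy dataset later gives you an emergency escape hatch if the pool ever runs critically full.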

From my own limited perspective the recommendation is still valid. The classic "problem" with copy-on-write did not vanish, right? Perhaps for solid state the threshold can be a little higher than for classic rotating rust, but...

...for PVE-workload (with random I/O from several VMs) I try hard to keep a lot of unused space!

(Note also that "80 to 90%" is already high; for databases I've read some articles recommending staying below "50%" ...)
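For reference, `zpool list` reports how full a pool is in its CAP column; the same percentage is easy to compute by hand. The numbers below are invented for illustration only:

```shell
# On a real system you would run something like:
#   zpool list -o name,size,alloc,free,cap,frag tank
# The CAP percentage by hand, with made-up example numbers:
ALLOC_GIB=6800      # space currently allocated (example value)
SIZE_GIB=8192       # total pool size: 8 TiB
CAP_PCT=$((ALLOC_GIB * 100 / SIZE_GIB))
echo "pool is ${CAP_PCT}% full"   # -> pool is 83% full
```

At 83% this example pool would already be past the comfort zone discussed above.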
 
Do you have any indicator that this is not recommended anymore?

Well, no :) I thought that if you use a ZFS dataset directly (for example as /home) the 80-90% rule would definitely apply, but since I'd be using it as a virtual disk, maybe other considerations would be necessary...
Many thanks for clearing this up!
 
but since I'd be using it as a virtual disk, maybe other considerations would have been necessary...

If you configure, let's say, a 100 GB virtual disk, it is not created on disk as a single contiguous 100 GB region that then stays that way. That behavior is roughly what LVM thick provisioning would give you.

ZFS is usually configured sparse. Space is only occupied when the user (the VM) actually writes some data. Each and every write is a new transaction and gets placed "somewhere" in the physical disk layout. (There is some optimization for this, but let's keep it simple.) And this is also true if you configure the ZFS zvol not as sparse but order it to occupy all space at once; the only thing that changes is the reported "free space".
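A sketch of that difference, assuming a pool named "tank" and Proxmox-style zvol names: a sparse zvol (`-s`) carries no refreservation, a "thick" one reserves its full volsize up front, but the on-disk write pattern is copy-on-write either way.

```shell
# Hypothetical pool/zvol names; the real commands need root and an
# existing pool, so this sketch only prints them.
SPARSE="zfs create -s -V 100G tank/vm-100-disk-0"  # refreservation=none
THICK="zfs create -V 100G tank/vm-100-disk-1"      # refreservation ~ 100G
echo "$SPARSE"
echo "$THICK"
# Compare the accounting afterwards with:
echo "zfs get volsize,refreservation,used tank/vm-100-disk-0"
```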

Writing actual data will immediately start to fragment free space, because the fundamental concept is copy-on-write. If you had a "thick", ideally placed 100 GB at the beginning and you modify a small 1 kB text file, you'll get a) a new fragment of "volblocksize" (which is 16 kB by default) and b) a now-declared-empty "hole" inside that 100 GB area.
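The write amplification in that example can be put into numbers. A minimal sketch, using the 16 kB default volblocksize mentioned above:

```shell
# Modifying 1 KiB inside a zvol rewrites one whole volblocksize
# block to a new location (copy-on-write) and frees the old one.
WRITE_KIB=1
VOLBLOCKSIZE_KIB=16
NEW_FRAGMENT_KIB=$VOLBLOCKSIZE_KIB               # freshly written fragment
AMPLIFICATION=$((VOLBLOCKSIZE_KIB / WRITE_KIB))  # 16x for this write
echo "1 KiB change -> ${NEW_FRAGMENT_KIB} KiB new fragment (${AMPLIFICATION}x)"
```

So every small in-place update trades one well-placed block for a scattered new one plus a hole, which is exactly why free space fragments over time.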

Add features like snapshots, clones, deduplication, compression etc. and it gets really messy very quickly :-)