ZFS: Storage space for zvols almost doubles when transferred to raid-z3 pool

Andreas Piening

Well-Known Member
Mar 11, 2017
I'm running a raid-z3 ZFS pool on a Proxmox node that serves as a backup host. I create snapshots on different Proxmox hosts with ZFS mirror pools and pull them to the backup node.
I noticed that the same snapshots of zvols occupy almost twice the storage space on the backup node compared to the source server. I have read in several places that this is caused by the volblocksize in conjunction with the ashift=12 option, but I still don't understand how I can mitigate this or what my best option is.

I'm aware that this issue does not occur with ZFS filesystems, and backups with PBS are not affected either. But my question is specifically about transferring existing zvols from an existing ZFS mirror pool to a raid-z3 pool.

Here are my requirements or considerations:
  • I want to maximize data security, especially fault tolerance. I would like the pool to stay online even if at least two disks fail at the same time.
  • I'm willing to sacrifice storage space; that's why I've chosen raid-z3. But only being able to use half of the disk space left after subtracting 3 disks from the array for redundancy is not acceptable to me.
What are my options?
Would switching from raid-z3 to raid-z2 improve the situation? If so, by how much?
 
The more disks your pool consists of, the bigger your volblocksize has to be when using raidz1/2/3; otherwise everything will take up more space because of the padding overhead.
How many disks does your raidz3 consist of?

[Attached spreadsheet: raidz capacity loss by number of disks and volblocksize]
Example:
A 9-disk raidz3 with ashift=12 would mean you lose 75% of the raw capacity when using the default 8K volblocksize; you lose only 50% when using a 16K or 32K volblocksize, and only 34% with a 1M volblocksize.
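If it helps to see where those percentages come from, here is a rough sketch of the allocation math (my own approximation, not an exact model of ZFS internals): each block of a zvol needs its data sectors plus parity sectors per stripe row, and raidz pads the whole allocation up to a multiple of (parity + 1) sectors.

```python
import math

def raidz_loss(ndisks, nparity, volblocksize, ashift=12):
    """Approximate fraction of raw capacity lost (parity + padding) per zvol block."""
    sector = 2 ** ashift                               # 4K sectors with ashift=12
    data = max(1, volblocksize // sector)              # data sectors per block
    rows = math.ceil(data / (ndisks - nparity))        # stripe rows the block spans
    alloc = data + rows * nparity                      # data + parity sectors
    alloc = math.ceil(alloc / (nparity + 1)) * (nparity + 1)  # pad to a multiple of parity+1
    return 1 - data / alloc

for label, vbs in (("8K", 8192), ("16K", 16384), ("32K", 32768), ("1M", 1048576)):
    print(f"{label:>3} volblocksize: {raidz_loss(9, 3, vbs):.0%} of raw capacity lost")
# -> 75%, 50%, 50%, 34% for the 9-disk raidz3 example above
```

It ignores compression and metadata, but it reproduces the 75% / 50% / 34% figures from the example.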
 
Hi @Dunuin, thank you very much for your detailed and lightning-fast response.

I have 10 disks with a capacity of 15 TB each. In raidz3-0 mode I can use the capacity equivalent of 7 disks (after subtracting the three redundancy disks). The volblocksize of the zvols in question is 8K, which is determined by how the zvols of the VMs on the source node were set up.
When I look this up in your spreadsheet (X=10, Y=2 [8K/4K]), I get 75% wasted space. When I look at one example volume, it uses 529GB on the source and 968GB on the raidz3 backup pool, which is even more than 75% of additional occupied space.

Since I can't change the volblocksize to a higher value (the volumes are replicated as-is from the source node), I don't see any way to optimize this.
Your spreadsheet is for raidz3 specifically, correct? Would the storage utilization be more efficient with raidz2?
 
When I look at one example volume, it uses 529GB on the source and 968GB on the raidz3 backup pool, which is even more than 75% of additional occupied space.
It's not just +75% on the size of your zvols. You lose 75% of your raw capacity, so 75% of those 150TB. Only 25% of those 150TB is actually usable: 30% is lost to parity and 45% to padding overhead. So in theory your zvols should grow by about +180% in size.

Your spreadsheet is for raidz3 specifically, correct?
Yes
Would the storage utilization be more efficient with raidz2?
No, raidz1/raidz2/raidz3 all have this padding overhead. A raidz2 won't help.
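As a back-of-the-envelope check of that +180% figure (my own arithmetic, glossing over the details of how ZFS deflates raidz space accounting): at 8K volblocksize each block allocates 8 sectors (2 data + 3 parity + 3 padding), so only 2 of 8 sectors hold data, while a padding-free 10-disk raidz3 would store 7 of 10.

```python
# 10-disk raidz3, ashift=12 (4K sectors), 8K volblocksize -- the numbers from this thread
ndisks, nparity = 10, 3
data_sectors = 2                            # 8K / 4K
alloc_sectors = 8                           # 2 data + 3 parity = 5, padded up to a multiple of 4
actual_eff = data_sectors / alloc_sectors   # 0.25 of raw ends up holding data
nominal_eff = (ndisks - nparity) / ndisks   # 0.70 if there were no padding overhead
print(f"expected zvol growth vs. a padding-free pool: +{nominal_eff / actual_eff - 1:.0%}")
# -> roughly +180%
```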
 
Thanks again @Dunuin, I mixed up the numbers. You are in fact correct: I have an overhead of nearly +180% in total.
That's actually much worse than I thought, so much so that I want to find a way to improve this.

Here are my current takeaways or ideas:
  • Preferring PBS over syncing VMs to a RAIDZ pool for backups wherever possible. This would be much more space-efficient, but backups would take longer, especially for big volumes.
  • Considering changing the volblocksize to 16k for new Proxmox installations (it can't be changed on existing pools). This would mitigate the wasted space, but I'm unsure about the performance impact on the source system where the actual VMs are running.
  • Preferring LXC containers over VMs when possible: especially for large amounts of data, an LXC container could provide a network share that could be mounted by a VM. This may be possible for file shares.
  • Reconsidering using stripes of mirrors instead of raidz. The wasted space on a 10-disk pool with a stripe over 5 mirrors would actually be less than with raidz-3, while offering significantly better performance. As a bonus, such a pool can also be extended by adding additional mirror vdevs. But this would lower the resilience, because in the worst case two failed disks would be enough to destroy the pool.
Are there other or better options?
Comments and suggestions are welcome.
 
  • Considering changing the volblocksize to 16k for new Proxmox installations (it can't be changed on existing pools). This would mitigate the wasted space, but I'm unsure about the performance impact on the source system where the actual VMs are running.
You can change that later, but the volblocksize can only be set at creation of a zvol. A backup and restore, for example, would destroy the existing VM and create a new one from the backup using the new volblocksize. And yes, increasing the volblocksize can hurt performance, especially when your workload does I/O smaller than the volblocksize.
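For reference, plugging the 10-disk pool into the same rough allocation sketch from earlier in the thread suggests what a bigger volblocksize on a fresh installation could recover (again ignoring compression and metadata):

```python
import math

def raidz_loss(ndisks, nparity, volblocksize, ashift=12):
    # same approximation as the sketch earlier in the thread
    data = max(1, volblocksize // (2 ** ashift))
    alloc = data + math.ceil(data / (ndisks - nparity)) * nparity
    alloc = math.ceil(alloc / (nparity + 1)) * (nparity + 1)
    return 1 - data / alloc

for label, vbs in (("8K", 8192), ("16K", 16384), ("32K", 32768), ("64K", 65536), ("128K", 131072)):
    print(f"{label:>4}: {raidz_loss(10, 3, vbs):.0%} of raw capacity lost")
# -> 8K: 75%, 16K: 50%, 32K: 50%, 64K: ~43%, 128K: ~33% on a 10-disk raidz3
```

So even 16K would roughly halve the padding penalty on this pool; whether the VMs on the source tolerate the bigger blocks is the separate performance question mentioned above.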
  • Reconsidering using stripes of mirrors instead of raidz. The wasted space on a 10-disk pool with a stripe over 5 mirrors would actually be less than with raidz-3, while offering significantly better performance. As a bonus, such a pool can also be extended by adding additional mirror vdevs. But this would lower the resilience, because in the worst case two failed disks would be enough to destroy the pool.
If you want good IOPS performance, a small blocksize, and better reliability, it is also possible to use a stripe of 3-disk mirrors. With that, any 2 of the 3 disks of a mirror may fail, and the 66% loss of raw capacity is still better than the 75% you are losing now ;)
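Purely as a rough side-by-side of the layouts mentioned in this thread (my own arithmetic; 10 disks don't divide evenly into 3-disk mirrors, so I've treated the tenth disk as a spare):

```python
# Usable capacity for 10 x 15TB disks under the layouts discussed in this thread
disk_tb = 15
layouts = {
    "10-disk raidz3 @ 8K volblocksize": (10 * disk_tb * 0.25, "75% of raw lost to parity + padding"),
    "5 x 2-disk mirrors":               (5 * disk_tb,         "50% of raw lost; any 1 disk per mirror may fail"),
    "3 x 3-disk mirrors (+1 spare)":    (3 * disk_tb,         "~66% of raw lost; any 2 disks per mirror may fail"),
}
for name, (usable_tb, note) in layouts.items():
    print(f"{name:<34} {usable_tb:5.1f} TB usable  ({note})")
```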
 
