Windows VM shows MUCH less space used than the ZFS pool it's on

eexodus

Active Member
Jan 25, 2017
I created a Windows Server 2019 VM with a 120GB C: drive. After the VM was created I added a 60TB drive and formatted it with NTFS. This 60TB drive is a raw file on a ZFS pool. The issue I'm having is that even though in Windows the D: drive shows 11TB used and 49TB free, the zpool in Proxmox's web UI shows 38TB allocated and 58TB free. There are no other VM disks on the zpool. Why is there such a large discrepancy? lz4 compression is enabled on the zpool, but this doesn't explain the discrepancy because it also shows the compression ratio as 1.00x; in other words 0% saved via compression, as expected with a single raw file.

This is a RAIDZ2 zpool of 12x 8TB drives. The Proxmox web UI shows:
Size: 96.02TB, Free: 58.06TB, Allocated: 37.96TB, Fragmentation: 0%

When I run zfs list -r zpool I get:
zpool/vm-100-disk-0, Used: 26.3T, Avail: 38.1T, Refer: 219K

I'm not sure which numbers to trust.
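
For reference, a quick way to break down where the space is going, assuming the pool and zvol names shown above (a sketch, not output from the original poster's system):
Code:
# Show how the used space splits up (dataset vs. snapshots vs. refreservation)
zfs list -o space -r zpool
# Compare the zvol's logical size with what it actually consumes on the pool
zfs get volsize,used,logicalused,referenced,compressratio,volblocksize zpool/vm-100-disk-0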
 
That is totally normal. You lose most of your storage to bad padding because you are probably using the default 8K volblocksize and didn't increase it to at least 40K. Look at this table. With an 8K volblocksize you are losing 67% of your raw storage to parity and padding; with a 40K volblocksize only 17%. Everything will stay around 300% in size until you change the blocksize of your pool (Datacenter -> Storage -> YourZFSPool -> Edit -> Blocksize) and also destroy and recreate all virtual disks (a restore from backup would do this too). The volblocksize of a zvol can only be set at creation.
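
A rough sketch of where to check and change this, assuming the Proxmox storage entry uses the same name as the pool (zpool) and that this Proxmox version accepts the blocksize option via pvesm:
Code:
# Check the zvol's current volblocksize (8K is the old Proxmox default)
zfs get volblocksize zpool/vm-100-disk-0
# Raise the blocksize used for newly created zvols on this storage
# (equivalent to Datacenter -> Storage -> zpool -> Edit -> Blocksize; must be a power of two)
pvesm set zpool --blocksize 64k
# Existing zvols keep their volblocksize; they have to be destroyed and recreated
# (e.g. via backup + restore) to pick up the new value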
 
Following your linked chart it shows 33%. I am familiar with ZFS as a file server but not as VM storage. I am not familiar with volblocksize, but I guess it's different from the 128K recordsize ZFS defaults to? My data is backed up, so I typically use RAIDZ2 or RAIDZ1 for maximum capacity vs. mirrors. If you were in my shoes, what would you do if 99% of the data stored on my 12-disk zpool were 2.6GB video files? This is storage for security camera recordings.

I have a few more questions. Where are you getting 300% from?

Why is ZFS in the Proxmox web UI reporting 96TB for a 12-disk 8TB RAIDZ2 zpool? Shouldn't it show what the zfs command shows: 26.3TB + 38.1TB?
 
Following your linked chart it shows 33%.
No, you read the table wrong. The rows are the volblocksize in sectors. If you had a disk with 512B LBA and had chosen ashift=9 for the pool, an 8K volblocksize would be the row with "16" (8K volblocksize / 512B sector size = 16 sectors). But your pool was probably created with ashift=12, so your sectors are 4K each: 8K volblocksize / 4K sector size = 2 sectors. So you are using a volblocksize of 2 sectors = 67% of raw storage lost.
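
To confirm which row applies, one could check the pool's ashift and what the drives report as sector size; a sketch, with the pool name assumed from above:
Code:
# ashift=12 means 4K sectors, ashift=9 means 512B sectors
zpool get ashift zpool
# Logical/physical sector sizes the disks themselves report
lsblk -o NAME,LOG-SEC,PHY-SEC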
I am familiar with ZFS as a file server but not as VM storage. I am not familiar with volblocksize, but I guess it's different from the 128K recordsize ZFS defaults to?
Datasets use the recordsize and ignore the volblocksize; zvols use the volblocksize and ignore the recordsize. Recordsize defaults to 128K and volblocksize to 8K.
My data is backed up, so I typically use RAIDZ2 or RAIDZ1 for maximum capacity vs. mirrors. If you were in my shoes, what would you do if 99% of the data stored on my 12-disk zpool were 2.6GB video files? This is storage for security camera recordings.
If you just store big videos, a bigger volblocksize and raidz shouldn't be that problematic. But you will still get some overhead because your VM is reading/writing everything as 512B blocks (512B LBA is the default for a virtio SCSI controller) from/to a storage that should use a volblocksize of at least 40K.
I have a few more questions. Where are you getting 300% from?
If you have 12 drives and use raidz2 you lose 2 drives to parity data, so in theory 10 drives should be usable for data. That means you lose 2/12, or 17%, of the raw storage to parity, and ZFS will tell you that 10/12 of your raw storage can be used. But that isn't true in reality. Because of the too-low volblocksize there is a lot of padding overhead; right now this should be around 50% of your raw storage, or 6 drives. So of your 12 drives only 4 drives, or 33% of your raw storage, can effectively be used. 300% because for every 1TB of data you write to the pool, an additional 0.5TB is lost to parity and an additional 1.5TB to padding overhead. You can take the parity out of the calculation because the pool size already accounts for it and only shows the space of 10 drives. But you still have the padding overhead of 1.5TB per 1TB of data written, so if you write 1TB of data, 2.5TB (+ parity) will be stored on the pool.
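
Plugging in the numbers from this thread (an approximation, using the figures above):
Code:
# At 8K volblocksize on this pool, every 1TB written costs roughly 2.5TB of the
# "usable" (parity-subtracted) space once padding is included.
#   11TB used inside Windows  ->  ~11 x 2.5 = ~27.5TB
# which is in the same ballpark as the 26.3T that `zfs list` reports as USED.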
Why is ZFS in the Proxmox web UI reporting 96TB for a 12-disk 8TB RAIDZ2 zpool? Shouldn't it show what the zfs command shows: 26.3TB + 38.1TB?
That depends on what you use to look at the pool. The zpool command reports the pool size as raw storage, so 12 * 8TB = 96TB. The zfs command reports the pool size with the space for parity already subtracted, so 10 * 8TB = 80TB.
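
Side by side, assuming the pool is called zpool:
Code:
# Raw view: all 12 x 8TB counted, no parity subtracted
zpool list zpool
# Filesystem view: the space of the 2 parity drives is already subtracted
zfs list zpool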
 
OK, thank you. It sounds like for RAIDZ2 with 12 disks, ashift=9 is a reasonable choice to go with.

I have one more question though: if volblocksize is also part of the equation, why does no one seem to recommend reducing it from the default 8K to something smaller? Performance?
 
OK, thank you. It sounds like for RAIDZ2 with 12 disks, ashift=9 is a reasonable choice to go with.
You probably can't do that. Your overhead will explode if you try to use ashift=9 on physical drives that aren't built for 512B LBA. I would bet your 8TB HDDs are using 4K LBA, so you can't go lower than ashift=12 and are forced to use a sector size of 4K or higher.
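
One way to check what the drives actually support (sdX is a placeholder; smartctl comes from the smartmontools package):
Code:
# Most 8TB HDDs are 512e or 4Kn, i.e. the physical sector is 4K either way
smartctl -i /dev/sdX | grep -i 'sector size'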
I have one more question though: if volblocksize is also part of the equation, why does no one seem to recommend reducing it from the default 8K to something smaller? Performance?
Most people are using ashift 12 or 13 for the pool, so the volblocksize can't be lowered below 4K or 8K. And if you lower it too much you get a lot of padding overhead.
 
So should I stick to mirrored pairs so there's no parity, or maybe RAIDZ1? Or, if I have the option, format the presented volume with something other than 512B sectors in Windows? Or just give up on ZFS, pass the HBA through to Windows, and let it handle the disks via Windows Storage, although I'd hate that.
 
Like I already said, increase the volblocksize to something like 40K and recreate the zvols (or create a backup and restore it), and you should only lose 17% of the raw capacity (to parity) instead of 67% (17% parity + 50% padding).
And this has nothing to do with 512B in Windows or with parity. Your problem is the bad padding caused by a too-small volblocksize. Switching to raidz1 wouldn't help because you would get the same padding problem.
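
A minimal sketch of the backup-and-restore route, assuming VM 100, a backup storage called backups, and that the blocksize on the ZFS storage was already raised; the archive name is a placeholder:
Code:
# Full backup of the VM while it is powered off
vzdump 100 --storage backups --mode stop
# Restore over the existing VM; the zvols are recreated with the storage's new blocksize
qmrestore /mnt/pve/backups/dump/vzdump-qemu-100-<timestamp>.vma.zst 100 --force --storage zpool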
 
Yes, I was just reviewing our conversation and I think I understand now. With mirrored pairs I'd lose 50% to mirroring, so unless performance were necessary (it's not) I'd be better served going with RAIDZ2 at 40K. Thanks so much for taking the time to explain everything. Your patience is really appreciated.
 
Like I already said, increase the volblocksize to something like 40K and recreate the zvols (or create a backup and restore it), and you should only lose 17% of the raw capacity (to parity) instead of 67% (17% parity + 50% padding).
And this has nothing to do with 512B in Windows or with parity. Your problem is the bad padding caused by a too-small volblocksize. Switching to raidz1 wouldn't help because you would get the same padding problem.
Actually, no luck: setting the blocksize to 40K and then recreating the hard disks in the VM results in this error:
Code:
zfs error: cannot create 'bpool/vm-100-disk-0': 'volblocksize' must be power of 2 from 512B to 1M
 
The next best power-of-two options would be 64K with 18% or 16K with 24% of raw capacity lost.
 
The next best power-of-two options would be 64K with 18% or 16K with 24% of raw capacity lost.
Yes, I see now that the chart is just theoretical and only the bottom portion is directly relevant.

Would there be any downside to just using a 256K or 1M volblocksize to get that extra 1%, down to 17% loss? At least in my use case I don't think there would be, because I am not running a database or anything with tiny sub-1MB files. All files will be 2GB+ videos.

Or maybe it is better to play it safe and go with a 64K or 256K volblocksize, since 1% isn't a big deal.
 
I think I would go with 64K for less write amplification if you really only use that pool as video storage. You might also consider switching the virtio SCSI disk from 512B to 4K blocks for even less overhead. That can be done by adding the line args: -global scsi-hd.physical_block_size=4k to the VM's config file.
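
For reference, that line would go into the VM's config file (VM ID 100 assumed here) and takes effect after a full power-cycle of the guest:
Code:
# /etc/pve/qemu-server/100.conf
args: -global scsi-hd.physical_block_size=4k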
 