ZFS pool not showing correct usage

qgrasso

Hi,

We are running a new install of Proxmox VE 4.3-12

Our configuration is Dual E5-2630v4 w/ 256GB RAM & 8 x Samsung PM863a Drives.

We did our install using the Proxmox 4.3 ISO installer and set up a RAIDZ2 configuration with our 8 drives, and it all seemed to go smoothly.

We've found some strange stats regarding disk usage.

We have moved a number of VMs onto local storage.

We have moved 2,740GB of volumes to our host, however almost twice that amount of storage is showing as used. We have no snapshots or anything like that in use either.

The moving was done via live migration from a Ceph cluster to Proxmox local-zfs.

The two volumes which look the strangest are (zfs list output; columns are NAME, USED, AVAIL, REFER, MOUNTPOINT):
rpool/data/vm-425-disk-1 894G 819G 894G -
rpool/data/vm-425-disk-2 866G 819G 866G -

According to the UI, vm-425-disk-1 is only 500G and vm-425-disk-2 is only 400G.

I'm unsure what to make of it. However, as you can see below, we have used nearly 4TB of storage with only 2,740GB of volumes.

Another example:
rpool/data/vm-426-disk-1 170G 819G 170G -
In the UI this volume is only 100G.

Same with:
rpool/data/vm-426-disk-2 175G 819G 175G -
In the UI this volume is only 100G as well.
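For reference, something along these lines is what I'm comparing - each zvol's provisioned size versus what ZFS has actually allocated for it (rpool/data is the dataset the installer creates by default):

Code:
# zfs list -o name,volsize,used,refer,compressratio -r rpool/data
# zpool list rpool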

Please see the attachment for easier formatting. Any ideas what's going on here?

Thanks,
Quenten

zpool.png
 
What is the output of

# pvesm list local-zfs

Hi Dietmar,

Please see below. Thanks!

pvesm list local-zfs
local-zfs:vm-176-disk-1 raw 193273528320 176
local-zfs:vm-425-disk-1 raw 536870912000 425
local-zfs:vm-425-disk-2 raw 429496729600 425
local-zfs:vm-425-disk-3 raw 107374182400 425
local-zfs:vm-426-disk-1 raw 107374182400 426
local-zfs:vm-426-disk-2 raw 107374182400 426
local-zfs:vm-427-disk-1 raw 429496729600 427
local-zfs:vm-427-disk-2 raw 161061273600 427
local-zfs:vm-428-disk-1 raw 42949672960 428
local-zfs:vm-429-disk-1 raw 85899345920 429
local-zfs:vm-430-disk-1 raw 85899345920 430
local-zfs:vm-435-disk-1 raw 85899345920 435
local-zfs:vm-436-disk-1 raw 128849018880 436
local-zfs:vm-441-disk-1 raw 85899345920 441
local-zfs:vm-441-disk-2 raw 171798691840 441
local-zfs:vm-442-disk-1 raw 182536110080 442
 
could you also post the output of "zpool get all rpool"?
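especially the ashift value is interesting there - if you just want that one property, something like this should show it (assuming the installer's default pool name rpool):

Code:
# zpool get ashift rpool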
 
missed that you are using raidz2, sorry. this is a side effect of a bad ashift (pool) and volblocksize (zvol) combination when using raidz. see https://github.com/zfsonlinux/zfs/issues/1807 for an upstream discussion of why this happens.

I think that introducing a config option for ZFS storages on the PVE side that allows setting a default volblocksize might make sense - it basically needs to be set at zvol creation time, because it's not changeable once you write to the zvol. until that hits the repositories, you can manually create a new zvol with a higher volblocksize and identical size, dd the data across while the VM is offline, then rename the old zvol out of the way, rename the new one to the old name and start the VM again - assuming you have no snapshots or clones on those zvols. if everything works, you can delete the old zvol.

eg:

Code:
# zfs list -o name,used,refer,volsize,volblocksize,written -r testpool
NAME                      USED  REFER  VOLSIZE  VOLBLOCK  WRITTEN
testpool                 11.5G   205K        -         -     205K
testpool/smalltestvol8k  1.08G  1.08G     512M        8K    1.08G

# zfs create -V 512M -s -o volblocksize=16k testpool/smalltestvol16k
# zfs list -o name,used,refer,volsize,volblocksize,written -r testpool
NAME                       USED  REFER  VOLSIZE  VOLBLOCK  WRITTEN
testpool                  1.08G   205K        -         -     205K
testpool/smalltestvol16k   136K   136K     512M       16K     136K
testpool/smalltestvol8k   1.08G  1.08G     512M        8K    1.08G

# #STOP VM HERE

# dd if=/dev/zvol/testpool/smalltestvol8k of=/dev/zvol/testpool/smalltestvol16k bs=16k
32768+0 records in
32768+0 records out
536870912 bytes (537 MB) copied, 8.80659 s, 61.0 MB/s

# zfs list -o name,used,refer,volsize,volblocksize,written -r testpool
NAME                       USED  REFER  VOLSIZE  VOLBLOCK  WRITTEN
testpool                  1.62G   205K        -         -     205K
testpool/smalltestvol16k   550M   550M     512M       16K     550M
testpool/smalltestvol8k   1.08G  1.08G     512M        8K    1.08G

# zfs rename testpool/smalltestvol8k testpool/smalltestvol8k_old
# zfs rename testpool/smalltestvol16k testpool/smalltestvol8k

# zfs list -o name,used,refer,volsize,volblocksize,written -r testpool
NAME                          USED  REFER  VOLSIZE  VOLBLOCK  WRITTEN
testpool                     1.62G   205K        -         -     205K
testpool/smalltestvol8k       550M   550M     512M       16K     550M
testpool/smalltestvol8k_old  1.08G  1.08G     512M        8K    1.08G

# #START VM AGAIN
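and once you have verified the VM boots and the data is intact, the old zvol can be dropped - a rough sketch, reusing the test names from above:

Code:
# zfs get volblocksize testpool/smalltestvol8k
# #ONLY AFTER CHECKING EVERYTHING WORKS
# zfs destroy testpool/smalltestvol8k_old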
 
Hi Fabian,

Thanks, I thought I'd update this post as it may help someone in the future. Seeing the volblocksize after running "zfs get all rpool/data/vm-426-disk-1" got me thinking.

So I did a bit of searching around, and it seems ashift=12 is correct for 4K advanced-format disks, which is what our pool was configured with at install (the default).

I also found heaps of posts about this over-usage issue with volblocksize when using raidz2 configurations.

In short, so far I've found a 32k volblocksize seems to be ideal for our 8-disk SSD raidz2 configuration on 4K advanced-format disks when using ZoL/Proxmox.

I also found a page on the Proxmox wiki, https://pve.proxmox.com/wiki/Storage:_ZFS, which talks about changing the default blocksize Proxmox uses for new zvols. I've since modified my /etc/pve/storage.cfg as follows.

Code:
zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        blocksize 32k
        sparse
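For anyone wanting to double check that newly created zvols actually pick up the setting, something like this should do it (VMID 999 is just a throwaway placeholder here):

Code:
# pvesm alloc local-zfs 999 vm-999-disk-1 4G
# zfs get volblocksize rpool/data/vm-999-disk-1
# pvesm free local-zfs:vm-999-disk-1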

I did some experimenting with different block sizes and found some interesting results, as below.

In summary, so far it seems a 32k volblocksize gives me the best balance of performance and compression. Using a larger volblocksize does seem to degrade benchmark performance, however the trade-off is considerably higher compression ratios (if your data is compressible, of course).


Now for some results.

The dataset I'm testing with is a fresh Windows 2012 R2 install with CrystalMark on the desktop.

Using the default 8k volblocksize, 9.85G of data blows out to 18.7G of usage.
zfs get all -- 8k Blocks.png

Using a 16k volblocksize, 9.83G of data blows out to 10.1G, which is better but still not ideal.
zfs get all -- 16k Blocks.png

Using a 32k volblocksize, compression starts to take effect and 9.81G of data only uses 8.5G.
zfs get all -- 32k Blocks.png

Using a 64k volblocksize, compression really gets going and the same 9.81G of data now only uses 7.71G!
zfs get all -- 64k Blocks.png

Using a 128k volblocksize, compression does even better and the same 9.81G of data only uses 7.07G.
zfs get all -- 128k Blocks.png
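For anyone repeating this, the per-zvol figures above can be read off with something like the following (using the same test disk mentioned earlier in the thread):

Code:
# zfs get used,referenced,volsize,volblocksize,compressratio rpool/data/vm-426-disk-1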

Thanks,
Quenten
 
I think that introducing a config option for ZFS storages on the PVE side that allows setting a default volblocksize might make sense - it basically needs to be set at zvol creation time, because it's not changeable once you write to the zvol. until that hits the repositories, you can manually create a new zvol with a higher volblocksize and identical size, dd the data across while the VM is offline, then rename the old zvol out of the way, rename the new one to the old name and start the VM again - assuming you have no snapshots or clones on those zvols. if everything works, you can delete the old zvol.

Yes, setting the block size for a zvol during creation via the web GUI would be nice. Is it planned for v4.5?

Anyway... I thought the default block size for ZFS is 128k (it can even be set up to 1M). Why is the default blocksize for zvols only 8K?
Do zvols not support a blocksize of up to 1M? The tests qgrasso did show that a higher zvol blocksize reduces the used space (for raidz), so maybe 1M would reduce the used space even more?
 
Hey bogo22

Based on my testing results to date, higher block sizes reduce overall performance but increase compression ratios.

So larger volblocksizes are only going to help if the data is compressible. For example, JPEG images or zip files will most likely see little to no gain from larger volblocksizes, other than streaming throughput perhaps being a little better.

Cheers,
Quenten
 
zvols have a default of 8k for performance reasons, it's just the mix of raidz2 and 8k with ashift=12 (and some other, similar combinations) that wastes a lot of space. making the volblocksize for allocated zvols configurable should not be too much work - could you file an enhancement request at bugzilla.proxmox.com (for pve-storage)?
 
So larger volblocksizes are only going to help if the data is compressible. For example, JPEG images or zip files will most likely see little to no gain from larger volblocksizes, other than streaming throughput perhaps being a little better.
If you are running only a DBMS on the zvol, a block size of 4k would be the optimum, since DBMSs read and write in 4k blocks only.
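If you need both, one option (just a sketch - the dataset and storage names here are made up, and the dataset would have to be created first with "zfs create rpool/data-db") is a second storage entry with its own blocksize, so only the database disks land on it:

Code:
zfspool: local-zfs-db
        pool rpool/data-db
        content images
        blocksize 4k
        sparse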
 
zvols have a default of 8k for performance reasons, it's just the mix of raidz2 and 8k with ashift=12 (and some other, similar combinations) that wastes a lot of space. making the volblocksize for allocated zvols configurable should not be too much work - could you file an enhancement request at bugzilla.proxmox.com (for pve-storage)?

I just realized we already have such a config option - it's called "blocksize": http://pve.proxmox.com/pve-docs/pve-admin-guide.html#_local_zfs_pool_backend
so a backup/restore after setting this option (or moving the disk to another storage and back) should lower your disk usage to the expected value
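the move-disk route can also be done on the CLI, roughly like this (VMID, disk name and target storage are just examples - adjust them to your setup):

Code:
# qm move_disk 425 virtio0 some-other-storage
# qm move_disk 425 virtio0 local-zfs --delete 1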
 
I just realized we already have such a config option - it's called "blocksize": http://pve.proxmox.com/pve-docs/pve-admin-guide.html#_local_zfs_pool_backend
so a backup/restore after setting this option (or moving the disk to another storage and back) should lower your disk usage to the expected value

I see. Is that blocksize option also available in the web GUI, or do I have to set it up manually? If it's not yet possible via the GUI, should I open a bug ticket like you said?
 
I see. Is that blocksize option also available in the web GUI, or do I have to set it up manually? If it's not yet possible via the GUI, should I open a bug ticket like you said?

not available on the GUI, you'd need to set it in the storage.cfg

I am not sure if we want to enable this on the GUI, but feel free to open an enhancement request if it matters to you.
 
not available on the GUI, you'd need to set it in the storage.cfg
Just remember this is a global option, so you need to apply the new block setting to all disks on this storage. Changing the block size for a non-empty storage pool could result in data loss.
 
Just remember this is a global option, so you need to apply the new block setting to all disks on this storage. Changing the block size for a non-empty storage pool could result in data loss.

huh? it is a global option for the storage, but it only affects newly allocated zvols (and only those created by PVE). it is not possible to change it for existing zvols (at least after the first byte has been written):
Code:
# zfs set volblocksize=16k testpool/testvol
cannot set property for 'testpool/testvol': 'volblocksize' is readonly

but that point is a bit moot since PVE will never attempt such an operation anyway.

there is no chance of data corruption or loss that I can see? I am not sure why the option is set to "fixed" for the ZFS over iSCSI plugin, but since that code is mostly written by you maybe you can tell ;)
 
