How to reclaim space in ZFS sparse volume?

I have a Windows 2003 VM in Proxmox 4.4, which resides on a ZVOL. The disk is 80GB but Windows uses only 30GB. How can I reclaim the storage on the ZVOL?

Running sdelete inside the VM zeros out the free space, but this space does not seem to be reclaimed by ZFS.
 
The ZVOL is thin-provisioned. I used the live "Move disk" in Proxmox, which internally uses drive-mirror, and afterwards zfs list shows that the whole 80GB is used. When I use the offline "Move disk", zfs list shows only 30GB of space used.

In VMware there was a "shrink disk" function, which zeros the unused space from within the VM and then releases all zero space in the sparse disk.

How can this be done in Proxmox? Once blocks have been written to in a ZVOL, they seem not to get freed, even when they are zeroed out from inside the VM afterwards. This seems to be similar to sparse files: if you write zeros to parts of the file, these zeros take up space until you issue a "fallocate --dig-holes" command. Is there a "ZVOL dig-holes" command?

So how can I shrink a VM like in VMWARE?
 
If I add another thin ZVOL hard disk to a VM, it consumes almost no space. However, if I clear the free space on that hard disk with zeros (e.g. with sdelete), it consumes the whole size of the disk, even though only zeros are stored. How can I reclaim this space in the ZVOL?
 
Same problem with virtio-scsi and Windows 2008 R2. Zeroed disk space doesn't get deallocated in the ZVOL.
Is there a ZFS command or other tool which forces zero blocks to be deallocated in the ZVOL?

With the qcow2 format, I could use qemu-img to compact the file. Is there no such possibility with ZVOLs? If not, using qcow2 files seems to be better than using ZVOLs.
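
For comparison, a minimal sketch of the qcow2 workaround mentioned above (file names are examples); qemu-img convert skips zeroed and unallocated clusters when writing the new image:
Code:
# offline: rewrite the image, dropping zeroed and unallocated clusters
# (file names are examples)
qemu-img convert -O qcow2 vm-100-disk-1.qcow2 vm-100-disk-1.compact.qcow2
mv vm-100-disk-1.compact.qcow2 vm-100-disk-1.qcow2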
 
Using any OS other than Windows works flawlessly here, so I guess this is a Windows-only problem. If the OS does not behave as the standard expects, then you are left to your own devices. Apparently the SCSI UNMAP command gets blocked in Windows. Maybe you will have better luck with 2012 or a newer version of the QEMU guest tools?
 
Even without a VM, I get the same results directly in the Proxmox server shell itself.

Code:
zfs create -V 10G data1/test
zfs list => data1/test 10.3G 4.54T 64K
dd if=/dev/zero of=/dev/zvol/data1/test
zfs list => data1/test 10.3G 4.54T 10.3G

How can I get rid of the 10GB of zeros in the ZVOL?
 
Same effect when I use the -s option. The zeros do not go away. How can I reclaim them?

Also, even without -s the allocated size is less than the ZVOL size, as can be seen in the 64K refer value after create compared to the 10G refer value after writing data to it.
 
You seem confused as to how ZFS operates. Once you write to a block, it becomes ZFS's job to remember that data. It doesn't care whether it is a zero or a one, as long as that doesn't change outside of a write operation. By writing to the entire disk you have filled the thin-provisioned device from ZFS's perspective. If you did this and it is not the desired condition, move the disk to qcow2 and compact it as you have already alluded to. Move it back when you are done (or not). I suspect you will find that writing zeros to any thin-provisioned disk (regardless of the technology used) has the same outcome.
 
I do understand how ZFS thin provisioning works. It is pretty much the same as in VMware ESXi, VMware Workstation or the qcow2 format. However, with the latter there is also a command to free the space occupied by zeros, e.g. after you delete lots of files in the VM and zero out the free space.

A similar thing exists for Linux sparse files, where you can dig holes into an existing file with "fallocate --dig-holes myfile.xyz". This command scans the file for zero ranges and deallocates those ranges from the file. When these ranges are read again, they contain "virtual zeros" instead of "physical zeros". This way the file is logically the same, but only the non-zero parts occupy space in the file system.
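
For illustration, a minimal sketch of that (file name is an example):
Code:
# create 100MB of stored zeros, then punch holes into the zero ranges
dd if=/dev/zero of=zeros.bin bs=1M count=100
du -h zeros.bin                  # ~100M allocated on disk
fallocate --dig-holes zeros.bin
du -h zeros.bin                  # ~0 allocated, logical size unchanged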

Also with SSDs, many operating systems have a "deallocate" command called TRIM, with which the OS deallocates the space of unused file system blocks on the SSD to support the SSD's garbage collection and space management.

I would expect that there is a similar tool or command in ZFS that makes it possible to deallocate zeroed blocks in a ZVOL. Otherwise the whole concept of thin provisioning in ZFS is almost useless, as sooner or later all sectors get written and turn the ZVOL into a thick-provisioned ZVOL over time. So thin provisioning only makes sense if there is a means to deallocate unused space from thin-provisioned ZVOLs regularly, and this must also be possible in real time. Technically there is no problem doing this, because no client can distinguish between "virtual zeros" and "physical zeros".

The ugly workaround of first converting to qcow2 and then back to a ZVOL is very slow, produces long offline times and is unusable in a production environment.

So: I am pretty sure that there IS such a means to deallocate physical storage from a ZVOL; the question is only how.
 
ZFS respects SATA TRIM and SCSI UNMAP commands, so when the OS marks a block as deleted, it will be released in the ZVOL. So all you need to do is use a block driver which supports TRIM or UNMAP. In the case of Proxmox this means using the virtio-scsi driver.
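
For example, a minimal sketch with the Proxmox CLI (VM ID, storage and volume names are examples); once the disk is attached this way, a trim issued inside the guest (e.g. fstrim on Linux) should be passed down to the ZVOL:
Code:
# attach the disk via the virtio-scsi controller with discard enabled
# (VM ID, storage and volume names are examples)
qm set 100 --scsihw virtio-scsi-pci
qm set 100 --scsi0 local-zfs:vm-100-disk-1,discard=on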
 
I don't think that you really understand how thin provisioning works with zvols. The only difference between a thin-provisioned zvol and a regular one is whether the full size is reserved:
Code:
# zfs create -V 10G fastzfs/test_full
# zfs create -V 10G -s fastzfs/test_sparse
# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G     64K          10.3G    30K     30K      10.3G
fastzfs/test_sparse    64K     64K              0    30K     30K       none

it does not change the handling of zeroes at all:
Code:
# dd if=/dev/zero of=/dev/zvol/fastzfs/test_full
dd: writing to ‘/dev/zvol/fastzfs/test_full’: No space left on device
20971521+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 23.3931 s, 459 MB/s
# dd if=/dev/zero of=/dev/zvol/fastzfs/test_sparse
dd: writing to ‘/dev/zvol/fastzfs/test_sparse’: No space left on device
20971521+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 30.8197 s, 348 MB/s

# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G   10.1G           241M  10.0G   10.0G      10.3G
fastzfs/test_sparse  10.1G   10.1G              0  10.0G   10.0G       none

what does change the handling of zeroes is enabling compression (which is on by default in PVE):
Code:
# zfs create -V 10G fastzfs/test_full
# zfs set compress=on fastzfs/test_full
# zfs create -V 10G -s fastzfs/test_sparse
# zfs set compress=on fastzfs/test_sparse
# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G     64K          10.3G    30K     30K      10.3G
fastzfs/test_sparse    64K     64K              0    30K     30K       none
# dd if=/dev/zero of=/dev/zvol/fastzfs/test_full
dd: writing to ‘/dev/zvol/fastzfs/test_full’: No space left on device
20971521+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 21.7149 s, 494 MB/s
1.41s user 20.12s system 99% cpu 21.719s total
# dd if=/dev/zero of=/dev/zvol/fastzfs/test_sparse
dd: writing to ‘/dev/zvol/fastzfs/test_sparse’: No space left on device
20971521+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 21.1612 s, 507 MB/s
1.24s user 19.40s system 97% cpu 21.165s total
# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G     64K          10.3G    30K     30K      10.3G
fastzfs/test_sparse    64K     64K              0    30K     30K       none

writing other highly compressible data (32MB of 'a' in this case ;)) shows a similar effect, also regardless of whether the volume is sparse or not:
Code:
# zfs create -V 10G fastzfs/test_full
# zfs set compress=on fastzfs/test_full
# zfs create -V 10G -s fastzfs/test_sparse
# zfs set compress=on fastzfs/test_sparse
# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G     64K          10.3G    30K     30K      10.3G
fastzfs/test_sparse    64K     64K              0    30K     30K       none
# aaaaas=a; for i in {0..24}; do aaaaas="$aaaaas$aaaaas"; done; echo $aaaaas > /dev/zvol/fastzfs/test_sparse
# aaaaas=a; for i in {0..24}; do aaaaas="$aaaaas$aaaaas"; done; echo $aaaaas > /dev/zvol/fastzfs/test_full
# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G    336K          10.3G  32.2M   32.2M      10.3G
fastzfs/test_sparse   336K    336K              0  32.2M   32.2M       none

or with 2GB of 'a':
Code:
# zfs list -o name,used,usedbydataset,usedbyrefreservation,logicalused,logicalreferenced,refreservation fastzfs/test_sparse fastzfs/test_full
NAME                  USED  USEDDS  USEDREFRESERV  LUSED  LREFER  REFRESERV
fastzfs/test_full    10.3G   16.2M          10.3G  2.01G   2.01G      10.3G
fastzfs/test_sparse  16.2M   16.2M              0  2.01G   2.01G       none

IMHO there are two possibilities:
  • your VM is not trimming/discarding, AND you either don't have compression enabled or your VM is not zeroing out the blocks
  • your VM is not trimming properly (fstrim/discard works just fine on formatted zvols, with and without compression, with and without reservation, see the sketch below; Windows may do some funky other stuff instead of a regular trim?)
please also note that (in general) discard/fstrim is not the same as writing zeros, and also note that ZFS (on Linux) does not yet support actually issuing TRIM/UNMAP to the underlying vdevs, but it looks like it will be merged soonish: https://github.com/zfsonlinux/zfs/pull/5925 https://github.com/openzfs/openzfs/pull/172
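
as a host-side sanity check, a minimal sketch (dataset and mount point are examples): a filesystem created directly on a zvol gives the space back after an fstrim:
Code:
# zfs create -V 10G -s fastzfs/test_trim
# mkfs.ext4 /dev/zvol/fastzfs/test_trim
# mkdir -p /mnt/test && mount /dev/zvol/fastzfs/test_trim /mnt/test
# dd if=/dev/urandom of=/mnt/test/junk bs=1M count=2048
# rm /mnt/test/junk
# fstrim -v /mnt/test            # discard the freed blocks
# zfs list fastzfs/test_trim     # USED drops back down after the trim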
 
It is a little strange that although I explained the problem, people keep stating that I supposedly don't understand ZFS thin provisioning. I just want to repeat that I DO fully understand ZFS in this respect. Unfortunately, they didn't get the problem.

To repeat (assume compression disabled):
1) A new thin-provisioned ZVOL occupies (almost) no real space in ZFS. When reading from such a ZVOL, ZFS delivers zeros for every non-existing block, i.e. it delivers "virtual zeros" for "virtual blocks".

2) Blocks only get allocated by writing to the ZVOL. Furthermore, a ZVOL (unfortunately) makes no distinction between blocks filled with real data and blocks filled with zeros, so even filling a ZVOL with only zeros allocates the full size of the ZVOL from the pool. From a filesystem and application view, however, the ZVOL didn't change, because it contains only zeros: "virtual zeros" before filling the volume and "stored zeros" after filling it with zeros.

3) In effect, too much space is consumed by the ZVOL filled with zeros, destroying the thin-provisioning effect. This is pretty much the same with qcow2, VMDK VMs or files on filesystems with sparse-file support. Therefore, for each of these formats there is a utility which converts "stored zeros" occupying real space into "virtual zeros" occupying no space.
a) For VMDK it is "Consolidate Free Space", "Clean up disk", "vmkfstools -K" or "esxcli storage vmfs unmap -l iscsi".
b) For QEMU it is "qemu-img convert -O"
c) For Linux files it is "fallocate --dig-holes".

d) Also, many storage systems which have the same problem come with a utility that converts "stored zeros" to "virtual zeros", e.g. "EMC StorReclaim".

e) Some storage systems, like HP 3PAR, have a "zero-detect" feature, which directly unmaps blocks filled with zeroes.

Now to the initial question:
What is the analog of the commands in 3)a) to 3)e) for ZFS?

What is really astonishing to me is that there seems to be no such command, which is a real pity, because it makes thin-provisioned ZVOLs worse than the qcow2 and VMDK formats with respect to space reclamation. The only workaround in ZFS is to use compression=on, so zeroed blocks get compressed to almost nothing, but compression also has its downsides.

With SCSI UNMAP there is a solution, but this needs guest VM support.

While there seems to be no means of deallocating the zero blocks in place, there IS a solution if you copy the ZVOL with "dd conv=sparse". This only writes the non-zero parts of the ZVOL, so the target ZVOL will contain no "stored zeros". The big disadvantage, however, is that this method is not live and also requires lots of temporary space for copying the ZVOL.
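
A rough offline sketch of that approach (pool and volume names are examples):
Code:
# create a fresh thin zvol of the same size, copy only the non-zero blocks
# (pool and volume names are examples)
zfs create -V 80G -s rpool/data/vm-100-disk-1-new
dd if=/dev/zvol/rpool/data/vm-100-disk-1 of=/dev/zvol/rpool/data/vm-100-disk-1-new bs=1M conv=sparse
# after verifying the copy, swap the volumes
zfs destroy rpool/data/vm-100-disk-1
zfs rename rpool/data/vm-100-disk-1-new rpool/data/vm-100-disk-1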

It should be possible to write an "iscsi-unmap" tool which scans a ZVOL and UNMAPs the zero blocks, but I was not able to find a free tool (VMware's esxcli unmap is such a tool, but it only works on ESXi), and such a tool could also not work online.
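
Something along these lines would be a hypothetical offline sketch of such a tool (not an existing utility; device path and chunk size are examples), discarding every all-zero chunk via blkdiscard:
Code:
#!/bin/bash
# hypothetical sketch: discard the all-zero chunks of a zvol (offline use only)
DEV=/dev/zvol/data1/test                  # example device
CHUNK=$((1024*1024))                      # 1 MiB scan granularity
SIZE=$(blockdev --getsize64 "$DEV")
ZERO=$(mktemp); dd if=/dev/zero of="$ZERO" bs=$CHUNK count=1 status=none

for ((off=0; off<SIZE; off+=CHUNK)); do
    # read one chunk; if it contains only zeros, discard it on the zvol
    if dd if="$DEV" bs=$CHUNK skip=$((off / CHUNK)) count=1 status=none | cmp -s - "$ZERO"; then
        blkdiscard --offset "$off" --length "$CHUNK" "$DEV"
    fi
done
rm -f "$ZERO"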

Until ZFS gets a utility like EMC's "StorReclaim" or a "zero-detect" option, ZFS compression is the only means to get zeroed space (almost) deallocated.
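
For completeness, a minimal sketch of that workaround (dataset name is an example); note that compression only affects blocks written after it is enabled:
Code:
# enable compression on the parent dataset; newly written all-zero blocks
# are then detected and not allocated (dataset name is an example)
zfs set compression=lz4 rpool/data
zfs get compression,compressratio rpool/data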
 
