ZFS space consumption

r.traini-ETIMIA

New Member
Jun 29, 2021
4
0
1
27
Hello,
I’m posting here because we are seeing some strange information on our Proxmox cluster.

We use 4 identical OVH servers in a cluster. The servers have been in use for 10 months. The cluster is composed of 2 “main” servers and 2 “replication” servers. All the servers have been upgraded from Proxmox v6 to v7. The replications run every 2 hours. A daily backup with Proxmox Backup Server is made every night at 23:00.

We don’t understand why, with thin provisioning disabled (it was never activated), the space on our ZFS storage keeps decreasing, without any interaction: no snapshots, no VM creation.

We also don’t understand why all the information on the space used/left on the ZFS storage is conflicting. The sum of the VM disk sizes (1.6 TB) corresponds neither to the Usage (2.5 TB) nor to the ZFS Allocated (772 GB). Screenshots of the Proxmox web interface are attached.

Do you have any hint as to why the space on the ZFS storage keeps shrinking?
Can you tell us which remaining-space value we should rely on?

Thanks in advance,
Raphael TRAINI
 

Attachments

  • Prod02-ZFS_State.png (13.7 KB)
  • Prod02-Datastore_VM.png (30 KB)
  • Cluster-Datastore_Setting.png (9.6 KB)
  • Prod02-Datastore_Usage.png (32.1 KB)

aaron

Proxmox Staff Member
Staff member
Jun 3, 2019
2,948
471
88
Can you post the outputs of the following commands inside [code][/code] tags?

Code:
zpool status
zfs list
 

r.traini-ETIMIA

Thanks for your help.

Here is the requested information:
Code:
root@prodcloud02:~# zpool status
  pool: datastore
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:27:22 with 0 errors on Sun Jun 12 00:51:23 2022
config:

        NAME           STATE     READ WRITE CKSUM
        datastore      ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            nvme0n1p4  ONLINE       0     0     0
            nvme1n1p4  ONLINE       0     0     0
            nvme2n1p4  ONLINE       0     0     0

errors: No known data errors

root@prodcloud02:~# zfs list
NAME                         USED  AVAIL     REFER  MOUNTPOINT
datastore                   2.27T  1.05T      139K  /var/lib/vz
datastore/vm-200001-disk-0  4.39G  1.05T     4.37G  -
datastore/vm-200002-disk-0   147G  1.18T     10.8G  -
datastore/vm-200003-disk-0  1.48G  1.05T      117M  -
datastore/vm-200004-disk-0  54.6G  1.09T     11.0G  -
datastore/vm-200005-disk-0  77.8G  1.09T     34.2G  -
datastore/vm-200006-disk-0  56.8G  1.09T     13.1G  -
datastore/vm-200007-disk-0  67.4G  1.09T     23.7G  -
datastore/vm-200008-disk-0  55.1G  1.09T     11.4G  -
datastore/vm-200009-disk-0  53.2G  1.09T     9.55G  -
datastore/vm-200010-disk-0  55.2G  1.09T     11.6G  -
datastore/vm-200011-disk-0  49.5G  1.09T     5.84G  -
datastore/vm-200702-disk-0  1.53T  2.24T      334G  -
datastore/vm-200996-disk-0  46.7G  1.09T     3.04G  -
datastore/vm-200997-disk-0  1.41G  1.05T     45.2M  -
datastore/vm-200998-disk-0  46.1G  1.09T     2.46G  -
datastore/vm-200999-disk-0  46.3G  1.09T     2.59G  -

Best regards,
Raphael TRAINI
 

leesteken

Famous Member
May 31, 2020
1,747
355
88
This post might be relevant to your raidz-1 pool. @Dunuin is celebrated on this forum for lots of testing and research on ZFS padding (and write amplification and performance).
 

Dunuin

Famous Member
Jun 30, 2020
6,811
1,585
149
Germany
First I would run zfs list -o space to see how much of your pool is used up by snapshots and refreservation.
Then you should check your pool's ashift and the volblocksize of your zvols:
zpool get ashift datastore
zfs get volblocksize

I guess you are using the defaults, so your ashift is 12 and your volblocksize is 8K. With a raidz1 of 3 disks you lose 33% of the raw capacity to parity. But because your volblocksize is too small, you lose an additional 17% of your raw capacity to padding overhead. In other words: everything written to a zvol ends up roughly 133% of its size, because for every 1 GB of data there is about 333 MB of padding overhead that has to be written too. If you don't want to lose capacity to padding overhead, you need to increase your volblocksize to at least 16K. That can be done by setting "WebUI: Datacenter -> Storage -> YourDatastoreZFSStorage -> Edit -> Blocksize: 16K" and then destroying and recreating all your zvols, as the volblocksize can only be set at creation time.
The easiest way to destroy and recreate the VMs' disks is to backup+restore or migrate them.

See here for a detailed padding overhead explanation: https://www.delphix.com/blog/delphi...or-how-i-learned-stop-worrying-and-love-raidz
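The padding arithmetic above can be sketched with a simplified raidz allocation model (assuming 4K sectors from ashift=12 and single-row stripes; real ZFS allocation is more involved, see the linked article):

```python
import math

def raidz_alloc_sectors(volblocksize, ashift=12, ndisks=3, parity=1):
    """Sectors allocated for one zvol block on raidz (simplified model)."""
    sector = 1 << ashift                                  # ashift=12 -> 4K sectors
    data = math.ceil(volblocksize / sector)               # data sectors per block
    psec = math.ceil(data / (ndisks - parity)) * parity   # parity sectors
    # raidz rounds every allocation up to a multiple of (parity + 1);
    # this rounding is what produces the padding sectors
    mult = parity + 1
    return mult * math.ceil((data + psec) / mult)

for vbs in (8 * 1024, 16 * 1024):
    alloc = raidz_alloc_sectors(vbs)
    data = vbs // 4096
    print(f"volblocksize={vbs // 1024}K: {alloc} sectors allocated "
          f"for {data} data sectors ({100 * data // alloc}% of raw is data)")
```

With an 8K volblocksize this gives 4 sectors for 2 sectors of data (only 50% of raw capacity holds data, i.e. the extra 17% loss versus the ideal 67% of a 3-disk raidz1), while 16K gives 6 sectors for 4 and eliminates the padding.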
 

r.traini-ETIMIA

Thanks for your response.

Here is the requested information:
Bash:
root@prodcloud02:~# zfs list -o space
NAME                        AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
datastore                   1.05T  2.27T        0B    139K             0B      2.27T
datastore/vm-200001-disk-0  1.05T  4.38G     6.04M   4.37G             0B         0B
datastore/vm-200002-disk-0  1.18T   147G     7.54M   10.8G           136G         0B
datastore/vm-200003-disk-0  1.05T  1.48G     1.82M    117M          1.36G         0B
datastore/vm-200004-disk-0  1.09T  54.7G     4.69M   11.0G          43.7G         0B
datastore/vm-200005-disk-0  1.09T  77.8G     22.6M   34.2G          43.6G         0B
datastore/vm-200006-disk-0  1.09T  56.8G     7.66M   13.1G          43.7G         0B
datastore/vm-200007-disk-0  1.09T  67.4G     16.5M   23.7G          43.7G         0B
datastore/vm-200008-disk-0  1.09T  55.1G     5.38M   11.4G          43.7G         0B
datastore/vm-200009-disk-0  1.09T  53.2G     5.29M   9.55G          43.7G         0B
datastore/vm-200010-disk-0  1.09T  55.2G     29.6M   11.6G          43.6G         0B
datastore/vm-200011-disk-0  1.09T  49.5G     3.12M   5.84G          43.7G         0B
datastore/vm-200702-disk-0  2.24T  1.53T     1.55G    334G          1.20T         0B
datastore/vm-200996-disk-0  1.09T  46.7G        0B   3.04G          43.7G         0B
datastore/vm-200997-disk-0  1.05T  1.41G        0B   45.2M          1.37G         0B
datastore/vm-200998-disk-0  1.09T  46.1G        0B   2.46G          43.7G         0B
datastore/vm-200999-disk-0  1.09T  46.3G        0B   2.59G          43.7G         0B
root@prodcloud02:~# zpool get ashift datastore
NAME       PROPERTY  VALUE   SOURCE
datastore  ashift    12      local
root@prodcloud02:~# zfs get volblocksize
NAME                                                          PROPERTY      VALUE     SOURCE
datastore                                                     volblocksize  -         -
datastore/vm-200001-disk-0                                    volblocksize  8K        default
datastore/vm-200001-disk-0@__replicate_200001-0_1656914401__  volblocksize  -         -
datastore/vm-200002-disk-0                                    volblocksize  8K        default
datastore/vm-200002-disk-0@__replicate_200002-0_1656914405__  volblocksize  -         -
datastore/vm-200003-disk-0                                    volblocksize  8K        default
datastore/vm-200003-disk-0@__replicate_200003-0_1656914408__  volblocksize  -         -
datastore/vm-200004-disk-0                                    volblocksize  8K        default
datastore/vm-200004-disk-0@__replicate_200004-0_1656914412__  volblocksize  -         -
datastore/vm-200005-disk-0                                    volblocksize  8K        default
datastore/vm-200005-disk-0@__replicate_200005-0_1656914415__  volblocksize  -         -
datastore/vm-200006-disk-0                                    volblocksize  8K        default
datastore/vm-200006-disk-0@__replicate_200006-0_1656914419__  volblocksize  -         -
datastore/vm-200007-disk-0                                    volblocksize  8K        default
datastore/vm-200007-disk-0@__replicate_200007-0_1656914422__  volblocksize  -         -
datastore/vm-200008-disk-0                                    volblocksize  8K        default
datastore/vm-200008-disk-0@__replicate_200008-0_1656914426__  volblocksize  -         -
datastore/vm-200009-disk-0                                    volblocksize  8K        default
datastore/vm-200009-disk-0@__replicate_200009-0_1656914430__  volblocksize  -         -
datastore/vm-200010-disk-0                                    volblocksize  8K        default
datastore/vm-200010-disk-0@__replicate_200010-0_1656914433__  volblocksize  -         -
datastore/vm-200011-disk-0                                    volblocksize  8K        default
datastore/vm-200011-disk-0@__replicate_200011-0_1656914437__  volblocksize  -         -
datastore/vm-200702-disk-0                                    volblocksize  8K        default
datastore/vm-200702-disk-0@__replicate_200702-0_1656914440__  volblocksize  -         -
datastore/vm-200996-disk-0                                    volblocksize  8K        default
datastore/vm-200996-disk-0@__replicate_200996-0_1656914473__  volblocksize  -         -
datastore/vm-200997-disk-0                                    volblocksize  8K        default
datastore/vm-200997-disk-0@__replicate_200997-0_1656914476__  volblocksize  -         -
datastore/vm-200998-disk-0                                    volblocksize  8K        default
datastore/vm-200998-disk-0@__replicate_200998-0_1656914480__  volblocksize  -         -
datastore/vm-200999-disk-0                                    volblocksize  8K        default
datastore/vm-200999-disk-0@__replicate_200999-0_1656914482__  volblocksize  -         -
root@prodcloud02:~#


If I understand correctly, to reclaim the space lost to padding overhead on my servers, I need to:

  • Change the block size to 16K in the ZFS storage configuration.
  • For each VM:
      • Disable replication on the VM
      • Destroy the VM disk on the replication server
      • Migrate the VM to the replication server
      • Destroy the remaining disk on the main server
      • Migrate the VM back
Is it better to fully empty the ZFS storage during this procedure?

Best regards,
Raphael TRAINI
 

Dunuin

I need to:

  • Change the block size to 16K in the ZFS storage configuration.
  • For each VM:
      • Disable replication on the VM
      • Destroy the VM disk on the replication server
      • Migrate the VM to the replication server
      • Destroy the remaining disk on the main server
      • Migrate the VM back
Is it better to fully empty the ZFS storage during this procedure?
Not sure, because of your replication. But you want to destroy all the VMs' virtual disks (zvols) and recreate them after changing the block size to 16K in the ZFS storage configuration, so that the zvols are created with a volblocksize of 16K instead of 8K.
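For reference, the storage blocksize can also be changed from the CLI, and the resulting volblocksize verified once a disk has been recreated. This is a sketch, not a tested procedure: the storage name "datastore" and the zvol name are taken from the outputs above, and the `--blocksize` option applies to the zfspool storage type (double-check against your /etc/pve/storage.cfg before running anything):

```shell
# Set the blocksize used for newly created zvols on this storage
# (only affects zvols created from now on, not existing ones)
pvesm set datastore --blocksize 16k

# After destroying and recreating a VM disk (e.g. via backup + restore),
# confirm the new zvol actually picked up the 16K volblocksize:
zfs get volblocksize datastore/vm-200001-disk-0
```

Checking one recreated zvol before cycling through all VMs is a cheap way to confirm the setting took effect.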
 

r.traini-ETIMIA

I will try it on a new VM without any client data.

Besides the space lost to padding, do you know of any other effects of changing the block size that I should pay attention to?
 

Dunuin

I will try it on a new VM without any client data.

Besides the space lost to padding, do you know of any other effects of changing the block size that I should pay attention to?
You will get horrible performance for all read/write operations smaller than the volblocksize (sequential reads are less affected when the ARC is allowed to cache data). So a 16K volblocksize would be fine for a MySQL DB doing 16K writes, but really bad for a Postgres DB doing 8K writes, as ZFS then can't read or write anything smaller than a 16K block. And if you use SSDs, they may wear out faster.
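That sub-block write penalty can be expressed as a rough read-modify-write ratio (this ignores ARC caching and write coalescing, so it illustrates the principle rather than real-world throughput):

```python
def write_amplification(io_size, volblocksize):
    """Ratio of bytes ZFS must rewrite per application write (rough model).

    A write smaller than the volblocksize forces a read-modify-write of the
    whole block, so the amplification factor is volblocksize / io_size.
    Writes at or above the block size are not amplified in this model.
    """
    return max(volblocksize / io_size, 1.0)

# Postgres-style 8K page writes on a 16K volblocksize
print(write_amplification(8 * 1024, 16 * 1024))   # -> 2.0
# MySQL/InnoDB-style 16K page writes match the block size
print(write_amplification(16 * 1024, 16 * 1024))  # -> 1.0
```

This is also why the amplification matters for SSD wear: every amplified write consumes flash endurance, not just bandwidth.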
 
