ZFS big overhead

decibel83

Hi,
I have a ZFS rpool with 4 x 10 TB drives in a raidz1 vdev (raidz1-0):

Code:
root@pve:~# zpool status -v
  pool: rpool
state: ONLINE
  scan: scrub repaired 0B in 1 days 04:24:17 with 0 errors on Mon Mar 15 04:48:19 2021
config:

    NAME                                         STATE     READ WRITE CKSUM
    rpool                                        ONLINE       0     0     0
      raidz1-0                                   ONLINE       0     0     0
        ata-ST10000NM0568-2H5110_ZHZ3T0LT-part3  ONLINE       0     0     0
        ata-HGST_HUH721010ALE600_JEJYHTEZ-part3  ONLINE       0     0     0
        ata-ST10000NM0568-2H5110_ZHZ4HY5K-part3  ONLINE       0     0     0
        ata-ST10000NM0568-2H5110_ZHZ4EVM9-part3  ONLINE       0     0     0

Code:
root@pve:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool  36.4T  35.2T  1.14T        -         -    26%    96%  1.00x    ONLINE  -

One virtual machine has a 20 TB volume which, I realised, is using 25.5 TB:

Code:
root@pve:~# zfs list
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool                     25.6T  63.0M      140K  /rpool
rpool/ROOT                2.13G  63.0M      140K  /rpool/ROOT
rpool/ROOT/pve-1          2.13G  63.0M     2.13G  /
rpool/data                25.6T  63.0M      140K  /rpool/data
rpool/data/vm-101-disk-0  54.7G  63.0M     54.7G  -
rpool/data/vm-101-disk-1  33.0G  63.0M     33.0G  -
rpool/data/vm-101-disk-2  25.5T  63.0M     25.5T  -

The volume rpool/data/vm-101-disk-2 is mounted in the virtual machine as /data and has 6 TB of free space there, so I'm wondering why it's using 25.5 TB on the pool and why I'm out of space!

There are no snapshots at all.
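
For reference, that can be verified with a recursive snapshot listing:

Code:
zfs list -t snapshot -r rpool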

These are the properties of the ZFS dataset:

Code:
root@pve:~# zfs get all rpool/data/vm-101-disk-2
NAME                      PROPERTY              VALUE                  SOURCE
rpool/data/vm-101-disk-2  type                  volume                 -
rpool/data/vm-101-disk-2  creation              Sun Mar 22 15:45 2020  -
rpool/data/vm-101-disk-2  used                  25.5T                  -
rpool/data/vm-101-disk-2  available             63.0M                  -
rpool/data/vm-101-disk-2  referenced            25.5T                  -
rpool/data/vm-101-disk-2  compressratio         1.02x                  -
rpool/data/vm-101-disk-2  reservation           none                   default
rpool/data/vm-101-disk-2  volsize               20T                    local
rpool/data/vm-101-disk-2  volblocksize          8K                     default
rpool/data/vm-101-disk-2  checksum              on                     default
rpool/data/vm-101-disk-2  compression           on                     inherited from rpool
rpool/data/vm-101-disk-2  readonly              off                    default
rpool/data/vm-101-disk-2  createtxg             39415                  -
rpool/data/vm-101-disk-2  copies                1                      default
rpool/data/vm-101-disk-2  refreservation        none                   default
rpool/data/vm-101-disk-2  guid                  8050741626803556288    -
rpool/data/vm-101-disk-2  primarycache          all                    default
rpool/data/vm-101-disk-2  secondarycache        all                    default
rpool/data/vm-101-disk-2  usedbysnapshots       0B                     -
rpool/data/vm-101-disk-2  usedbydataset         25.5T                  -
rpool/data/vm-101-disk-2  usedbychildren        0B                     -
rpool/data/vm-101-disk-2  usedbyrefreservation  0B                     -
rpool/data/vm-101-disk-2  logbias               latency                default
rpool/data/vm-101-disk-2  objsetid              848                    -
rpool/data/vm-101-disk-2  dedup                 off                    default
rpool/data/vm-101-disk-2  mlslabel              none                   default
rpool/data/vm-101-disk-2  sync                  standard               inherited from rpool
rpool/data/vm-101-disk-2  refcompressratio      1.02x                  -
rpool/data/vm-101-disk-2  written               25.5T                  -
rpool/data/vm-101-disk-2  logicalused           18.0T                  -
rpool/data/vm-101-disk-2  logicalreferenced     18.0T                  -
rpool/data/vm-101-disk-2  volmode               default                default
rpool/data/vm-101-disk-2  snapshot_limit        none                   default
rpool/data/vm-101-disk-2  snapshot_count        none                   default
rpool/data/vm-101-disk-2  snapdev               hidden                 default
rpool/data/vm-101-disk-2  context               none                   default
rpool/data/vm-101-disk-2  fscontext             none                   default
rpool/data/vm-101-disk-2  defcontext            none                   default
rpool/data/vm-101-disk-2  rootcontext           none                   default
rpool/data/vm-101-disk-2  redundant_metadata    all                    default
rpool/data/vm-101-disk-2  encryption            off                    default
rpool/data/vm-101-disk-2  keylocation           none                   default
rpool/data/vm-101-disk-2  keyformat             none                   default
rpool/data/vm-101-disk-2  pbkdf2iters           0                      default

I cannot keep the virtual machine running because it runs into I/O errors a few minutes after starting.

Could you please help me understand why the dataset is using so much disk space, and how I could free some space to fix the problem?

Thank you very much for your help!
 
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_raid_considerations

zvols on raidz have a non-negligible space overhead unless the volblocksize is increased. increasing the volblocksize has other downsides (write amplification -> lower performance) and can only be done up-front, when the zvol is created.
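
a rough sketch of where the space goes with the defaults (this assumes ashift=12, i.e. 4K sectors, which is what the PVE installer uses unless you changed it):

Code:
# check the pool's actual sector size first:
zdb -C rpool | grep ashift
# with ashift=12 and the default volblocksize=8K, each 8K block of the zvol is
# stored as 2 x 4K data sectors + 1 x 4K parity sector = 3 sectors, and raidz
# pads every allocation to a multiple of (parity + 1) = 2 sectors, so it ends
# up as 4 sectors = 16K of raw space per 8K of data. that is 50% usable
# capacity instead of the ~75% a 4-disk raidz1 suggests, and the difference
# shows up as extra "used" space in zfs list.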

So do you expect more than 50% overhead (the volume holds about 15 TB of data and it's taking 25.5 TB of actual disk space)?
That's 66% overhead!

Even when the storage is thin-provisioned in PVE?

How can I get out of this bad situation?

Thank you very much!
 
no, the volume references 18TB of data (you have to keep in mind that not everything the OS inside the VM sees as free is also free as seen by ZFS). so the overhead right now is about 40%. and yes, that is to be expected (see the link I posted for details).
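
related to the "not everything free in the VM is free for ZFS" point: space freed inside the guest is only given back to the zvol if the virtual disk has discard enabled and the guest trims its filesystem. a sketch, with the disk slot (scsi1) and storage name (local-zfs) being assumptions - check "qm config 101" for the real ones:

Code:
# on the host: re-attach the existing volume with discard enabled
# (scsi1 / local-zfs are assumed names, see "qm config 101")
qm set 101 --scsi1 local-zfs:vm-101-disk-2,discard=on
# inside the guest, once it is running again: return unused blocks to ZFS
fstrim -v /data

note that this only helps once the VM runs long enough to trim, and it does not reduce the raidz padding overhead itself.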

there are two solutions:
- don't use raidz, but mirrored vdevs
- use a bigger volblocksize and tune the OS inside the VM to make the best of it (a rough sketch follows below)

both require you to redo the pool (move to mirror) or the zvol (move to bigger volblocksize), so you need space to store the data somewhere else temporarily.
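
for the volblocksize route, it could look roughly like this on the PVE side (the storage name local-zfs, the temporary storage other-storage and the disk slot scsi1 are all assumptions; the blocksize setting only applies to newly created zvols, so the existing disk has to be re-created, e.g. by moving it away and back):

Code:
# raise the volblocksize used for new zvols on this storage (assumed name)
pvesm set local-zfs --blocksize 64k
# re-create the disk with the new volblocksize by moving it to a temporary
# storage with enough free space and then back again (names are assumptions)
qm move_disk 101 scsi1 other-storage --delete 1
qm move_disk 101 scsi1 local-zfs --delete 1

whether 64k is a good value depends on the workload and the filesystem inside the VM - bigger blocks mean less padding overhead but more write amplification for small writes.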
 
