ZFS Pool full even though VM limited to 50% of the pool's size

Heya!

I've just had one of my VMs stall because of a Proxmox "io-error". Well, it seems the pool usage is at 100%.
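A quick way to confirm a full pool (a sketch; the pool name is the one used throughout this thread):

Code:
[root@~]# zpool list -o name,size,allocated,free,capacity vms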

Code:
[root@~]# zfs get -p volsize,used,logicalused,compressratio vms/vm-11000-disk-0
NAME                 PROPERTY       VALUE         SOURCE
vms/vm-11000-disk-0  volsize        536870912000  local
vms/vm-11000-disk-0  used           965481422848  -
vms/vm-11000-disk-0  logicalused    503621775360  -
vms/vm-11000-disk-0  compressratio  1.00          -
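Converted by hand: 536870912000 B is exactly 500 GiB (the volsize), 965481422848 B ≈ 899 GiB used, and 503621775360 B ≈ 469 GiB logicalused, so the volume occupies roughly 1.9x its logical payload. numfmt does the conversion if you don't trust mental math (a sketch):

Bash:
numfmt --to=iec-i --suffix=B 536870912000 965481422848 503621775360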

Code:
[root@~]# qm config 11000 | egrep '^(scsi|virtio|sata|ide)'

ide2: none,media=cdrom
scsi0: local-zfs:vm-11000-disk-1,discard=on,iothread=1,size=15G,ssd=1
scsi1: vms:vm-11000-disk-0,backup=0,discard=on,iothread=1,size=500G,ssd=1
scsihw: virtio-scsi-single

Code:
[root@ ~]# zfs list -o space vms
NAME  AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
vms      0B   899G        0B     96K             0B       899G

Code:
[root@ ~]# zfs list -r -o name,used,avail,refer,usedbysnapshots,usedbyrefreservation vms
NAME                  USED  AVAIL  REFER  USEDSNAP  USEDREFRESERV
vms                   899G     0B    96K        0B             0B
vms/vm-11000-disk-0   899G     0B   899G        0B             0B


Code:
[root@ ~]# zfs get -r refreservation,reservation,volsize,used,copies vms
NAME                 PROPERTY        VALUE      SOURCE
vms                  refreservation  none       default
vms                  reservation     none       default
vms                  volsize         -          -
vms                  used            899G       -
vms                  copies          1          default
vms/vm-11000-disk-0  refreservation  508G       local
vms/vm-11000-disk-0  reservation     none       default
vms/vm-11000-disk-0  volsize         500G       local
vms/vm-11000-disk-0  used            899G       -
vms/vm-11000-disk-0  copies          1          default

Code:
[root@ ~]# zfs get all vms/vm-11000-disk-0
NAME                 PROPERTY              VALUE                  SOURCE
vms/vm-11000-disk-0  type                  volume                 -
vms/vm-11000-disk-0  creation              Tue May 20 23:33 2025  -
vms/vm-11000-disk-0  used                  899G                   -
vms/vm-11000-disk-0  available             0B                     -
vms/vm-11000-disk-0  referenced            899G                   -
vms/vm-11000-disk-0  compressratio         1.00x                  -
vms/vm-11000-disk-0  reservation           none                   default
vms/vm-11000-disk-0  volsize               500G                   local
vms/vm-11000-disk-0  volblocksize          16K                    default
vms/vm-11000-disk-0  checksum              on                     default
vms/vm-11000-disk-0  compression           on                     inherited from vms
vms/vm-11000-disk-0  readonly              off                    default
vms/vm-11000-disk-0  createtxg             14                     -
vms/vm-11000-disk-0  copies                1                      default
vms/vm-11000-disk-0  refreservation        508G                   local
vms/vm-11000-disk-0  guid                  5020797456205485381    -
vms/vm-11000-disk-0  primarycache          all                    default
vms/vm-11000-disk-0  secondarycache        all                    default
vms/vm-11000-disk-0  usedbysnapshots       0B                     -
vms/vm-11000-disk-0  usedbydataset         899G                   -
vms/vm-11000-disk-0  usedbychildren        0B                     -
vms/vm-11000-disk-0  usedbyrefreservation  0B                     -
vms/vm-11000-disk-0  logbias               latency                default
vms/vm-11000-disk-0  objsetid              643                    -
vms/vm-11000-disk-0  dedup                 off                    default
vms/vm-11000-disk-0  mlslabel              none                   default
vms/vm-11000-disk-0  sync                  standard               default
vms/vm-11000-disk-0  refcompressratio      1.00x                  -
vms/vm-11000-disk-0  written               899G                   -
vms/vm-11000-disk-0  logicalused           469G                   -
vms/vm-11000-disk-0  logicalreferenced     469G                   -
vms/vm-11000-disk-0  volmode               default                default
vms/vm-11000-disk-0  snapshot_limit        none                   default
vms/vm-11000-disk-0  snapshot_count        none                   default
vms/vm-11000-disk-0  snapdev               hidden                 default
vms/vm-11000-disk-0  context               none                   default
vms/vm-11000-disk-0  fscontext             none                   default
vms/vm-11000-disk-0  defcontext            none                   default
vms/vm-11000-disk-0  rootcontext           none                   default
vms/vm-11000-disk-0  redundant_metadata    all                    default
vms/vm-11000-disk-0  encryption            off                    default
vms/vm-11000-disk-0  keylocation           none                   default
vms/vm-11000-disk-0  keyformat             none                   default
vms/vm-11000-disk-0  pbkdf2iters           0                      default
vms/vm-11000-disk-0  prefetch              all                    default


Pretty late already. Solved it for myself by tracing the cause to write amplification through ZFS; I will re-create the setup in this particular case, though I'm still not 100% certain what causes this problem.

Ashift 12 -> 4K
volblocksize 16K

Everything is fine.
:)
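For reference, both suspects can be read back directly (a sketch; pool and volume names from this thread). Note that on a single-disk pool there is no parity or padding overhead, so ashift=12 with the default 16K volblocksize should not, by itself, double the usage:

Code:
[root@~]# zpool get ashift vms
[root@~]# zfs get volblocksize vms/vm-11000-disk-0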
 
This is - at least - surprising to me.
You have a 500 GB disk on a zpool with 900 GB capacity. You're using no snapshots, which could take up additional space.
So what's eating the 399 GB here?

Write amplification refers to each small change in a file requiring a new block to be written (or several blocks with RAIDZ) - which takes longer and reduces the lifespan of an SSD. But after the new blocks are written, the old blocks should be marked free so they can be written again.
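To make the copy-on-write mechanics concrete, here is a hypothetical demonstration on a scratch zvol (names and sizes are illustrative, not from this thread): a single 4 KiB write into a zvol with 16 KiB volblocksize still makes ZFS read, modify, and rewrite a full 16 KiB block.

Bash:
# scratch zvol with the PVE default volblocksize (hypothetical name):
zfs create -V 1G -o volblocksize=16K vms/wa-demo
# one 4 KiB sub-block write -> ZFS rewrites the whole 16 KiB block (read-modify-write):
dd if=/dev/urandom of=/dev/zvol/vms/wa-demo bs=4K count=1 seek=100 oflag=direct
# clean up:
zfs destroy vms/wa-demo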

What am I missing here?
 
I was thinking the exact same, and I'm still not certain I've figured out the correct reason just yet.
[attached screenshot: pool usage graph] Usage used to be 545G the whole time. Then I filled the partition up and it just kept growing nonstop.
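One way to watch that growth live (a sketch; the interval is illustrative):

Bash:
watch -n 60 'zfs get -Hp -o property,value used,logicalused vms/vm-11000-disk-0'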
 
Please share
Bash:
zpool status -v
zfs list -rt all -ospace,reservation,refreservation vms
 
Can you run zpool trim? Does the command free space, or does it abort instantly? Running trim needs free space to work.

VM gets trimmed once every day. All filesystems are trimmed.
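For reference, the usual pieces of such a daily routine look something like this (a sketch; assumes discard=on on the virtual disks, as the qm config above shows):

Bash:
# inside the guest: pass discards for all mounted filesystems down to the zvol
fstrim -av
# on the host: trim the pool itself and watch progress
zpool trim vms
zpool status -t vms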


Your wish may be fulfilled:

Code:
[root@ ~]# zpool status -v
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:29 with 0 errors on Sun Dec 14 00:24:30 2025
config:

        NAME                                                 STATE     READ WRITE CKSUM
        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b444a41d1ad6f-part3  ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b444a41d59903-part3  ONLINE       0     0     0

errors: No known data errors

  pool: vms
 state: ONLINE
  scan: scrub repaired 0B in 00:28:08 with 0 errors on Sun Dec 14 23:03:21 2025
config:

        NAME                                           STATE     READ WRITE CKSUM
        vms                                            ONLINE       0     0     0
          ata-Samsung_SSD_860_EVO_1TB_S3Z9NB0M210773H  ONLINE       0     0     0

errors: No known data errors

while "rpool" is a mirrored nvme pool of 2 discs (for proxmox and important vms) while "vms" is just a single ssd for development or file-shares

and

Code:
[root@ ~]# zfs list -rt all -ospace,reservation,refreservation vms
NAME                 AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  RESERV  REFRESERV
vms                  98.2G   815G        0B     96K             0B       815G    none       none
vms/vm-11000-disk-0  98.2G   815G        0B    815G             0B         0B    none       508G

I was able to free up some space in the VM once I adjusted spa_slop_shift, and was thus able to start the VM again. But the problematic VM is still about 300 GB "overweight" (780 GB vs. 500 GB)
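For anyone hitting the same full-pool deadlock: spa_slop_shift controls the slop space ZFS holds back - the pool reserves 1/2^spa_slop_shift of its size (default 5, i.e. 1/32), and raising the value shrinks that reserve just enough to delete or trim again. A minimal sketch of the runtime change (not persistent across reboots; the value 6 is illustrative):

Bash:
cat /sys/module/zfs/parameters/spa_slop_shift   # default: 5
echo 6 > /sys/module/zfs/parameters/spa_slop_shift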

I don't see any mention of how the pool is built. Can you post the output of zpool status?
If it is a RAIDZ pool, this might not be too surprising. Background: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_raid_considerations

Though with larger volblocksizes, this should be less of an issue.
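As background for the RAIDZ case mentioned (it does not apply to this single-disk pool): with ashift=12, a 16K volume block on RAIDZ1 becomes four 4K data sectors plus parity, and the allocation is padded to a multiple of parity+1 = 2 sectors. Assuming the vdev is wide enough that one parity sector covers the row, that is 6 sectors (24K) allocated for 16K of data, about 1.5x. A back-of-the-envelope sketch:

Bash:
# RAIDZ1, ashift=12 (4K sectors), volblocksize=16K -- illustrative arithmetic
data=$((16384 / 4096))                    # 4 data sectors
parity=1                                  # assumes one stripe row (wide enough vdev)
alloc=$(( (data + parity + 1) / 2 * 2 ))  # pad to a multiple of parity+1 = 2 -> 6
echo "$((alloc * 4096)) bytes allocated for 16384 bytes of data"   # 24576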
Instead of asking for specific properties, you might just post them all:

zfs get all vms/vm-11000-disk-0

I'd have understood it if it were a RAIDZ pool; that was my very first thought, given that the usage is close to 2x. But it's a single-disk pool, and that drives me crazy, as I'm not familiar enough with the core mechanics of ZFS to figure this state out by myself just yet. I've learned a lot over the last two days, though. Maybe we can get to the bottom of it.

Code:
[root@ ~]# zfs get all vms/vm-11000-disk-0
NAME                 PROPERTY              VALUE                  SOURCE
vms/vm-11000-disk-0  type                  volume                 -
vms/vm-11000-disk-0  creation              Tue May 20 23:33 2025  -
vms/vm-11000-disk-0  used                  815G                   -
vms/vm-11000-disk-0  available             98.2G                  -
vms/vm-11000-disk-0  referenced            815G                   -
vms/vm-11000-disk-0  compressratio         1.00x                  -
vms/vm-11000-disk-0  reservation           none                   default
vms/vm-11000-disk-0  volsize               500G                   local
vms/vm-11000-disk-0  volblocksize          16K                    default
vms/vm-11000-disk-0  checksum              on                     default
vms/vm-11000-disk-0  compression           on                     inherited from vms
vms/vm-11000-disk-0  readonly              off                    default
vms/vm-11000-disk-0  createtxg             14                     -
vms/vm-11000-disk-0  copies                1                      default
vms/vm-11000-disk-0  refreservation        508G                   local
vms/vm-11000-disk-0  guid                  5020797456205485381    -
vms/vm-11000-disk-0  primarycache          all                    default
vms/vm-11000-disk-0  secondarycache        all                    default
vms/vm-11000-disk-0  usedbysnapshots       0B                     -
vms/vm-11000-disk-0  usedbydataset         815G                   -
vms/vm-11000-disk-0  usedbychildren        0B                     -
vms/vm-11000-disk-0  usedbyrefreservation  0B                     -
vms/vm-11000-disk-0  logbias               latency                default
vms/vm-11000-disk-0  objsetid              643                    -
vms/vm-11000-disk-0  dedup                 off                    default
vms/vm-11000-disk-0  mlslabel              none                   default
vms/vm-11000-disk-0  sync                  standard               default
vms/vm-11000-disk-0  refcompressratio      1.00x                  -
vms/vm-11000-disk-0  written               815G                   -
vms/vm-11000-disk-0  logicalused           411G                   -
vms/vm-11000-disk-0  logicalreferenced     411G                   -
vms/vm-11000-disk-0  volmode               default                default
vms/vm-11000-disk-0  snapshot_limit        none                   default
vms/vm-11000-disk-0  snapshot_count        none                   default
vms/vm-11000-disk-0  snapdev               hidden                 default
vms/vm-11000-disk-0  context               none                   default
vms/vm-11000-disk-0  fscontext             none                   default
vms/vm-11000-disk-0  defcontext            none                   default
vms/vm-11000-disk-0  rootcontext           none                   default
vms/vm-11000-disk-0  redundant_metadata    all                    default
vms/vm-11000-disk-0  encryption            off                    default
vms/vm-11000-disk-0  keylocation           none                   default
vms/vm-11000-disk-0  keyformat             none                   default
vms/vm-11000-disk-0  pbkdf2iters           0                      default
vms/vm-11000-disk-0  prefetch              all                    default
 
Hmm, so the pool setup is as simple as possible: a single disk. That means no additional overhead for parity.

Only 1 copy and no snapshots, but used and written are still twice the logicalused value. The used / written is also well above the volsize / refreservation, which, without snapshots, is quite unexpected, IMHO.
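That factor can be computed straight from the parsable values (a sketch):

Bash:
zfs get -Hp -o value used,logicalused vms/vm-11000-disk-0 \
  | paste - - | awk '{printf "%.2fx\n", $1 / $2}'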

Just to make sure we don't miss anything, could you post more details?
zfs get all vms
zpool get all vms
 

Your time is highly appreciated, aaron (as always!). You may be served:

Code:
[root@ ~]# zfs get all vms
NAME  PROPERTY              VALUE                  SOURCE
vms   type                  filesystem             -
vms   creation              Tue May 20 23:33 2025  -
vms   used                  815G                   -
vms   available             98.2G                  -
vms   referenced            96K                    -
vms   compressratio         1.00x                  -
vms   mounted               yes                    -
vms   quota                 none                   default
vms   reservation           none                   default
vms   recordsize            128K                   default
vms   mountpoint            /vms                   default
vms   sharenfs              off                    default
vms   checksum              on                     default
vms   compression           on                     local
vms   atime                 on                     default
vms   devices               on                     default
vms   exec                  on                     default
vms   setuid                on                     default
vms   readonly              off                    default
vms   zoned                 off                    default
vms   snapdir               hidden                 default
vms   aclmode               discard                default
vms   aclinherit            restricted             default
vms   createtxg             1                      -
vms   canmount              on                     default
vms   xattr                 on                     default
vms   copies                1                      default
vms   version               5                      -
vms   utf8only              off                    -
vms   normalization         none                   -
vms   casesensitivity       sensitive              -
vms   vscan                 off                    default
vms   nbmand                off                    default
vms   sharesmb              off                    default
vms   refquota              none                   default
vms   refreservation        none                   default
vms   guid                  10649419890694770014   -
vms   primarycache          all                    default
vms   secondarycache        all                    default
vms   usedbysnapshots       0B                     -
vms   usedbydataset         96K                    -
vms   usedbychildren        815G                   -
vms   usedbyrefreservation  0B                     -
vms   logbias               latency                default
vms   objsetid              54                     -
vms   dedup                 off                    default
vms   mlslabel              none                   default
vms   sync                  standard               default
vms   dnodesize             legacy                 default
vms   refcompressratio      1.00x                  -
vms   written               96K                    -
vms   logicalused           411G                   -
vms   logicalreferenced     42K                    -
vms   volmode               default                default
vms   filesystem_limit      none                   default
vms   snapshot_limit        none                   default
vms   filesystem_count      none                   default
vms   snapshot_count        none                   default
vms   snapdev               hidden                 default
vms   acltype               off                    default
vms   context               none                   default
vms   fscontext             none                   default
vms   defcontext            none                   default
vms   rootcontext           none                   default
vms   relatime              on                     default
vms   redundant_metadata    all                    default
vms   overlay               on                     default
vms   encryption            off                    default
vms   keylocation           none                   default
vms   keyformat             none                   default
vms   pbkdf2iters           0                      default
vms   special_small_blocks  0                      default
vms   prefetch              all                    default

Code:
[root@ ~]# zpool get all vms
NAME  PROPERTY                       VALUE                          SOURCE
vms   size                           928G                           -
vms   capacity                       87%                            -
vms   altroot                        -                              default
vms   health                         ONLINE                         -
vms   guid                           2927317872457232263            -
vms   version                        -                              default
vms   bootfs                         -                              default
vms   delegation                     on                             default
vms   autoreplace                    off                            default
vms   cachefile                      none                           local
vms   failmode                       wait                           default
vms   listsnapshots                  off                            default
vms   autoexpand                     off                            default
vms   dedupratio                     1.00x                          -
vms   free                           113G                           -
vms   allocated                      815G                           -
vms   readonly                       off                            -
vms   ashift                         12                             local
vms   comment                        -                              default
vms   expandsize                     -                              -
vms   freeing                        0                              -
vms   fragmentation                  30%                            -
vms   leaked                         0                              -
vms   multihost                      off                            default
vms   checkpoint                     -                              -
vms   load_guid                      8463444106636980413            -
vms   autotrim                       off                            default
vms   compatibility                  off                            default
vms   bcloneused                     0                              -
vms   bclonesaved                    0                              -
vms   bcloneratio                    1.00x                          -
vms   feature@async_destroy          enabled                        local
vms   feature@empty_bpobj            active                         local
vms   feature@lz4_compress           active                         local
vms   feature@multi_vdev_crash_dump  enabled                        local
vms   feature@spacemap_histogram     active                         local
vms   feature@enabled_txg            active                         local
vms   feature@hole_birth             active                         local
vms   feature@extensible_dataset     active                         local
vms   feature@embedded_data          active                         local
vms   feature@bookmarks              enabled                        local
vms   feature@filesystem_limits      enabled                        local
vms   feature@large_blocks           enabled                        local
vms   feature@large_dnode            enabled                        local
vms   feature@sha512                 enabled                        local
vms   feature@skein                  enabled                        local
vms   feature@edonr                  enabled                        local
vms   feature@userobj_accounting     active                         local
vms   feature@encryption             enabled                        local
vms   feature@project_quota          active                         local
vms   feature@device_removal         enabled                        local
vms   feature@obsolete_counts        enabled                        local
vms   feature@zpool_checkpoint       enabled                        local
vms   feature@spacemap_v2            active                         local
vms   feature@allocation_classes     enabled                        local
vms   feature@resilver_defer         enabled                        local
vms   feature@bookmark_v2            enabled                        local
vms   feature@redaction_bookmarks    enabled                        local
vms   feature@redacted_datasets      enabled                        local
vms   feature@bookmark_written       enabled                        local
vms   feature@log_spacemap           active                         local
vms   feature@livelist               enabled                        local
vms   feature@device_rebuild         enabled                        local
vms   feature@zstd_compress          enabled                        local
vms   feature@draid                  enabled                        local
vms   feature@zilsaxattr             enabled                        local
vms   feature@head_errlog            active                         local
vms   feature@blake3                 enabled                        local
vms   feature@block_cloning          enabled                        local
vms   feature@vdev_zaps_v2           active                         local
 
Just to complete my mental distress, I've tried to verify the behaviour:


Code:
[root@ ~]# zfs get -p used,logicalused,compressratio,copies,volblocksize vms/vm-11000-disk-0
NAME                 PROPERTY       VALUE     SOURCE
vms/vm-11000-disk-0  used           874623336448  -
vms/vm-11000-disk-0  logicalused    441626849280  -
vms/vm-11000-disk-0  compressratio  1.00      -
vms/vm-11000-disk-0  copies         1         default
vms/vm-11000-disk-0  volblocksize   16384     default

[root@ ~]# zfs create -V 2G vms/testvol
[root@ ~]# dd if=/dev/urandom of=/dev/zvol/vms/testvol bs=1M count=1024 status=progress
999292928 bytes (999 MB, 953 MiB) copied, 3 s, 333 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.31795 s, 249 MB/s

[root@ ~]# sync
[root@ ~]# zfs get -p used,logicalused,compressratio,copies vms/testvol
NAME         PROPERTY       VALUE   SOURCE
vms/testvol  used           2183135232  -
vms/testvol  logicalused    1076658176  -
vms/testvol  compressratio  1.00    -
vms/testvol  copies         1       default

Fresh volume, copied 1 GB in, and 2 GB are used. copies=1, single disk. Even though I'm not a huge fan of it and feel bad for doing so, I asked GPT. It has no idea either. Lol. I'm mad.
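A variation worth trying with the same scratch volume (a sketch; the 64K value is illustrative): destroy it, recreate it with a different volblocksize, and rewrite the same data to see whether the 2x factor tracks the block size:

Bash:
zfs destroy vms/testvol
zfs create -V 2G -o volblocksize=64K vms/testvol
dd if=/dev/urandom of=/dev/zvol/vms/testvol bs=1M count=1024 conv=fsync
zfs get -p used,logicalused,volblocksize vms/testvol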
 
Hmm, how did you create that ZFS pool? Which version of PVE?

I tried to recreate that in a test VM and even after writing almost 10G into a 10G volume, I get the following result:
Code:
root@pvezfstest:~# zfs get all vms/vm-100-disk-0
NAME               PROPERTY              VALUE                  SOURCE
vms/vm-100-disk-0  type                  volume                 -
vms/vm-100-disk-0  creation              Mon Dec 15 12:06 2025  -
vms/vm-100-disk-0  used                  10.2G                  -
vms/vm-100-disk-0  available             76.9G                  -
vms/vm-100-disk-0  referenced            9.82G                  -
vms/vm-100-disk-0  compressratio         1.00x                  -
vms/vm-100-disk-0  reservation           none                   default
vms/vm-100-disk-0  volsize               10G                    local
vms/vm-100-disk-0  volblocksize          16K                    default
vms/vm-100-disk-0  checksum              on                     default
vms/vm-100-disk-0  compression           on                     inherited from vms
vms/vm-100-disk-0  readonly              off                    default
vms/vm-100-disk-0  createtxg             14                     -
vms/vm-100-disk-0  copies                1                      default
vms/vm-100-disk-0  refreservation        10.2G                  local
vms/vm-100-disk-0  guid                  14129021757601767831   -
vms/vm-100-disk-0  primarycache          all                    default
vms/vm-100-disk-0  secondarycache        all                    default
vms/vm-100-disk-0  usedbysnapshots       0B                     -
vms/vm-100-disk-0  usedbydataset         9.82G                  -
vms/vm-100-disk-0  usedbychildren        0B                     -
vms/vm-100-disk-0  usedbyrefreservation  348M                   -
vms/vm-100-disk-0  logbias               latency                default
vms/vm-100-disk-0  objsetid              76                     -
vms/vm-100-disk-0  dedup                 off                    default
vms/vm-100-disk-0  mlslabel              none                   default
vms/vm-100-disk-0  sync                  standard               default
vms/vm-100-disk-0  refcompressratio      1.00x                  -
vms/vm-100-disk-0  written               9.82G                  -
vms/vm-100-disk-0  logicalused           9.79G                  -
vms/vm-100-disk-0  logicalreferenced     9.79G                  -
vms/vm-100-disk-0  volmode               default                default
vms/vm-100-disk-0  snapshot_limit        none                   default
vms/vm-100-disk-0  snapshot_count        none                   default
vms/vm-100-disk-0  snapdev               hidden                 default
vms/vm-100-disk-0  context               none                   default
vms/vm-100-disk-0  fscontext             none                   default
vms/vm-100-disk-0  defcontext            none                   default
vms/vm-100-disk-0  rootcontext           none                   default
vms/vm-100-disk-0  redundant_metadata    all                    default
vms/vm-100-disk-0  encryption            off                    default
vms/vm-100-disk-0  keylocation           none                   default
vms/vm-100-disk-0  keyformat             none                   default
vms/vm-100-disk-0  pbkdf2iters           0                      default
vms/vm-100-disk-0  prefetch              all                    default
vms/vm-100-disk-0  volthreading          on                     default

The zpool is just a single disk, created via the PVE web UI on PVE 9.1.
The
Code:
zpool get all <pool>
results are the same as for you.

Something is off that we are missing.