What is wrong with ZFS reported AVAIL and REFRESERV on ZVOLs?

esi_y

I am making this post to track a suspected issue that I think I found in another thread [1] - stay with me here first please.

It appeared that the OP ran out of space in his zvols despite there being plenty to spare; the only suspicious part was that, since everything had been allocated down to the last byte, the AVAIL for the pool itself was showing up as 0.

The issue was miraculously resolved by simply lowering the refreservation value on one zvol so that something was left for the pool; this worked despite there being no snapshots or other usual suspects at play.

I have now performed a test with ZFS and zvols (because nothing strange was happening with regular datasets, a manually set refreservation and the pool left at AVAIL 0). This time: a single block-device vdev with 7 zvols filling it up to the brim (almost - it was not possible to use the exact byte value shown as leftover for the pool when creating zv7):

Code:
zpool create -R /mnt sh1 /dev/disk/by-partlabel/sh1

zfs create sh1/zv1 -V 2T
zfs create sh1/zv2 -V 2T
zfs create sh1/zv3 -V 1T
zfs create sh1/zv4 -V 690G
zfs create sh1/zv5 -V 7G
zfs create sh1/zv6 -V 1000MB
zfs create sh1/zv7 -V 12.5M


Code:
# zfs list sh1 -r -o space

NAME     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD

sh1       344K  5.86T        0B     96K             0B      5.86T
sh1/zv1  2.06T  2.06T        0B     56K          2.06T         0B
sh1/zv2  2.06T  2.06T        0B     56K          2.06T         0B
sh1/zv3  1.03T  1.03T        0B     56K          1.03T         0B
sh1/zv4   712G   712G        0B     56K           712G         0B
sh1/zv5  7.22G  7.22G        0B     56K          7.22G         0B
sh1/zv6  1.01G  1.01G        0B     56K          1.01G         0B
sh1/zv7  15.3M    15M        0B     72K          14.9M         0B

# zfs list sh1 -r -o space -p

NAME             AVAIL           USED  USEDSNAP  USEDDS  USEDREFRESERV      USEDCHILD
sh1             352256  6442450591744         0   98304              0  6442450493440
sh1/zv1  2267812290560  2267811995648         0   57344  2267811938304              0
sh1/zv2  2267812290560  2267811995648         0   57344  2267811938304              0
sh1/zv3  1133907369984  1133907075072         0   57344  1133907017728              0
sh1/zv4   764059672576   764059377664         0   57344   764059320320              0
sh1/zv5     7753523200     7753228288         0   57344     7753170944              0
sh1/zv6     1083793408     1083498496         0   57344     1083441152              0
sh1/zv7       16007168       15728640         0   73728       15654912              0
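Worth noting here: the USED of every zvol is already noticeably above its -V size. As far as I understand, a non-sparse zvol gets an automatic refreservation of volsize plus an allowance for metadata/indirect blocks, which would explain the difference and can be checked with:

Code:
# the automatic refreservation on a thick zvol exceeds the volsize
# (volsize plus estimated metadata), hence USED > the -V value above
zfs get -p volsize,volblocksize,refreservation sh1/zv1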

All of these zvols got mkfs.ext4, were mounted, and then dd if=/dev/random of=/mnt/zvX/output.bin started writing into them until they filled up, going from the smallest back towards the first (*EDIT: I am only now starting dd on raw zv1 for a change).
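For completeness, roughly what the fill-up looked like (a sketch from memory, not the literal commands; the bs/status flags are just for convenience):

Code:
# ext4 on each zvol, mounted under /mnt/zvX, then filled with dd (smallest first)
for zv in zv7 zv6 zv5 zv4 zv3 zv2; do
    mkfs.ext4 /dev/zvol/sh1/$zv
    mkdir -p /mnt/$zv
    mount /dev/zvol/sh1/$zv /mnt/$zv
    dd if=/dev/random of=/mnt/$zv/output.bin bs=1M status=progress
done

# zv1 written to directly as a raw block device instead
dd if=/dev/random of=/dev/zvol/sh1/zv1 bs=1M status=progress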

Now, the strangest thing: the AVAIL on the pool is dancing around, both up and down, during this process.
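(I was simply re-running zfs list; something like the loop below reproduces the sampling, the interval being arbitrary.)

Code:
# sample the pool-level space accounting periodically
while true; do
    zfs list -p -o space sh1 | tail -n 1
    sleep 30
done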

Code:
NAME             AVAIL           USED  USEDSNAP        USEDDS  USEDREFRESERV      USEDCHILD
----- >8 -----

# progressively in time
# zfs list -p -o space sh1

sh1             303104  6442450640896         0         98304              0  6442450542592

sh1             290816  6442450653184         0         98304              0  6442450554880

sh1             241664  6442450702336         0         98304              0  6442450604032

sh1             143360  6442450800640         0         98304              0  6442450702336

sh1             192512  6442450751488         0         98304              0  6442450653184

sh1              94208  6442450849792         0         98304              0  6442450751488

sh1              57344  6442450886656         0         98304              0  6442450788352

sh1              45056  6442450898944         0         98304              0  6442450800640

sh1              20480  6442450923520         0         98304              0  6442450825216

This was at around 300 GB written into the ext4-formatted zvols (which should not really matter at the pool level, especially with the refreservations in place) when ...

Code:
sh1                  0  6442450948096         0         98304              0  6442450849792

If you think that's the end, it is actually not: it bounced back up from zero and continued. At this point I got impatient and ...

Code:
zfs destroy sh1/zv7

zfs create sh1/ds1
zfs create sh1/ds2
zfs create sh1/ds3
zfs create sh1/ds4
zfs create sh1/ds5

zfs set refreservation=15175680 sh1/ds1
zfs set refreservation=147456 sh1/ds2
zfs set refreservation=73728 sh1/ds3
...

You get the idea.
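What actually got applied can be double-checked with something like:

Code:
# verify the requested reservation vs. what is actually being charged
zfs get -p refreservation,referenced,usedbyrefreservation sh1/ds1 sh1/ds2 sh1/ds3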

I would end up with something like:

Code:
NAME             AVAIL           USED  USEDSNAP        USEDDS  USEDREFRESERV      USEDCHILD
sh1                  0  6442450980864         0        106496              0  6442450874368
sh1/ds1       15077376       15175680         0         98304       15077376              0
sh1/ds2          49152         147456         0         98304          49152              0
sh1/ds3              0          98304         0         98304              0              0
sh1/ds4              0          98304         0         98304              0              0
sh1/ds5          12288         110592         0         98304          12288              0
sh1/zv1  2267811880960  2267811938304         0         57344  2267811880960              0
sh1/zv2  2213419253760  2267811938304         0   54392684544  2213419253760              0
sh1/zv3  1015882592256  1133907017728         0  118024425472  1015882592256              0
sh1/zv4   594064797696   764059320320         0  169994522624   594064797696              0
sh1/zv5      392945664     7753170944         0    7360225280      392945664              0
sh1/zv6       76566528     1083441152         0    1006874624       76566528              0

Code:
NAME             AVAIL           USED  USEDSNAP        USEDDS  USEDREFRESERV      USEDCHILD
sh1                  0  6442450993152         0        106496              0  6442450886656
sh1/ds1       15077376       15175680         0         98304       15077376              0
sh1/ds2          49152         147456         0         98304          49152              0
sh1/ds3              0          98304         0         98304              0              0
sh1/ds4              0          98304         0         98304              0              0
sh1/ds5          12288         110592         0         98304          12288              0
sh1/zv1  2267811880960  2267811938304         0         57344  2267811880960              0
sh1/zv2  2213137915904  2267811938304         0   54674022400  2213137915904              0
sh1/zv3  1015881961472  1133907017728         0  118025056256  1015881961472              0
sh1/zv4   593843482624   764059320320         0  170215837696   593843482624              0
sh1/zv5      392945664     7753170944         0    7360225280      392945664              0
sh1/zv6       76566528     1083441152         0    1006874624       76566528              0

Code:
NAME             AVAIL           USED  USEDSNAP        USEDDS  USEDREFRESERV      USEDCHILD
sh1                  0  6442451005440         0        106496              0  6442450898944
sh1/ds1       15077376       15175680         0         98304       15077376              0
sh1/ds2          49152         147456         0         98304          49152              0
sh1/ds3              0          98304         0         98304              0              0
sh1/ds4              0          98304         0         98304              0              0
sh1/ds5          12288         110592         0         98304          12288              0
sh1/zv1  2267811880960  2267811938304         0         57344  2267811880960              0
sh1/zv2  2212535922688  2267811938304         0   55276015616  2212535922688              0
sh1/zv3  1015699546112  1133907017728         0  118207471616  1015699546112              0
sh1/zv4   593631322112   764059320320         0  170427998208   593631322112              0
sh1/zv5      392945664     7753170944         0    7360225280      392945664              0
sh1/zv6       76566528     1083441152         0    1006874624       76566528              0

Code:
NAME             AVAIL           USED  USEDSNAP        USEDDS  USEDREFRESERV      USEDCHILD
sh1                  0  6442451017728         0        106496              0  6442450911232
sh1/ds1       15077376       15175680         0         98304       15077376              0
sh1/ds2          49152         147456         0         98304          49152              0
sh1/ds3              0          98304         0         98304              0              0
sh1/ds4              0          98304         0         98304              0              0
sh1/ds5          12288         110592         0         98304          12288              0
sh1/zv1  2267811880960  2267811938304         0         57344  2267811880960              0
sh1/zv2  2211383775232  2267811938304         0   56428163072  2211383775232              0
sh1/zv3  1014478495744  1133907017728         0  119428521984  1014478495744              0
sh1/zv4   592018845696   764059320320         0  172040474624   592018845696              0
sh1/zv5      392945664     7753170944         0    7360225280      392945664              0
sh1/zv6       76566528     1083441152         0    1006874624       76566528              0

As for now, it's still writing more and more GBs.

Note that the refreservation values for ds[3-5] did not quite come out as requested.
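My assumption (which would explain the ds[3-5] values above) is that usedbyrefreservation only charges the part of the reservation not already covered by the referenced data:

Code:
# assumption: USEDREFRESERV = max(0, refreservation - referenced)
# ds1: 15175680 - 98304 = 15077376  (matches)
# ds2:   147456 - 98304 =    49152  (matches)
# ds3:    73728 - 98304 < 0  ->  0  (matches)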

Does anyone have an idea what is going on with this behaviour? I will update the post if the ext4 writes fail prematurely (as I had expected they would, but they have not so far).

[1] https://forum.proxmox.com/threads/m...fs-smart-passed-any-ideas.151260/#post-684621
 
I suspect there won't be much to say in this thread and I will have to take it up with the OpenZFS folks, but nevertheless, for completeness:

The fill-ups completed (without error - well, with a "no space left on device" one, but only when expected, i.e. df shows 0 space left afterwards).
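That is, a plain df on the mountpoints used above reports 0 available inside each ext4 filesystem:

Code:
df -h /mnt/zv2 /mnt/zv3 /mnt/zv4 /mnt/zv5 /mnt/zv6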

Midway through the process I removed ds[1-5]; later it was impossible to recreate them, as AVAIL for the pool got to 0 and stayed there.

Now the result is:

Code:
NAME     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
sh1         0B  5.86T        0B    104K             0B      5.86T
sh1/zv1  42.0G  2.06T        0B   2.02T          42.0G         0B
sh1/zv2  46.0G  2.06T        0B   2.02T          46.0G         0B
sh1/zv3  36.2G  1.03T        0B   1020G          36.2G         0B
sh1/zv4  26.0G   712G        0B    686G          26.0G         0B
sh1/zv5   375M  7.22G        0B   6.85G           375M         0B
sh1/zv6  73.0M  1.01G        0B    960M          73.0M         0B

Code:
NAME           AVAIL           USED  USEDSNAP         USEDDS  USEDREFRESERV      USEDCHILD
sh1                0  6442452307968         0         106496              0  6442452201472
sh1/zv1  45141106688  2267811938304         0  2222670831616    45141106688              0
sh1/zv2  49388064768  2267811938304         0  2218423873536    49388064768              0
sh1/zv3  38912122880  1133907017728         0  1094994894848    38912122880              0
sh1/zv4  27935129600   764059320320         0   736124190720    27935129600              0
sh1/zv5    392945664     7753170944         0     7360225280      392945664              0
sh1/zv6     76566528     1083441152         0     1006874624       76566528              0

Note that the deleted zv7 also, of course, cannot be recreated with 0 AVAIL left. So something ate up that ~15M of space just in the course of copying data into the zvols.

Also, zv[2-7] were regular ext4 filesystems filled up with a file from dd, while zv1 was filled with dd directly as a block device. The ext4 filesystems show 0 free space left inside, and the dd on zv1 finished with no space left. Yet somehow e.g. 46G sits unused on an ext4 zvol with a total size of 2T, and 42G on the raw block device.
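To cross-check where the space actually sits, comparing the raw pool allocation against the per-dataset accounting is probably the next step (a sketch):

Code:
# raw pool allocation vs. per-dataset accounting
zpool list -p -o name,size,allocated,free sh1
zfs list -r -p -o name,used,usedds,usedrefreserv,usedchild sh1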

Conclusion for now: the numbers reported by ZFS seem to be a complete mess. Something is taking up pool space even though it does not appear anywhere in USED, and USEDDS remained where it peaked after creating all the zvols and datasets.
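If this ends up going to the OpenZFS folks, a block-level breakdown might show what the unaccounted space actually is (an assumption on my side; I have not run this on the pool yet):

Code:
# walk all blocks and print a per-object-type space breakdown (slow)
zdb -bb sh1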

EDIT: Both dedup and compression were off.

Code:
# zfs get all sh1

NAME  PROPERTY               VALUE                  SOURCE

sh1   type                   filesystem             -
sh1   creation               Fri Jul 26 23:43 2024  -
sh1   used                   5.86T                  -
sh1   available              0B                     -
sh1   referenced             104K                   -
sh1   compressratio          1.00x                  -
sh1   mounted                yes                    -
sh1   quota                  none                   default
sh1   reservation            none                   default
sh1   recordsize             128K                   default
sh1   mountpoint             /mnt/sh1               default
sh1   sharenfs               off                    default
sh1   checksum               on                     default
sh1   compression            off                    local
sh1   atime                  on                     default
sh1   devices                on                     default
sh1   exec                   on                     default
sh1   setuid                 on                     default
sh1   readonly               off                    default
sh1   zoned                  off                    default
sh1   snapdir                hidden                 default
sh1   aclmode                discard                default
sh1   aclinherit             restricted             default
sh1   createtxg              1                      -
sh1   canmount               on                     default
sh1   xattr                  on                     default
sh1   copies                 1                      default
sh1   version                5                      -
sh1   utf8only               off                    -
sh1   normalization          none                   -
sh1   casesensitivity        sensitive              -
sh1   vscan                  off                    default
sh1   nbmand                 off                    default
sh1   sharesmb               off                    default
sh1   refquota               none                   default
sh1   refreservation         none                   default
sh1   guid                   13881366558305729593   -
sh1   primarycache           all                    default
sh1   secondarycache         all                    default
sh1   usedbysnapshots        0B                     -
sh1   usedbydataset          104K                   -
sh1   usedbychildren         5.86T                  -
sh1   usedbyrefreservation   0B                     -
sh1   logbias                latency                default
sh1   objsetid               54                     -
sh1   dedup                  off                    default
sh1   mlslabel               none                   default
sh1   sync                   standard               default
sh1   dnodesize              legacy                 default
sh1   refcompressratio       1.00x                  -
sh1   written                104K                   -
sh1   logicalused            5.68T                  -
sh1   logicalreferenced      46K                    -
sh1   volmode                default                default
sh1   filesystem_limit       none                   default
sh1   snapshot_limit         none                   default
sh1   filesystem_count       none                   default
sh1   snapshot_count         none                   default
sh1   snapdev                hidden                 default
sh1   acltype                off                    default
sh1   context                none                   default
sh1   fscontext              none                   default
sh1   defcontext             none                   default
sh1   rootcontext            none                   default
sh1   relatime               off                    default
sh1   redundant_metadata     all                    default
sh1   overlay                on                     default
sh1   encryption             off                    default
sh1   keylocation            none                   default
sh1   keyformat              none                   default
sh1   pbkdf2iters            0                      default
sh1   special_small_blocks   0                      default
sh1   com.sun:auto-snapshot  false                  local
 