[SOLVED] zfs usage extremely high - deleting VMs does not free up space

Bernie2020

New Member
Feb 13, 2020
Hi! I have been using Proxmox for over a month now and am really excited about how good it is. I really appreciate the dedication to open source and the amazing work that has been done here. :)

My machine runs a 500 GB NVMe (970 Evo) with Proxmox installed onto it using ZFS. Both local (pve) and local-zfs (pve) were created with ZFS during the install. From what I understood from reading the forum and the wiki, when I create a VM with a 64 GB VirtIO SCSI disk and the guest (e.g. Windows 10) only uses, say, 20 GB of it, then the usage of local-zfs should only increase by 20 GB. The same would happen if I cloned that VM. This is what I thought thin provisioning was about.
Snapshots or linked clones would only increase local-zfs usage by the amount of data that has changed compared to the template's VM disk or the most recent snapshot.
So when I found out that, after a couple more clonings of Win10 VMs, my local-zfs usage (viewed via pve > storage > local-zfs > Summary) was at around 400 GiB of 410 GiB, I was stumped. Even after deleting several of my VMs with 64G of local-zfs storage (scsi0), the usage did not decrease accordingly. I moved some more VMs to another ZFS pool on USB drives, but the usage is still at 375 GiB and not less.
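As far as I understand it, thin provisioning on a zfspool storage corresponds to the sparse option in /etc/pve/storage.cfg and to refreservation=none on the zvols, so something like the following should show whether a disk is actually thin (I am just taking vm-201-disk-0 as an example here):
Code:
# check whether local-zfs is configured as thin provisioned ("sparse 1")
grep -A 4 'zfspool: local-zfs' /etc/pve/storage.cfg

# a thin zvol should show refreservation=none; a thick one reserves its full volsize
zfs get volsize,refreservation,used,usedbysnapshots rpool/data/vm-201-disk-0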

I really do not know where the problem might be. I have discard enabled in all my VMs, tried to manually trim from within three of my Win10 guests according to this forum post, and tried to manually trim on the host with the commands qm agent VMID fstrim and zpool trim rpool, all without decreasing my usage at all.
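For reference, this is roughly what I ran (using VMID 201 only as an example); from what I have read, the discard flag on the virtual disk, the guest agent and the pool trim all have to be in place for freed blocks to actually be returned to the pool:
Code:
# verify that discard is set on the virtual disk and that the guest agent is enabled
qm config 201 | grep -E 'scsi0|agent'

# ask the guest to trim its filesystems via the QEMU guest agent
qm agent 201 fstrim

# trim the free space on the pool itself and check the progress
zpool trim rpool
zpool status -t rpool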

Running zfs list results in:

Code:
root@pve:~# zfs list -ospace
NAME                                            AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
mirzpool                                        5.99T  1.05T        0B     96K             0B      1.05T
mirzpool/base-203-disk-0                        6.05T  78.0G        8K   12.0G          66.0G         0B
mirzpool/mirzdata                               5.99T   897G        0B    897G             0B         0B
mirzpool/subvol-901-disk-0                      31.1G   917M        0B    917M             0B         0B
mirzpool/vm-210-disk-0                          6.04T  66.0G        0B   15.0G          51.0G         0B
mirzpool/vm-210-disk-1                          5.99T     3M        0B     68K          2.93M         0B
mirzpool/vm-221-disk-0                          5.99T  16.5G        0B   10.6G          5.94G         0B
mirzpool/vm-222-disk-0                          6.00T  16.5G        0B   6.84G          9.67G         0B
mirzpool/vm-224-disk-0                          5.99T  2.06G        0B     56K          2.06G         0B
rpool                                           35.0G   411G        0B    104K             0B       411G
rpool/ROOT                                      35.0G  34.7G        0B     96K             0B      34.7G
rpool/ROOT/pve-1                                35.0G  34.7G        0B   34.7G             0B         0B
rpool/data                                      35.0G   376G        0B    144K             0B       376G
rpool/data/base-203-disk-0                      35.0G  14.1G        8K   14.1G             0B         0B
rpool/data/base-203-disk-1                      35.0G   200K        8K    192K             0B         0B
rpool/data/base-204-disk-0                      35.0G   756M        8K    756M             0B         0B
rpool/data/base-204-disk-1                      35.0G     8K        0B      8K             0B         0B
rpool/data/base-205-disk-0                      35.0G     0B        0B      0B             0B         0B
rpool/data/base-205-disk-1                      35.0G  5.24G        8K   5.24G             0B         0B
rpool/data/base-207-disk-0                      35.0G  7.24G        8K   7.24G             0B         0B
rpool/data/base-207-disk-1                      35.0G     0B        0B      0B             0B         0B
rpool/data/base-223-disk-0                      35.0G    56K        0B     56K             0B         0B
rpool/data/subvol-100-disk-0                    7.06G   963M        0B    963M             0B         0B
rpool/data/subvol-101-disk-0                    9.62G  25.3G     2.89G   22.4G             0B         0B
rpool/data/subvol-103-disk-0                    7.08G  1.01G     94.1M    941M             0B         0B
rpool/data/subvol-105-disk-0                    7.17G   852M        0B    852M             0B         0B
rpool/data/subvol-105-disk-1                    7.16G   861M        0B    861M             0B         0B
rpool/data/subvol-105-disk-2                    7.17G   852M        0B    852M             0B         0B
rpool/data/subvol-105-disk-3                    6.86G  1.14G        0B   1.14G             0B         0B
rpool/data/subvol-106-disk-0                    30.3G  1.70G        0B   1.70G             0B         0B
rpool/data/subvol-107-disk-0                    29.8G  2.24G        0B   2.24G             0B         0B
rpool/data/vm-102-disk-0                        35.0G  1.43G      320M   1.12G             0B         0B
rpool/data/vm-201-disk-0                        35.0G  82.7G     42.0G   40.7G             0B         0B
rpool/data/vm-201-disk-1                        35.0G  1.07M        8K   1.06M             0B         0B
rpool/data/vm-202-disk-0                        35.0G    56K        0B     56K             0B         0B
rpool/data/vm-206-disk-0                        35.0G  50.3G     35.1G   15.2G             0B         0B
rpool/data/vm-206-disk-1                        35.0G     0B        0B      0B             0B         0B
rpool/data/vm-208-disk-0                        35.0G   200K        8K    192K             0B         0B
rpool/data/vm-208-disk-1                        35.0G  21.7G     2.28G   19.4G             0B         0B
rpool/data/vm-209-disk-0                        35.0G  1.92G      287M   1.64G             0B         0B
rpool/data/vm-209-disk-1                        35.0G     0B        0B      0B             0B         0B
rpool/data/vm-214-disk-0                        35.0G  1017M        0B   1017M             0B         0B
rpool/data/vm-216-disk-0                        35.0G  7.47G     1.25G   6.22G             0B         0B
rpool/data/vm-216-state-beforeDesktopSelection  35.0G   320M        0B    320M             0B         0B
rpool/data/vm-218-disk-0                        35.0G    56K        0B     56K             0B         0B
rpool/data/vm-219-disk-0                        35.0G  5.23G        0B   5.23G             0B         0B
rpool/data/vm-220-disk-0                        35.0G  6.14G        0B   6.14G             0B         0B
rpool/data/vm-225-disk-0                        35.0G  5.60G     1.26G   4.34G             0B         0B
rpool/data/vm-225-disk-1                        35.0G     0B        0B      0B             0B         0B
rpool/data/vm-225-state-suspend-2020-02-09      35.0G    56K        0B     56K             0B         0B
rpool/data/vm-226-disk-0                        35.0G  1.26G        0B   1.26G             0B         0B
rpool/data/vm-226-disk-1                        35.0G     0B        0B      0B             0B         0B
rpool/data/vm-227-disk-0                        35.0G  1.57G        0B   1.57G             0B         0B
rpool/data/vm-227-disk-1                        35.0G  1.06M        0B   1.06M             0B         0B
rpool/data/vm-228-disk-0                        35.0G   192K        0B    192K             0B         0B
rpool/data/vm-228-disk-1                        35.0G  37.0G        0B   37.0G             0B         0B
rpool/data/vm-229-disk-0                        35.0G  37.0G        0B   37.0G             0B         0B
rpool/data/vm-229-disk-1                        35.0G   192K        0B    192K             0B         0B
rpool/data/vm-230-disk-0                        35.0G     0B        0B      0B             0B         0B
rpool/data/vm-230-disk-1                        35.0G  30.2G     1.58G   28.6G             0B         0B
rpool/data/vm-251-disk-0                        35.0G   192K        0B    192K             0B         0B
rpool/data/vm-251-disk-1                        35.0G  22.6G        0B   22.6G             0B         0B
 

In another old thread on this forum I read something about snapshots in ZFS taking up space like they do with LVM, but I am not sure what to make of it. Some of the output of zfs list -t snapshot:
Code:
NAME                                                              USED  AVAIL     REFER  MOUNTPOINT
mirzpool/base-203-disk-0@__base__                                   8K      -     12.0G  -
rpool/data/base-203-disk-0@__base__                                 8K      -     14.1G  -
rpool/data/base-203-disk-1@__base__                                 8K      -      192K  -
rpool/data/base-204-disk-0@__base__                                 8K      -     14.2G  -
rpool/data/base-204-disk-1@__base__                                 0B      -      192K  -
rpool/data/base-205-disk-0@__base__                                 0B      -      192K  -
rpool/data/base-205-disk-1@__base__                                 8K      -     17.7G  -
rpool/data/base-207-disk-0@__base__                                 8K      -     15.5G  -
rpool/data/base-207-disk-1@__base__                                 0B      -      192K  -
rpool/data/base-223-disk-0@__base__                                 0B      -       56K  -

rpool/data/vm-230-disk-0@beforeTry1                                 0B      -      192K  -
rpool/data/vm-230-disk-0@updatesAndCIV6installed                    0B      -      192K  -
rpool/data/vm-230-disk-0@beforeGPUpassthroughandTestrun             0B      -      192K  -
rpool/data/vm-230-disk-0@usbPassedThrough_beforeGPU                 0B      -      192K  -
rpool/data/vm-230-disk-0@lastSnapBeforeGPUtry1                      0B      -      192K  -
rpool/data/vm-230-disk-0@afterGPUandNvidiaGamestreamSteam           0B      -      192K  -
rpool/data/vm-230-disk-1@beforeTry1                                 8K      -     15.5G  -
rpool/data/vm-230-disk-1@updatesAndCIV6installed                  960M      -     36.0G  -
rpool/data/vm-230-disk-1@beforeGPUpassthroughandTestrun             0B      -     35.2G  -
rpool/data/vm-230-disk-1@usbPassedThrough_beforeGPU                 0B      -     35.2G  -
rpool/data/vm-230-disk-1@lastSnapBeforeGPUtry1                      0B      -     35.2G  -
rpool/data/vm-230-disk-1@afterGPUandNvidiaGamestreamSteam           8K      -     41.0G  -

What does the REFER column mean exactly?

How do I fix my problem of excessive local-zfs usage?

(I would prefer to keep using the snapshot feature of PVE for my local-zfs VMs, and I plan to use the backup feature to back them up to my USB zpool mirzpool once the usage problem is resolved. The first priority, though, is to get the local-zfs usage back down so that it is usable for new VMs again and I don't have to fear breaking my setup in the near future.)

Thank you for any help with this issue!
 
All the ZVOLs seem to still be there. You might have deleted the VMs, but it seems their disks were not deleted.
See everything below the line
Code:
rpool                                           35.0G   411G        0B    104K             0B       411G
That is what is actually taking up the space. I have not done the maths, but I think it all adds up (even small items amount to a lot if there are enough of them).
Overprovisioning is just a dangerous thing. Once the guests start writing to their disks, you might end up using it all...
 

Thank you for your quick reply!

I used the "More > Remove" option of the PVE GUI to remove the VMs, e.g. VMID 111, 212, 213, ... as in attachment 1, and local-zfs > Content shows no old VM disks (attachment 2).
I did use the "Purge" option, but I am not sure what would (or would not) happen if I didn't.

EDIT:
I added up the USED values of zfs list -ospace and indeed, they add up to 375G.
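As far as I can tell, ZFS also does this sum itself: the USED of a parent dataset includes all of its children (that is what the USEDCHILD column shows), so the rpool/data line should already be the total. Something like this prints it directly:
Code:
# USED of rpool/data already includes all child datasets and their snapshots
zfs get -H -o value used rpool/data

# the same value in exact bytes, for doing your own arithmetic
zfs get -Hp -o value used rpool/data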

Why is the Win10 disk vm-201-disk-0, with a 64G SCSI hard disk (local-zfs), 82.8G in size?
 

Attachments

  • destroy_purge.jpg
  • local-zfs_content.png

I did clone VM #206 to create #201 without creating a template in between. Could there be a lot of baggage from all the #206 setup snapshots carrying over to the new #201 VM? I cannot access the snapshots of #206 from #201, only the ones I created during the further setup of #201 itself.

#201 currently has 18.2 GB free of 63.3 GB, and #206 is similar. What is causing the 82.8G of usage for #201?
Is there a way to bring that 82.8G down to the ~35G the guest actually uses (or the 50.3G that #206 uses down to ~35G)?

I also do not understand why the AVAIL column for all disks (even UEFI disks) exclusively shows 35.0G. I assumed each VM had the maximum of 64GB I assigned during the creation of the original VM, and that AVAIL showed how much of that 64G is still available, i.e. 18.2 of 64 in the case of #201.

Sorry if I am being a misinformed idiot here; I did my best to find out what obvious thing I am missing, but I can't seem to figure it out.
 
in your output, you have three columns:
  • USED: total used of this dataset, including snapshots and children
  • USEDSNAP: space accounted to all snapshots (i.e., data ONLY referenced by snapshots, not by the current state)
  • USEDDS: space accounted to the dataset itself (i.e., data written since the last snapshot)
the accounting can be a bit wonky for zvols with regards to data only in the dataset and the most recent snapshot, but that's the gist. so for your disk of VM 201, you have about the same amount of data in snapshots and in the dataset itself. the usage will go down if you delete (enough) snapshots. note that ZFS can only give you a limited view on how much data each snapshot contains, since data is usually referenced by multiple snapshots, and only freed once every reference is gone.
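for example, to see how much space the snapshots of one disk hold and what deleting them would free up, something like this can be used (taking vm-230-disk-1 and its snapshot names from your output above; note that snapshots of PVE-managed VMs are best removed via the GUI or 'qm delsnapshot' so the VM configuration stays in sync):
Code:
# per-snapshot USED only counts data referenced by that one snapshot alone
zfs list -t snapshot -o name,used,refer rpool/data/vm-230-disk-1

# space that would be freed if ALL snapshots of this dataset were destroyed
zfs get usedbysnapshots rpool/data/vm-230-disk-1

# dry-run (-n) of destroying a range of snapshots, with the space it would reclaim (-v)
zfs destroy -nv rpool/data/vm-230-disk-1@beforeTry1%afterGPUandNvidiaGamestreamSteam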
 
(sorry for the delayed reply, I was away travelling)


That cleared up just about everything, thank you very much for the concise help! :D

I have resorted to cloning the important snapshots/states and removing the rest; the Proxmox setup works splendidly again.

Though I do not quite understand how a few of the machines diverged so much from their previous states just by installing a few drivers and small programs, I assume it had to do with Windows shuffling around a lot of data during updates and light everyday usage, as tburger kindly alluded to with:
Once the guests are writing to the disk, you might end up using it all...

Thank you once again for your outstanding support, I really appreciate it!

I hope all of you have a good weekend! :)
 
