Reclaiming free space in VMs and Containers with Ceph RBD images

Gert

I used to be able to reclaim the free space of VMs when I was still using ZFS, but with RBD images on Ceph it does not seem to work properly; I manage to reclaim some of the free space, but not all. I have the same issue with both VMs and containers. I have discard enabled and follow these steps:

First I fill the container's free space with zeros using the following command:

dd if=/dev/zero of=/tmp/bigfile bs=1M; rm -f /tmp/bigfile
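
A quick way to confirm that discard is actually passed through on the mapped device, assuming the image is mapped as /dev/rbd0 on the Proxmox host as the df output below suggests, is to check its discard parameters:

lsblk --discard /dev/rbd0
cat /sys/block/rbd0/queue/discard_granularity

Non-zero DISC-GRAN/DISC-MAX and granularity values mean the device accepts discard/TRIM requests.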

Then I do:

fstrim -v /

and the output of that shows:

/: 66.7 GiB (71601565696 bytes) trimmed
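
As a side note, if the installed pve-container version already ships the fstrim subcommand, the same trim can also be triggered from the Proxmox host (container ID 105 assumed here):

pct fstrim 105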

and then "df -h" shows:

Filesystem Size Used Avail Use% Mounted on
/dev/rbd0 196G 99G 89G 53% /

But on the Proxmox CLI, "rbd du --pool ceph-ssd | grep 105" shows:

vm-105-disk-0 200 GiB 139 GiB


Why do I have 139 GiB used when only 99 GiB is used in the container? Am I doing something wrong? It works with images stored on ZFS.
 
What version are you on (pveversion -v)? And 99 + 67 != 139, either. The release of the data on Ceph might just be delayed. Did you check after some time whether the size shrank further?
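One way to keep an eye on how much of the image Ceph still considers allocated is to sum the extents reported by rbd diff; a rough sketch, reusing the pool and image names from your post:

rbd diff ceph-ssd/vm-105-disk-0 | awk '{ sum += $2 } END { printf "%.1f GiB allocated\n", sum/1024/1024/1024 }'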
 
You mean 99 + 67 != 200? After filling the image with zeros, the image size reported by RBD was maxed out at 200 GB; then, while fstrim was running, it came down in real time to 139. It is now days later and it is still at 139.

"pveversion -v"

root@node1:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-11 (running version: 6.0-11/2140ef37)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-7
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-8
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-3
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
You mean 99 + 67 != 200? After filling the image with zeros, the image size reported by RBD was maxed out at 200 GB; then, while fstrim was running, it came down in real time to 139. It is now days later and it is still at 139.
Either way, 67 + 139 = 206, so Ceph seems to have discarded most of the data already. Ceph may not be able to remove all of the objects, as they may still contain data blocks in use by the image.
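The granularity involved can be seen with rbd info; assuming the same pool and image names:

rbd info ceph-ssd/vm-105-disk-0 | grep -E 'order|object'

The order line shows the object size (order 22 = 4 MiB by default); roughly speaking, an object can only be removed entirely when nothing in its range is still in use, so partially used objects keep their space allocated.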
 
So it should help if I defrag the file system first?
Ext4 tries not to fragment data allocation, so a defrag will not have the expected effect. Only an export/import will probably get rid of the remaining 40 GB, as all objects will be deleted and recreated.
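
A minimal sketch of what such a rewrite could look like on the CLI, with the container stopped and the copy still to be swapped in for the original afterwards (a vzdump backup and restore of the container would achieve the same full rewrite):

rbd export ceph-ssd/vm-105-disk-0 - | rbd import - ceph-ssd/vm-105-disk-0-copy

Since the trimmed regions read back as zeros, the import should skip them and only re-allocate the blocks that actually hold data.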
 
