Problem with trim/discard on Ceph storage

GabrieleV

Hello,
I have a problem with various VMs that do not seem to release unused space.

Here is the VM Config:

agent: 1,fstrim_cloned_disks=1
boot: order=ide2;scsi0
cores: 2
ide2: none,media=cdrom
memory: 4096
meta: creation-qemu=7.2.0,ctime=1711617679
name: CRODC05
net0: virtio=22:9F:25:F3:22:96,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: CRO-CEPH-SSD:vm-105-disk-0,aio=threads,cache=writeback,discard=on,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=fe50e879-9b53-4bb0-aa61-78c583d98000
sockets: 1
vmgenid: 1f92ae2c-0125-4cf8-a3c4-5f72e423a433
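For reference, the discard-related options above (the guest agent with fstrim_cloned_disks, plus the discard/ssd flags on the disk) can be set with something like this - illustrative only, since the config above already contains them:
Code:
# Illustrative only - the VM config above already has these options set
qm set 105 --agent 1,fstrim_cloned_disks=1
qm set 105 --scsi0 CRO-CEPH-SSD:vm-105-disk-0,discard=on,ssd=1,iothread=1,aio=threads,cache=writeback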
Here is the fstab:
# cat /etc/fstab | grep ext4
/dev/mapper/sys-root / ext4 discard,errors=remount-ro 0 1
UUID=5a77d9ea-9fc8-4ee2-9391-3429084d0f3d /boot ext4 discard,defaults 0 2
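(For completeness: besides the discard mount option, periodic trimming can also come from systemd's fstrim.timer inside the guest, which can be checked with the command below.)
Code:
# Check whether the periodic trim timer is enabled in the guest as well
systemctl list-timers fstrim.timer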

Here is the LVM config:
# pvs
  PV         VG  Fmt  Attr PSize   PFree
  /dev/sda2  sys lvm2 a--  <31.07g <16.17g

# vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  sys   1   1   0 wz--n- <31.07g <16.17g

# lvs
  LV   VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root sys -wi-ao---- <14.90g

# cat /etc/lvm/lvm.conf | grep issue_discards
# Configuration option devices/issue_discards.
issue_discards = 1
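To verify that discards can propagate through LVM down to the virtual disk, lsblk can report the discard granularity inside the guest - non-zero DISC-GRAN/DISC-MAX on sda and sys-root means the whole chain accepts discards:
Code:
# Non-zero DISC-GRAN / DISC-MAX means the device accepts discard requests
lsblk --discard /dev/sda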


Here is the Filesystem free disk space:
# df -h -t ext4
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/sys-root   15G  3.2G   11G  23% /
/dev/sda1             920M  108M  749M  13% /boot

I've also run fstrim manually after powering the VM off and on again:

# fstrim -av
/boot: 812.6 MiB (852033536 bytes) trimmed on /dev/sda1
/: 11.4 GiB (12247777280 bytes) trimmed on /dev/mapper/sys-root

# fstrim -av
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 0 B (0 bytes) trimmed on /dev/mapper/sys-root

But when I check rbd du on Ceph, it says that the VM disk still uses the full 32 GB:
# rbd du -p rbd_ssd vm-105-disk-0
NAME           PROVISIONED  USED
vm-105-disk-0       32 GiB  32 GiB
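As a cross-check, the actually allocated extents of the image can be summed with rbd diff (same pool/image as above); the result should roughly match what the guest really uses:
Code:
# Sum the allocated extents of the image (run on a node with Ceph admin access)
rbd diff rbd_ssd/vm-105-disk-0 | awk '{ sum += $2 } END { printf "%.1f MiB\n", sum/1024/1024 }'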

What am I missing?
 
What am I missing?

I really have no idea, sorry. So this is just another random example, showing that "thin" behavior is the default for my installation: on my replication_rule pool I can see:

Code:
~# rbd du  -p ceph1   vm-2223-disk-0
NAME                                PROVISIONED  USED   
vm-2223-disk-0@auto-d-241104080715       32 GiB   23 GiB
vm-2223-disk-0@auto-d-241105080717       32 GiB  3.3 GiB
vm-2223-disk-0@auto-h-241105161000       32 GiB  1.8 GiB
vm-2223-disk-0@auto-h-241105170946       32 GiB  516 MiB
vm-2223-disk-0@auto-h-241105180953       32 GiB  544 MiB
vm-2223-disk-0                           32 GiB  392 MiB
<TOTAL>                                  32 GiB   30 GiB
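One thing worth ruling out on your image: snapshots keep trimmed blocks referenced, so an image with leftover snapshots will not shrink no matter how often you trim. Your rbd du output above does not show any snapshot lines, but you can double-check with:
Code:
# Any snapshot pins the data it references, even after fstrim in the guest
rbd snap ls rbd_ssd/vm-105-disk-0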
 
The other strange thing is that thin provisioning seems to be working on my HDD pool:
Code:
# rbd du -p rbd
NAME                      PROVISIONED  USED   
base-501-disk-1@__base__       80 GiB  1.6 GiB
base-501-disk-1                80 GiB      0 B
vm-104-disk-1                  80 GiB   77 GiB
vm-106-disk-0                  32 GiB   15 GiB
vm-106-disk-1                  80 GiB   80 GiB
vm-110-disk-0                  32 GiB   30 GiB
vm-111-disk-0                  32 GiB   32 GiB
vm-111-disk-1                  80 GiB  5.6 GiB
vm-201-disk-1                   8 GiB  7.8 GiB
vm-205-disk-1                  80 GiB   35 GiB
vm-206-disk-2                 950 GiB  948 GiB
vm-301-disk-0                 512 GiB  508 GiB
vm-301-disk-1                  80 GiB   66 GiB
vm-301-disk-2                 512 GiB  508 GiB
vm-301-disk-3                 512 GiB  508 GiB
vm-301-disk-4                 512 GiB  508 GiB
vm-302-disk-0                  32 GiB  4.2 GiB
vm-302-disk-1                   1 TiB  178 GiB
vm-303-disk-1                  80 GiB   27 GiB
vm-304-disk-1                 320 GiB  320 GiB
vm-607-disk-0                  32 GiB   22 GiB
<TOTAL>                       5.0 TiB  3.8 TiB

But not on the SSD pool:
Code:
# rbd du -p rbd_ssd
NAME           PROVISIONED  USED   
vm-102-disk-0       32 GiB   32 GiB
vm-105-disk-0       32 GiB   32 GiB
vm-107-disk-0       64 GiB   64 GiB
vm-108-disk-0       74 GiB   74 GiB
vm-108-disk-1       10 GiB  9.8 GiB
vm-206-disk-0       80 GiB   80 GiB
vm-304-disk-0       32 GiB   32 GiB
vm-601-disk-0       65 GiB   65 GiB
vm-602-disk-0       65 GiB   65 GiB
vm-603-disk-0       65 GiB   65 GiB
vm-604-disk-0       65 GiB   65 GiB
vm-605-disk-0       65 GiB   65 GiB
vm-606-disk-0       65 GiB   65 GiB
<TOTAL>            714 GiB  713 GiB
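To see whether that space is really consumed at the cluster level, and not just reported oddly by rbd du, the per-pool statistics can be compared with:
Code:
# Compare per-pool STORED/USED with what rbd du reports
ceph df detail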
 
What am I missing?
Maybe fragmentation? I've observed a similar problem. AFAIK, Ceph uses an internal allocation unit of 4 MB, so if at least one 4 KiB block inside it is written, the whole 4 MB block is allocated. I tested this on Windows and Linux: after defragmenting the filesystem - not only defragmenting the files, but also moving them to the front by shrinking the filesystem - I could reclaim a lot more space. Not as much as I could reclaim on e.g. ZFS (I tested by live migration), but more than without it.
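If you want to check the allocation unit on your image, rbd info shows the object size (order 22, i.e. 4 MiB objects, is the default), and the worst case is easy to estimate from that:
Code:
# Object size of the image from the first post (order 22 means 4 MiB objects)
rbd info rbd_ssd/vm-105-disk-0 | grep -E 'size|order'
# Worst case: 32 GiB / 4 MiB = 8192 objects, so as little as
# 8192 x 4 KiB = 32 MiB of scattered live data can keep all 32 GiB allocated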

Yet, I don't know how this could happen only in your SSD pool and not in your HDD pool. Have you tried migrating the VM between the pools (or even other storage types)?
 
Yet, I don't know how this could happen only in your SSD pool and not in your HDD pool. Have you tried migrating the VM between the pools (or even other storage types)?
The problem seems to start when migrating the image to and from the SSD pool.
This is the image on the HDD pool:

Code:
# rbd du -p rbd vm-302-disk-0
NAME           PROVISIONED  USED   
vm-302-disk-0       32 GiB  4.2 GiB

It's OK. Now I move it to the SSD pool; note the final lines after it reaches 32 GB, the full size of the image:

Code:
create full clone of drive scsi0 (CRO-CEPH:vm-302-disk-0)
/dev/rbd5
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 4.0 MiB of 32.0 GiB (0.01%) in 0s
drive-scsi0: transferred 259.0 MiB of 32.0 GiB (0.79%) in 1s
drive-scsi0: transferred 521.0 MiB of 32.0 GiB (1.59%) in 2s
drive-scsi0: transferred 1.1 GiB of 32.0 GiB (3.33%) in 3s
drive-scsi0: transferred 1.1 GiB of 32.0 GiB (3.53%) in 4s
drive-scsi0: transferred 1.2 GiB of 32.0 GiB (3.70%) in 5s
drive-scsi0: transferred 1.2 GiB of 32.0 GiB (3.87%) in 6s
drive-scsi0: transferred 1.3 GiB of 32.0 GiB (4.18%) in 7s
drive-scsi0: transferred 1.4 GiB of 32.0 GiB (4.47%) in 8s
drive-scsi0: transferred 1.5 GiB of 32.0 GiB (4.66%) in 9s
drive-scsi0: transferred 1.6 GiB of 32.0 GiB (4.89%) in 10s
drive-scsi0: transferred 1.7 GiB of 32.0 GiB (5.19%) in 11s
drive-scsi0: transferred 1.7 GiB of 32.0 GiB (5.39%) in 12s
drive-scsi0: transferred 1.8 GiB of 32.0 GiB (5.55%) in 13s
drive-scsi0: transferred 1.9 GiB of 32.0 GiB (5.85%) in 14s
drive-scsi0: transferred 2.0 GiB of 32.0 GiB (6.17%) in 15s
drive-scsi0: transferred 2.1 GiB of 32.0 GiB (6.59%) in 16s
drive-scsi0: transferred 2.2 GiB of 32.0 GiB (6.83%) in 17s
drive-scsi0: transferred 2.3 GiB of 32.0 GiB (7.05%) in 18s
drive-scsi0: transferred 2.3 GiB of 32.0 GiB (7.24%) in 19s
drive-scsi0: transferred 2.4 GiB of 32.0 GiB (7.53%) in 20s
drive-scsi0: transferred 2.5 GiB of 32.0 GiB (7.85%) in 21s
drive-scsi0: transferred 2.6 GiB of 32.0 GiB (8.08%) in 22s
drive-scsi0: transferred 2.8 GiB of 32.0 GiB (8.89%) in 23s
drive-scsi0: transferred 2.9 GiB of 32.0 GiB (9.19%) in 24s
drive-scsi0: transferred 3.1 GiB of 32.0 GiB (9.61%) in 25s
drive-scsi0: transferred 3.2 GiB of 32.0 GiB (9.87%) in 26s
drive-scsi0: transferred 3.3 GiB of 32.0 GiB (10.41%) in 27s
drive-scsi0: transferred 3.6 GiB of 32.0 GiB (11.29%) in 28s
drive-scsi0: transferred 3.8 GiB of 32.0 GiB (11.85%) in 29s
drive-scsi0: transferred 5.0 GiB of 32.0 GiB (15.61%) in 30s
drive-scsi0: transferred 6.5 GiB of 32.0 GiB (20.33%) in 31s
drive-scsi0: transferred 7.1 GiB of 32.0 GiB (22.30%) in 32s
drive-scsi0: transferred 7.2 GiB of 32.0 GiB (22.37%) in 33s
drive-scsi0: transferred 7.2 GiB of 32.0 GiB (22.44%) in 34s
drive-scsi0: transferred 8.0 GiB of 32.0 GiB (24.86%) in 35s
drive-scsi0: transferred 9.8 GiB of 32.0 GiB (30.67%) in 36s
drive-scsi0: transferred 11.0 GiB of 32.0 GiB (34.39%) in 37s
drive-scsi0: transferred 11.1 GiB of 32.0 GiB (34.72%) in 38s
drive-scsi0: transferred 11.2 GiB of 32.0 GiB (34.89%) in 39s
drive-scsi0: transferred 11.2 GiB of 32.0 GiB (35.11%) in 40s
drive-scsi0: transferred 11.3 GiB of 32.0 GiB (35.26%) in 41s
drive-scsi0: transferred 11.4 GiB of 32.0 GiB (35.50%) in 42s
drive-scsi0: transferred 11.4 GiB of 32.0 GiB (35.60%) in 43s
drive-scsi0: transferred 11.4 GiB of 32.0 GiB (35.69%) in 44s
drive-scsi0: transferred 11.5 GiB of 32.0 GiB (35.84%) in 45s
drive-scsi0: transferred 11.5 GiB of 32.0 GiB (36.04%) in 46s
drive-scsi0: transferred 12.9 GiB of 32.0 GiB (40.44%) in 47s
drive-scsi0: transferred 13.3 GiB of 32.0 GiB (41.50%) in 48s
drive-scsi0: transferred 15.4 GiB of 32.0 GiB (48.12%) in 49s
drive-scsi0: transferred 16.5 GiB of 32.0 GiB (51.56%) in 50s
drive-scsi0: transferred 18.5 GiB of 32.0 GiB (57.87%) in 51s
drive-scsi0: transferred 20.5 GiB of 32.0 GiB (63.99%) in 52s
drive-scsi0: transferred 20.7 GiB of 32.0 GiB (64.83%) in 53s
drive-scsi0: transferred 20.9 GiB of 32.0 GiB (65.22%) in 54s
drive-scsi0: transferred 21.0 GiB of 32.0 GiB (65.72%) in 55s
drive-scsi0: transferred 21.2 GiB of 32.0 GiB (66.34%) in 56s
drive-scsi0: transferred 21.5 GiB of 32.0 GiB (67.17%) in 57s
drive-scsi0: transferred 21.8 GiB of 32.0 GiB (68.07%) in 58s
drive-scsi0: transferred 22.1 GiB of 32.0 GiB (69.02%) in 59s
drive-scsi0: transferred 22.4 GiB of 32.0 GiB (70.08%) in 1m
drive-scsi0: transferred 22.8 GiB of 32.0 GiB (71.15%) in 1m 1s
drive-scsi0: transferred 23.1 GiB of 32.0 GiB (72.21%) in 1m 2s
drive-scsi0: transferred 23.5 GiB of 32.0 GiB (73.36%) in 1m 3s
drive-scsi0: transferred 23.8 GiB of 32.0 GiB (74.51%) in 1m 4s
drive-scsi0: transferred 24.2 GiB of 32.0 GiB (75.70%) in 1m 5s
drive-scsi0: transferred 24.6 GiB of 32.0 GiB (77.01%) in 1m 6s
drive-scsi0: transferred 25.1 GiB of 32.0 GiB (78.31%) in 1m 7s
drive-scsi0: transferred 25.5 GiB of 32.0 GiB (79.58%) in 1m 8s
drive-scsi0: transferred 25.9 GiB of 32.0 GiB (80.89%) in 1m 9s
drive-scsi0: transferred 26.3 GiB of 32.0 GiB (82.21%) in 1m 10s
drive-scsi0: transferred 26.7 GiB of 32.0 GiB (83.52%) in 1m 11s
drive-scsi0: transferred 27.2 GiB of 32.0 GiB (84.88%) in 1m 12s
drive-scsi0: transferred 27.6 GiB of 32.0 GiB (86.21%) in 1m 13s
drive-scsi0: transferred 28.0 GiB of 32.0 GiB (87.55%) in 1m 14s
drive-scsi0: transferred 28.4 GiB of 32.0 GiB (88.85%) in 1m 15s
drive-scsi0: transferred 28.9 GiB of 32.0 GiB (90.22%) in 1m 16s
drive-scsi0: transferred 29.3 GiB of 32.0 GiB (91.59%) in 1m 17s
drive-scsi0: transferred 29.7 GiB of 32.0 GiB (92.90%) in 1m 18s
drive-scsi0: transferred 30.1 GiB of 32.0 GiB (94.14%) in 1m 19s
drive-scsi0: transferred 30.5 GiB of 32.0 GiB (95.40%) in 1m 20s
drive-scsi0: transferred 30.9 GiB of 32.0 GiB (96.62%) in 1m 21s
drive-scsi0: transferred 31.3 GiB of 32.0 GiB (97.91%) in 1m 22s
drive-scsi0: transferred 31.7 GiB of 32.0 GiB (99.15%) in 1m 23s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 24s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 25s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 26s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 27s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 28s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 29s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 30s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 31s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 32s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 33s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 34s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 35s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 36s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 37s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 38s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 39s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 40s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 41s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 42s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 43s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 44s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 45s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 46s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 47s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 48s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 49s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 50s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 51s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 52s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 53s
drive-scsi0: transferred 32.0 GiB of 32.0 GiB (100.00%) in 1m 54s, ready
all 'mirror' jobs are ready
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
Removing image: 1% complete...
Removing image: 2% complete...
Removing image: 3% complete...
Removing image: 4% complete...
Removing image: 5% complete...
Removing image: 6% complete...
Removing image: 7% complete...
Removing image: 8% complete...
Removing image: 9% complete...
Removing image: 10% complete...
Removing image: 11% complete...
Removing image: 12% complete...
Removing image: 13% complete...
Removing image: 14% complete...
Removing image: 15% complete...
Removing image: 16% complete...
Removing image: 17% complete...
Removing image: 18% complete...
Removing image: 19% complete...
Removing image: 20% complete...
Removing image: 21% complete...
Removing image: 22% complete...
Removing image: 23% complete...
Removing image: 24% complete...
Removing image: 25% complete...
Removing image: 26% complete...
Removing image: 27% complete...
Removing image: 28% complete...
Removing image: 29% complete...
Removing image: 30% complete...
Removing image: 31% complete...
Removing image: 32% complete...
Removing image: 33% complete...
Removing image: 34% complete...
Removing image: 35% complete...
Removing image: 36% complete...
Removing image: 37% complete...
Removing image: 38% complete...
Removing image: 39% complete...
Removing image: 40% complete...
Removing image: 41% complete...
Removing image: 42% complete...
Removing image: 43% complete...
Removing image: 44% complete...
Removing image: 45% complete...
Removing image: 46% complete...
Removing image: 47% complete...
Removing image: 48% complete...
Removing image: 49% complete...
Removing image: 50% complete...
Removing image: 51% complete...
Removing image: 52% complete...
Removing image: 53% complete...
Removing image: 54% complete...
Removing image: 55% complete...
Removing image: 56% complete...
Removing image: 57% complete...
Removing image: 58% complete...
Removing image: 59% complete...
Removing image: 60% complete...
Removing image: 61% complete...
Removing image: 62% complete...
Removing image: 63% complete...
Removing image: 64% complete...
Removing image: 65% complete...
Removing image: 66% complete...
Removing image: 67% complete...
Removing image: 68% complete...
Removing image: 69% complete...
Removing image: 70% complete...
Removing image: 71% complete...
Removing image: 72% complete...
Removing image: 73% complete...
Removing image: 74% complete...
Removing image: 75% complete...
Removing image: 76% complete...
Removing image: 77% complete...
Removing image: 78% complete...
Removing image: 79% complete...
Removing image: 80% complete...
Removing image: 81% complete...
Removing image: 82% complete...
Removing image: 83% complete...
Removing image: 84% complete...
Removing image: 85% complete...
Removing image: 86% complete...
Removing image: 87% complete...
Removing image: 88% complete...
Removing image: 89% complete...
Removing image: 90% complete...
Removing image: 91% complete...
Removing image: 92% complete...
Removing image: 93% complete...
Removing image: 94% complete...
Removing image: 95% complete...
Removing image: 96% complete...
Removing image: 97% complete...
Removing image: 98% complete...
Removing image: 99% complete...
Removing image: 100% complete...done.
TASK OK

Now this is the image after moving it to the SSD pool: boom! All the space is used.
Code:
# rbd du -p rbd_ssd vm-302-disk-0
NAME           PROVISIONED  USED  
vm-302-disk-0       32 GiB  32 GiB

Now let's move it back to the HDD pool: same behaviour, and kaboom! All the space is still used.

Code:
# rbd du -p rbd vm-302-disk-0
NAME           PROVISIONED  USED 
vm-302-disk-0       32 GiB  32 GiB

Trimming no longer frees anything...
 
That is really weird and not what I see on my pools. Have you tried trimming the disk afterwards manually? Does this free up the space again?
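For example, from the Proxmox host through the guest agent, or directly inside the guest:
Code:
# From the PVE host, via the QEMU guest agent (VM 302 as in the example above)
qm guest cmd 302 fstrim
# Or directly inside the guest
fstrim -av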
 
What about writing a zero file? Something like this:

Code:
dd if=/dev/zero of=zero bs=64k status=progress; sync; sync; sync; rm -f zero; sync; sync; sync; fstrim -v .
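Another option at the Ceph level, assuming a reasonably recent release (Nautilus or newer), is rbd sparsify, which de-allocates extents that are entirely zero - it could be combined with the zero-fill above, or run on its own after a full-copy disk move:
Code:
# De-allocate fully zeroed extents (image name taken from the example above)
rbd sparsify rbd_ssd/vm-302-disk-0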
 
