Used space increases when moving a VM between Ceph pools

Hi,
I have a three nodes Proxmox cluster with Ceph storage using NVMe disks.

I have observed that when I restore a virtual machine from a previous backup, its virtual disk's used space is different from the provisioned one:
# rbd du -p cephVM vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 14 GiB

But if I move this VM's disk from its current Ceph pool to another Ceph pool (with the virtual machine running), the used space equals the provisioned one:
# rbd du -p test vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 32 GiB
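For reference, I did the move with the "Move disk" function; from the CLI it would be roughly the following (assuming the disk is attached as scsi0 and the target storage is also called "test"):
# qm move_disk 105 scsi0 test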

If I repeat the procedure, but now with the virtual machine stopped, the used space is nearly the provisioned one, but it is not the same as it was before the pool migration:
# rbd du -p test vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 30 GiB

If I make a backup of the virtual machine and restore it to any Ceph pool, the used space is the same as I had before the Ceph pool migration:
# rbd du -p cephVM vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 14 GiB
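For reference, the backup/restore cycle is roughly the following from the CLI (assuming a stop-mode backup; the backup storage name and archive filename are just placeholders):
# vzdump 105 --mode stop --storage local
# qmrestore /var/lib/vz/dump/vzdump-qemu-105-<timestamp>.vma.zst 105 --storage cephVM --force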


Why is this happening?
Why does the used space equal the provisioned one when I move a virtual disk of a running virtual machine between Ceph pools?
Is it possible to avoid it?

I think it would be better if the used space remained the same when I move a disk between Ceph pools, to avoid unexpected surprises such as the Ceph storage getting full after a pool migration.

Best regards.
========================================
SOLTECSIS SOLUCIONES TECNOLOGICAS, S.L.
Carles Xavier Munyoz Baldó
Departamento de I+D+I
Tel./Fax: 966 446 046
cmunyoz@soltecsis.com
www.soltecsis.com
========================================
 
Is it possible to avoid cloning empty space when moving virtual disks between Ceph pools?

I understand that if it is possible to avoid it when restoring from a backup, it should be possible to avoid it when migrating virtual disks between Ceph pools.

Keep in mind that if the VM doesn't support trimming, the only solution to the problem is to stop the VM, make a backup and then restore it.
 
Keep in mind that if the VM doesn't support trimming, the only solution to the problem is to stop the VM, make a backup and then restore it.
This will be an issue sooner or later, since blocks that have been written will never be released, even if the data doesn't exist anymore.

I understand that if it is possible to avoid it when restoring from a backup, it should be possible to avoid it when migrating virtual disks between Ceph pools.
Backups are written sparse, but they will still write blocks where data was deleted but not yet released. A TRIM is also needed in this case.

Is it possible to avoid cloning empty space when moving virtual disks between Ceph pools?
Not with move disk; a manual TRIM is needed afterwards.
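Roughly, after the move, either from inside the guest:
# fstrim -av
or from the host through the guest agent (assuming VM 105):
# qm guest cmd 105 fstrim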
 
I have done more tests, and a TRIM inside the VM is not enough to free the unused space.
I'm going to explain it in detail ...

This is the Ceph virtual disk usage before moving it between Ceph pools:
# rbd du -p cephVM vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 14 GiB

After moving the disk to a new Ceph pool I get:
# rbd du -p cephVM vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 64 GiB

If I now run fstrim inside the VM:
# fstrim -av
/boot: 0 B (0 bytes) trimmed
/var: 1.2 GiB (1252638720 bytes) trimmed
/: 595.4 MiB (624328704 bytes) trimmed

And check the Ceph virtual disk usage again:
# rbd du -p test vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 63 GiB

As you can see, I have only recovered a little more than one GB.

I have two solutions to restore the used space to its original value from before the Ceph pool migration.
(1) Backup and restore, as I explained at the start of this post.

(2) Stop the virtual machine, start it again, and from the guest operating system run the fstrim command again:
# fstrim -av
/boot: 821.2 MiB (861126656 bytes) trimmed
/var: 2.5 GiB (2648477696 bytes) trimmed
/: 49.3 GiB (52930826240 bytes) trimmed

Now it looks fine; I have the correct value in the USED column:
# rbd du -p cephVM vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 14 GiB
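For completeness, the whole sequence of solution (2) from the host side is roughly:
# qm shutdown 301
# qm start 301
and then, once the guest has booted, inside the VM:
# fstrim -av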

But I don't understand why I must restart the virtual machine for this.
Can you explain it?
 
How long did you wait between the restart and the first TRIM command? The release of blocks may take some time.
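Something like this on one of the nodes would show whether the used space drops by itself after a while:
# watch -n 10 rbd du -p test vm-301-disk-0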
 
One thing more...
I have enabled the option "Run guest-trim after clone disk", expecting that after moving the virtual disk between Ceph pools it would restore the used space, but it has made no difference.
 
The issue is that the first fstrim command (before shutting down the VM) reports only 1.7 GB trimmed:
# fstrim -av
/boot: 0 B (0 bytes) trimmed
/var: 1.2 GiB (1252638720 bytes) trimmed
/: 595.4 MiB (624328704 bytes) trimmed

But the second one (after the virtual machine restart) reports more than 50 GB:
# fstrim -av
/boot: 821.2 MiB (861126656 bytes) trimmed
/var: 2.5 GiB (2648477696 bytes) trimmed
/: 49.3 GiB (52930826240 bytes) trimmed
 
I have enabled the option "Run guest-trim after clone disk", expecting that after moving the virtual disk between Ceph pools it would restore the used space, but it has made no difference.
This only works if the qemu-guest-agent is installed and running inside the VM.
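Roughly, inside a Debian-based guest:
# apt install qemu-guest-agent
# systemctl enable --now qemu-guest-agent
and on the host (assuming VM 301; this should correspond to the "Run guest-trim after clone disk" checkbox):
# qm set 301 --agent enabled=1,fstrim_cloned_disks=1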

The issue is that the first fstrim command (before shutting down the VM) reports only 1.7 GB trimmed:
This sounds like an issue inside the VM. I couldn't reproduce it on a Proxmox VE 6.2 cluster with a Debian Buster VM.

man fstrim
-v, --verbose
Verbose execution. With this option fstrim will output the number of bytes passed from the filesystem down the block stack to the device for potential discard. This number is a maximum discard amount from the storage device's perspective, because FITRIM ioctl called repeated will keep sending the same sectors for discard repeatedly.

fstrim will report the same potential discard bytes each time, but only sectors which had been written to between the discards would actually be discarded by the storage device. Further, the kernel block layer reserves the right to adjust the discard ranges to fit raid stripe geometry, non-trim capable devices in a LVM setup, etc. These reductions would not be reflected in fstrim_range.len (the --length option).
Seems there will always be a difference though.
 
qemu-guest-agent is installed and running:
# ps axww|grep agen
1240 ? Ss 0:02 /usr/sbin/qemu-ga --daemonize -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0

But after the VM's Ceph pool migration, nothing happens.
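For what it's worth, the agent can also be checked from the host with something like:
# qm agent 301 ping
which returns an error if the agent is not reachable.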
 
