Used space increases when moving a VM between Ceph pools

Hi,
I have a three nodes Proxmox cluster with Ceph storage using NVMe disks.

I have observed that when I restore a virtual machine from a previous backup, its virtual disk's used space is different from the provisioned one:
# rbd du -p cephVM vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 14 GiB

But if I move this VM's disk from its current Ceph pool to another Ceph pool (with the virtual machine running), the used space equals the provisioned one:
# rbd du -p test vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 32 GiB
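For reference, I did the move with the "Move disk" function; from the CLI it would be roughly the following (assuming the disk is attached as scsi0 and the target storage is also called "test"):
# qm move_disk 105 scsi0 test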

If I repeat the procedure, but now with the virtual machine stopped, the used space is nearly the provisioned one, but it is not the same as it was before the pool migration:
# rbd du -p test vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 30 GiB

If I make a backup of the virtual machine and restore it to any Ceph pool, the used space is the same as I had before the Ceph pool migration:
# rbd du -p cephVM vm-105-disk-0
NAME PROVISIONED USED
vm-105-disk-0 32 GiB 14 GiB
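For reference, the backup/restore cycle is roughly the following from the CLI (assuming a stop-mode backup; the backup storage name and archive filename are just placeholders):
# vzdump 105 --mode stop --storage local
# qmrestore /var/lib/vz/dump/vzdump-qemu-105-<timestamp>.vma.zst 105 --storage cephVM --force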


Why is this happening?
Why does the used space equal the provisioned one when I move a virtual disk of a running virtual machine between Ceph pools?
Is it possible to avoid it?

I think it would be better if the used space remained the same when I move a disk between Ceph pools, to avoid unexpected surprises such as the Ceph storage getting full after a pool migration.

Best regards.
========================================
SOLTECSIS SOLUCIONES TECNOLOGICAS, S.L.
Carles Xavier Munyoz Baldó
Departamento de I+D+I
Tel./Fax: 966 446 046
cmunyoz@soltecsis.com
www.soltecsis.com
========================================
 
Is it possible to avoid cloning empty space when moving virtual disks between Ceph pools?

I understand that if it is possible to avoid it when restoring from a backup, it should be possible to avoid it when migrating virtual disks between Ceph pools.

Keep in mind that if the VM doesn't support trimming, the only solution to the problem is to stop the VM, make a backup and then restore it.
 
Keep in mind that if the VM doesn't support trimming, the only solution to the problem is to stop the VM, make a backup and then restore it.
This will be an issue sooner or later, since blocks that have been written will never be released, even if the data doesn't exist anymore.

I understand that if it is possible to avoid it when restoring from a backup, it should be possible to avoid it when migrating virtual disks between Ceph pools.
Backups are written sparse, but they will still write blocks where data was deleted but not yet released. A TRIM is also needed in this case.

Is it possible to avoid cloning empty space when moving virtual disks between Ceph pools?
Not with move disk; a manual TRIM is needed afterwards.
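Roughly, after the move, either from inside the guest:
# fstrim -av
or from the host through the guest agent (assuming VM 105):
# qm guest cmd 105 fstrim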
 
I have done more tests, and a TRIM inside the VM is not enough to free the unused space.
I'm going to explain it in detail ...

This is the Ceph virtual disk usage before moving it between Ceph pools:
# rbd du -p cephVM vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 14 GiB

After moving the disk to a new Ceph pool I get:
# rbd du -p cephVM vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 64 GiB

If I now run fstrim inside the VM:
# fstrim -av
/boot: 0 B (0 bytes) trimmed
/var: 1.2 GiB (1252638720 bytes) trimmed
/: 595.4 MiB (624328704 bytes) trimmed

And check the Ceph virtual disk usage again:
# rbd du -p test vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 63 GiB

As you can see, I have only recovered a little more than one GB.

I have two solutions to restore the used space to its original value from before the Ceph pool migration.
(1) Backup and restore, as I explained at the start of this post.

(2) Stop the virtual machine, start it again, and from the guest operating system run the fstrim command again:
# fstrim -av
/boot: 821.2 MiB (861126656 bytes) trimmed
/var: 2.5 GiB (2648477696 bytes) trimmed
/: 49.3 GiB (52930826240 bytes) trimmed

Now it looks fine; I have the correct value in the USED column:
# rbd du -p cephVM vm-301-disk-0
NAME PROVISIONED USED
vm-301-disk-0 64 GiB 14 GiB
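For completeness, the whole sequence of solution (2) from the host side is roughly:
# qm shutdown 301
# qm start 301
and then, once the guest has booted, inside the VM:
# fstrim -av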

But I don't understand why I must restart the virtual machine for this.
Can you explain it?
 
How long did you wait between the restart and the first TRIM command? The release of blocks may take some time.
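Something like this on one of the nodes would show whether the used space drops by itself after a while:
# watch -n 10 rbd du -p test vm-301-disk-0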
 
One thing more...
I have enabled the option "Run guest-trim after clone disk", expecting that after moving the virtual disk between Ceph pools it would restore the used space, but it has made no difference.
 
The issue is that the first fstrim command (before shutting down the VM) reports only 1.7 GB trimmed:
# fstrim -av
/boot: 0 B (0 bytes) trimmed
/var: 1.2 GiB (1252638720 bytes) trimmed
/: 595.4 MiB (624328704 bytes) trimmed

But the second one (after the virtual machine restart) reports more than 50 GB:
# fstrim -av
/boot: 821.2 MiB (861126656 bytes) trimmed
/var: 2.5 GiB (2648477696 bytes) trimmed
/: 49.3 GiB (52930826240 bytes) trimmed
 
I have enabled the option "Run guest-trim after clone disk", expecting that after moving the virtual disk between Ceph pools it would restore the used space, but it has made no difference.
This only works if the qemu-guest-agent is installed and running inside the VM.
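Roughly, inside a Debian-based guest:
# apt install qemu-guest-agent
# systemctl enable --now qemu-guest-agent
and on the host (assuming VM 301; this should correspond to the "Run guest-trim after clone disk" checkbox):
# qm set 301 --agent enabled=1,fstrim_cloned_disks=1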

The issue is that the first fstrim command (before shutting down the VM) reports only 1.7 GB trimmed:
This sounds like an issue inside the VM. I couldn't reproduce it on a Proxmox VE 6.2 cluster with a Debian Buster VM.

man fstrim
-v, --verbose
Verbose execution. With this option fstrim will output the number of bytes passed from the filesystem down the block stack to the device for potential discard. This number is a maximum discard amount from the storage device's perspective, because FITRIM ioctl called repeated will keep sending the same sectors for discard repeatedly.

fstrim will report the same potential discard bytes each time, but only sectors which had been written to between the discards would actually be discarded by the storage device. Further, the kernel block layer reserves the right to adjust the discard ranges to fit raid stripe geometry, non-trim capable devices in a LVM setup, etc. These reductions would not be reflected in fstrim_range.len (the --length option).
Seems there will always be a difference though.
 
qemu-guest-agent is installed and running:
# ps axww|grep agen
1240 ? Ss 0:02 /usr/sbin/qemu-ga --daemonize -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0

But after the VM's Ceph pool migration, nothing happens.
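For what it's worth, the agent can also be checked from the host with something like:
# qm agent 301 ping
which returns an error if the agent is not reachable.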
 
