VM disk takes more disk after migrating it to LVM-thin storage

Sagi

New Member
Nov 6, 2023
I am migrating a few VMs from my old server, running Proxmox VE 7.4 with Directory storage, to my new server, running Proxmox VE 8.1 with LVM-thin storage.
I have both servers in the same cluster and am trying to use the live-migration feature.
The VMs have 40GB assigned to them but use only about 10GB of it, so I would expect them to take only ~10GB of space after moving to the LVM-thin storage. Instead, they behave as if thick-provisioned and take the full 40GB out of the new storage.

More info-
- My VMs' disks are originally in qcow2 format and switch to raw format after the migration
- The VMs have an encrypted disk

What I already tried-
- Running `fstrim --all` on the guests and the host.
- Converting the VMs to "raw" format before migrating
- Activating the "Discard" option in the Hardware menu (see the CLI sketch right after this list)
- Rebooting the guests and the host
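
For reference, the Discard flag can also be set from the CLI. A minimal sketch, assuming the disk is scsi0 on VM 113 and lives on the local-lvm storage (adjust the VMID, bus and volume name to your setup):

Code:
# enable discard (and optionally the SSD emulation hint) on the virtual disk
qm set 113 --scsi0 local-lvm:vm-113-disk-0,discard=on,ssd=1
# the flag only takes effect after a full stop/start of the VM (a reboot from inside the guest is not enough)
qm stop 113 && qm start 113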

Still, I can see in the Proxmox UI that the VMs are taking 40GB (100% of the assigned disk) on the LVM-thin storage.

What can I do? I don't want to lose the advantages of thin provisioning.
 
I found a partial solution:
I migrated the VM while it was shut down to the Directory storage ("local") of the new node, and then used Move Disk to move it to the LVM-thin ("local-lvm") storage of the new node.
When doing this, the VM keeps thin provisioning.
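
The equivalent CLI steps would look roughly like this. This is only a sketch assuming VM 112, a target node called "newnode" and the storage names "local"/"local-lvm" from this thread; the exact flags may differ between PVE versions:

Code:
# shut the VM down and migrate it offline, placing its disk on the directory storage of the new node
qm shutdown 112
qm migrate 112 newnode --targetstorage local
# on the new node: move the disk to the LVM-thin storage, deleting the source image afterwards
qm move-disk 112 scsi0 local-lvm --delete 1
qm start 112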

Here is an output of `lvs` command from my host:
Code:
root@eco628:~# lvs
  LV            VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <6.84t             1.76   0.47                           
  root          pve -wi-ao---- 96.00g                                                   
  swap          pve -wi-ao----  8.00g                                                   
  vm-112-disk-0 pve Vwi-a-tz-- 40.00g data        8.51                                   
  vm-113-disk-0 pve Vwi-aotz-- 40.00g data        100.00

- vm-112-disk-0: was migrated using the method above (shut down, migrated to "local", then Move Disk to "local-lvm").
- vm-113-disk-0: was migrated while it was ON using live migration.

Both VMs have 40GB assigned to them, but only use about 10GB.


The issue is that my other VMs are in a production environment where I can't allow downtime, so I can't afford to shut them down and this solution doesn't help me.
Any ideas on what I can do?
 
I suspect encryption has something to do with it. I'd have to check in the code, or you can do it.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
I'll check that today by trying to migrate a VM without encryption and let you know.

But the methods of reading/transferring the data are likely different between "local" and "remote" migration, and between "same type" and "different type" storage.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
Not sure what you meant here. Can you please explain?
 
I'll check that today by trying to migrate a VM without encryption and let you know.
Ok, so I tested with an unencrypted VM and got the same result, so this isn't the reason for my issue.
 
If you enable discard and, once it has been applied to the VM, run fstrim inside the guest, it will clear down the space.

This is a limitation of QEMU live migration.
 
As ^^ sg90 said. Also, in VM Options > QEMU Guest Agent, you might check 'Run guest-trim after a disk move or VM migration'; if the guest agent is installed, it should do it automatically.
 
If you enable discard and, once it has been applied to the VM, run fstrim inside the guest, it will clear down the space.

This is a limitation of QEMU live migration.
As ^^ sg90 said. Also, in VM Options > QEMU Guest Agent, you might check 'Run guest-trim after a disk move or VM migration'; if the guest agent is installed, it should do it automatically.
I don't have guest agents configured on my VMs.
But as I said, I ran `fstrim -av` on them and nothing changed.

As far as I know, the "Run guest-trim after a disk move or VM migration" and "Discard" options are just automatic ways to run `fstrim`, so I don't see how they would help when running it manually hasn't made any difference.
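
For what it's worth, the "Run guest-trim" option just asks the QEMU guest agent to issue a trim inside the guest. With the agent installed, the same thing can be triggered manually from the host (VMID 113 assumed here):

Code:
# ask the guest agent to run fstrim inside the VM (requires qemu-guest-agent in the guest)
qm guest cmd 113 fstrim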
 
But as I said, I ran `fstrim -av` on them and nothing changed.
That does not mean it worked. Please post the output of the program after running it; if there is no output, it didn't work. Please also check whether the Discard flag is enabled on all virtual disks (this requires a stop and start of the VM, not a reboot).
 
That does not mean it worked. Please post the output of the program after running it; if there is no output, it didn't work. Please also check whether the Discard flag is enabled on all virtual disks (this requires a stop and start of the VM, not a reboot).
Here is the output:

Code:
fstrim -av
/var: 178.3 GiB (191399063552 bytes) trimmed on /dev/mapper/vg1-lv_var
/tmp: 257.1 MiB (269615104 bytes) trimmed on /dev/mapper/vg1-lv_tmp
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 0 B (0 bytes) trimmed on /dev/mapper/vg1-lv_root

When running "lvs" on the host, I still see the LV with 100% data usage (full), even after running the trim. Also, no change in the Proxmox storage UI.


I just enabled the "Discard" flag on all VMs, then stopped and started them. Still, 100% usage of the disk.
Does the "Discard" have any meaning if I don't have a guest agent installed on my VM anyway?
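
One way to confirm whether discard requests actually reach the virtual disk is to check from inside the guest (assuming the disk shows up as /dev/sda):

Code:
# non-zero DISC-GRAN / DISC-MAX values mean the device accepts discard/TRIM
lsblk --discard /dev/sda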
 
Here is the output:

Code:
fstrim -av
/var: 178.3 GiB (191399063552 bytes) trimmed on /dev/mapper/vg1-lv_var
/tmp: 257.1 MiB (269615104 bytes) trimmed on /dev/mapper/vg1-lv_tmp
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 0 B (0 bytes) trimmed on /dev/mapper/vg1-lv_root

When running "lvs" on the host, I still see the LV with 100% data usage (full), even after running the trim. Also, no change in the Proxmox storage UI.


I just enabled the "Discard" flag on all VMs, then stopped and started them. Still, 100% usage of the disk.
That looks OK, so it's a pity it's not trimmed. Have you tried a backup and a restore?


Does the "Discard" have any meaning if I don't have a guest agent installed on my VM anyway?
No, those are two different things.
 
That looks OK, so it's a pity it's not trimmed. Have you tried a backup and a restore?
It worked!
So I created a backup (snapshot mode) to my "local" (Directory type) storage and then restored the VM, and now it doesn't take 100% of the LV.
At first, the VM took 97% of the LV, but then I ran "fstrim" and got it down to 17%.
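
For reference, that round trip can also be done from the CLI. A rough sketch, assuming VM 113 and enough free space on "local"; the archive name below is illustrative:

Code:
# back up the VM in snapshot mode to the directory storage
vzdump 113 --storage local --mode snapshot --compress zstd
# restore it onto the LVM-thin storage, overwriting the existing VM
qmrestore /var/lib/vz/dump/vzdump-qemu-113-<timestamp>.vma.zst 113 --storage local-lvm --force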

The only issue is that I can't do that on my production server, since I don't have enough extra storage to create backups of every VM. I only have one disk with two storages configured: "local" (Directory type) with 100GB, and "local-lvm" (LVM-thin type) with 4TB.
Because LVM-thin storage can't store backups, I can only store them (even temporarily) on the "local" storage, which is just 100GB.

Any creative solution I can use here?

Thanks!
 
Do you still have an environment in which you can test the migration, or is production the last VM that needs to be migrated?
 
You need to have the discard flag set on the disk before you run the fstrim command.

If you don't have it enabled, the fstrim command will run but won't actually be passed through to the underlying disk.

You need to make sure discard is enabled, the VM is stopped and started, and then fstrim is run.

The guest agent just allows Proxmox to pass through the fstrim command automatically.
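
One quick way to verify the flag really ended up on the disk is to look at the VM config on the host (VMID 113 assumed); the disk line should contain discard=on:

Code:
# e.g. scsi0: local-lvm:vm-113-disk-0,discard=on,size=40G
qm config 113 | grep -E '^(scsi|virtio|sata|ide)'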
 
Do you still have an environment in which you can test the migration, or is production the last VM that needs to be migrated?
I still have a testing environment available to reproduce this case.

You need to have the discard flag set on the disk before you run the fstrim command.

If you don't have it enabled, the fstrim command will run but won't actually be passed through to the underlying disk.

You need to make sure discard is enabled, the VM is stopped and started, and then fstrim is run.

The guest agent just allows Proxmox to pass through the fstrim command automatically.
I tried that, but it didn't help.
I was still left with the same 100% data usage.
 
The only issue is that I can't do that on my production server, since I don't have enough extra storage to create backups of every VM. I only have one disk with two storages configured: "local" (Directory type) with 100GB, and "local-lvm" (LVM-thin type) with 4TB.
Because LVM-thin storage can't store backups, I can only store them (even temporarily) on the "local" storage, which is just 100GB.

Any creative solution I can use here?

Thanks!
Your problem may be that your filesystem refuses to trim already-trimmed sectors [1]. Given that you have enabled discard on the drive and are using VirtIO SCSI [single] as the controller, if you reboot the VM and then run fstrim -v again it should discard the free space.

If you can't afford any downtime, ask the filesystem politely to please trim the free sectors :)

Code:
# write a ~10G file of zeros into free space, then delete it and trim the freed blocks
dd if=/dev/zero of=/zerofile1 bs=1M count=10000
rm /zerofile1
fstrim -v /

Adapt paths and sizes to your use case. You could also not set a size, but then dd will fill the disk and running apps may suffer.
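
While doing this you can watch the thin volume's Data% on the host; it should drop once the fstrim runs (LV name taken from the lvs output above):

Code:
# on the PVE host, refresh the thin LV usage every few seconds
watch -n 5 'lvs pve/vm-113-disk-0'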

BTW, you can manually create an LV, format it with e.g. ext4, mount it at some path and add that path as a directory storage for backup content. That may help you with this migration. Once done, remove the storage from PVE and delete the LV. A sketch follows below.
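
A minimal sketch of that temporary directory storage, assuming you carve roughly 200G out of the existing thin pool and call the storage "backup-tmp" (both the size and the names are illustrative):

Code:
# create a temporary thin volume, format and mount it
lvcreate -V 200G -T pve/data -n backup-tmp
mkfs.ext4 /dev/pve/backup-tmp
mkdir -p /mnt/backup-tmp
mount /dev/pve/backup-tmp /mnt/backup-tmp
# register it with PVE as a directory storage that may hold backups
pvesm add dir backup-tmp --path /mnt/backup-tmp --content backup
# ...run the backup/restore, then clean up:
pvesm remove backup-tmp
umount /mnt/backup-tmp
lvremove pve/backup-tmp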

[1] https://unix.stackexchange.com/ques...im-data-blocks-on-btrfs-ecrypts/371665#371665
 
