VM disk takes more disk after migrating it to LVM-thin storage

Sagi

New Member
Nov 6, 2023
I am migrating a few VMs from my old server, running Proxmox VE 7.4 with Directory storage, to my new server, running Proxmox VE 8.1 with LVM-thin storage.
I have both servers in the same cluster and am trying to use the live-migration feature.
The VMs have 40GB assigned to them but use only about 10GB of it, so I would expect them to take only ~10GB of space after moving to the LVM-thin storage. Instead, they behave as if thick-provisioned and take the full 40GB out of the new storage.

More info-
- My VMs' disks are originally in qcow2 format and switch to raw format after the migration
- The VMs have an encrypted disk

What I already tried-
- Running `fstrim --all` on the guests and the host.
- Converting the VMs to "raw" format before migrating
- Activating the "Discard" option in the Hardware menu (see the CLI sketch right after this list)
- Rebooting the guests and the host
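
For reference, the Discard flag can also be set from the CLI. A minimal sketch, assuming the disk is scsi0 on VM 113 and lives on the local-lvm storage (adjust the VMID, bus and volume name to your setup):

Code:
# enable discard (and optionally the SSD emulation hint) on the virtual disk
qm set 113 --scsi0 local-lvm:vm-113-disk-0,discard=on,ssd=1
# the flag only takes effect after a full stop/start of the VM (a reboot from inside the guest is not enough)
qm stop 113 && qm start 113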

Still, I can see in the Proxmox UI that the VMs are taking 40GB (100% of the assigned disk) on the LVM-thin storage.

What can I do? I don't want to lose the advantages of thin provisioning.
 
I found a partial solution:
I migrated the VM while it was shut down to the Directory storage ("local") of the new node, and then used Move Disk to move it to the LVM-thin ("local-lvm") storage of the new node.
When doing this, the VM keeps thin provisioning.
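
The equivalent CLI steps would look roughly like this. This is only a sketch assuming VM 112, a target node called "newnode" and the storage names "local"/"local-lvm" from this thread; the exact flags may differ between PVE versions:

Code:
# shut the VM down and migrate it offline, placing its disk on the directory storage of the new node
qm shutdown 112
qm migrate 112 newnode --targetstorage local
# on the new node: move the disk to the LVM-thin storage, deleting the source image afterwards
qm move-disk 112 scsi0 local-lvm --delete 1
qm start 112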

Here is an output of `lvs` command from my host:
Code:
root@eco628:~# lvs
  LV            VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <6.84t             1.76   0.47                           
  root          pve -wi-ao---- 96.00g                                                   
  swap          pve -wi-ao----  8.00g                                                   
  vm-112-disk-0 pve Vwi-a-tz-- 40.00g data        8.51                                   
  vm-113-disk-0 pve Vwi-aotz-- 40.00g data        100.00

- vm-112-disk-0: was migrated using the method above (shut down, migrated to "local", then Move Disk to "local-lvm").
- vm-113-disk-0: was migrated while it was ON using live migration.

Both VMs have 40GB assigned to them, but only use about 10GB.


The issue is that my other VMs are in a production environment where I can't allow downtime, so I can't afford to shut them down and this solution doesn't help me.
Any ideas on what I can do?
 
I suspect encryption has something to do with it. I'd have to check in the code, or you can do it.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
I'll check that today by trying to migrate a VM without encryption and let you know.

But the methods of reading/transferring the data are likely different between "local" and "remote" migration, and between "same type" and "different type" storage.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
Not sure what you meant here. Can you please explain?
 
I'll check that today by trying to migrate a VM without encryption and let you know.
Ok, so I tested with an unencrypted VM and got the same result, so this isn't the reason for my issue.
 
If you enable discard and, once it has been applied to the VM, run fstrim inside the guest, it will clear down the space.

This is a limitation of QEMU live migration.
 
As ^^ sg90 said. Also, in VM Options > QEMU Guest Agent, you might check 'Run guest-trim after a disk move or VM migration'; if the guest agent is installed, it should do it automatically.
 
If you enable discard and, once it has been applied to the VM, run fstrim inside the guest, it will clear down the space.

This is a limitation of QEMU live migration.
As ^^ sg90 said. Also, in VM Options > QEMU Guest Agent, you might check 'Run guest-trim after a disk move or VM migration'; if the guest agent is installed, it should do it automatically.
I don't have guest agents configured on my VMs.
But as I said, I ran `fstrim -av` on them and nothing changed.

As far as I know, the "Run guest-trim after a disk move or VM migration" and "Discard" options are just automatic ways to run `fstrim`, so I don't see how they would help when running it manually hasn't made any difference.
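
For what it's worth, the "Run guest-trim" option just asks the QEMU guest agent to issue a trim inside the guest. With the agent installed, the same thing can be triggered manually from the host (VMID 113 assumed here):

Code:
# ask the guest agent to run fstrim inside the VM (requires qemu-guest-agent in the guest)
qm guest cmd 113 fstrim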
 
But as I said, I ran `fstrim -av` on them and nothing changed.
That does not mean it worked. Please post the output of the program after running it; if there is no output, it didn't work. Please also check whether the Discard flag is enabled on all virtual disks (this requires a stop and start of the VM, not a reboot).
 
That does not mean it worked. Please post the output of the program after running it; if there is no output, it didn't work. Please also check whether the Discard flag is enabled on all virtual disks (this requires a stop and start of the VM, not a reboot).
Here is the output:

Code:
fstrim -av
/var: 178.3 GiB (191399063552 bytes) trimmed on /dev/mapper/vg1-lv_var
/tmp: 257.1 MiB (269615104 bytes) trimmed on /dev/mapper/vg1-lv_tmp
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 0 B (0 bytes) trimmed on /dev/mapper/vg1-lv_root

When running "lvs" on the host, I still see the LV with 100% data usage (full), even after running the trim. Also, no change in the Proxmox storage UI.


I just enabled the "Discard" flag on all VMs, then stopped and started them. Still, 100% usage of the disk.
Does the "Discard" have any meaning if I don't have a guest agent installed on my VM anyway?
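
One way to confirm whether discard requests actually reach the virtual disk is to check from inside the guest (assuming the disk shows up as /dev/sda):

Code:
# non-zero DISC-GRAN / DISC-MAX values mean the device accepts discard/TRIM
lsblk --discard /dev/sda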
 
Here is the output:

Code:
fstrim -av
/var: 178.3 GiB (191399063552 bytes) trimmed on /dev/mapper/vg1-lv_var
/tmp: 257.1 MiB (269615104 bytes) trimmed on /dev/mapper/vg1-lv_tmp
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 0 B (0 bytes) trimmed on /dev/mapper/vg1-lv_root

When running "lvs" on the host, I still see the LV with 100% data usage (full), even after running the trim. Also, no change in the Proxmox storage UI.


I just enabled the "Discard" flag on all VMs, then stopped and started them. Still, 100% usage of the disk.
That looks OK, so it's a pity it's not trimmed. Have you tried a backup and a restore?


Does the "Discard" have any meaning if I don't have a guest agent installed on my VM anyway?
No, those are two different things.
 
That looks OK, so it's a pity it's not trimmed. Have you tried a backup and a restore?
It worked!
So I created a backup (snapshot mode) to my "local" (Directory type) storage and then restored the VM, and now it doesn't take 100% of the LV.
At first, the VM took 97% of the LV, but then I ran "fstrim" and got it down to 17%.
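
For reference, that round trip can also be done from the CLI. A rough sketch, assuming VM 113 and enough free space on "local"; the archive name below is illustrative:

Code:
# back up the VM in snapshot mode to the directory storage
vzdump 113 --storage local --mode snapshot --compress zstd
# restore it onto the LVM-thin storage, overwriting the existing VM
qmrestore /var/lib/vz/dump/vzdump-qemu-113-<timestamp>.vma.zst 113 --storage local-lvm --force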

The only issue is that I can't do that on my production server, since I don't have enough extra storage to create backups of every VM. I only have one disk with two storages configured: "local" (Directory type) with 100GB, and "local-lvm" (LVM-thin type) with 4TB.
Because LVM-thin storage can't store backups, I can only store them (even temporarily) on the "local" storage, which is just 100GB.

Any creative solution I can use here?

Thanks!
 
Do you still have an environment in which you can test the migration, or is production the last VM that needs to be migrated?
 
You need to have the discard flag set on the disk before you run the fstrim command.

If you don't have it enabled, the fstrim command will run but won't actually be passed through to the underlying disk.

You need to make sure discard is enabled, the VM is stopped and started, and then fstrim is run.

The guest agent just allows Proxmox to pass through the fstrim command automatically.
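
One quick way to verify the flag really ended up on the disk is to look at the VM config on the host (VMID 113 assumed); the disk line should contain discard=on:

Code:
# e.g. scsi0: local-lvm:vm-113-disk-0,discard=on,size=40G
qm config 113 | grep -E '^(scsi|virtio|sata|ide)'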
 
Do you still have an environment in which you can test the migration, or is production the last VM that needs to be migrated?
I still have a testing environment available to reproduce this case.

You need to have the discard flag set on the disk before you run the fstrim command.

If you don't have it enabled, the fstrim command will run but won't actually be passed through to the underlying disk.

You need to make sure discard is enabled, the VM is stopped and started, and then fstrim is run.

The guest agent just allows Proxmox to pass through the fstrim command automatically.
I tried that, but it didn't help.
I was still left with the same 100% data usage.
 
The only issue is that I can't do that on my production server, since I don't have enough extra storage to create backups of every VM. I only have one disk with two storages configured: "local" (Directory type) with 100GB, and "local-lvm" (LVM-thin type) with 4TB.
Because LVM-thin storage can't store backups, I can only store them (even temporarily) on the "local" storage, which is just 100GB.

Any creative solution I can use here?

Thanks!
Your problem may be that your filesystem refuses to trim already-trimmed sectors [1]. Given that you have enabled discard on the drive and are using VirtIO SCSI [single] as the controller, if you reboot the VM and then run fstrim -v again it should discard the free space.

If you can't afford any downtime, ask the filesystem politely to please trim the free sectors :)

Code:
# write a ~10G file of zeros into free space, then delete it and trim the freed blocks
dd if=/dev/zero of=/zerofile1 bs=1M count=10000
rm /zerofile1
fstrim -v /

Adapt paths and sizes to your use case. You could also not set a size, but then dd will fill the disk and running apps may suffer.
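
While doing this you can watch the thin volume's Data% on the host; it should drop once the fstrim runs (LV name taken from the lvs output above):

Code:
# on the PVE host, refresh the thin LV usage every few seconds
watch -n 5 'lvs pve/vm-113-disk-0'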

BTW, you can manually create an LV, format it with e.g. ext4, mount it at some path and add that path as a directory storage for backup content. That may help you with this migration. Once done, remove the storage from PVE and delete the LV. A sketch follows below.
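
A minimal sketch of that temporary directory storage, assuming you carve roughly 200G out of the existing thin pool and call the storage "backup-tmp" (both the size and the names are illustrative):

Code:
# create a temporary thin volume, format and mount it
lvcreate -V 200G -T pve/data -n backup-tmp
mkfs.ext4 /dev/pve/backup-tmp
mkdir -p /mnt/backup-tmp
mount /dev/pve/backup-tmp /mnt/backup-tmp
# register it with PVE as a directory storage that may hold backups
pvesm add dir backup-tmp --path /mnt/backup-tmp --content backup
# ...run the backup/restore, then clean up:
pvesm remove backup-tmp
umount /mnt/backup-tmp
lvremove pve/backup-tmp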

[1] https://unix.stackexchange.com/ques...im-data-blocks-on-btrfs-ecrypts/371665#371665
 
