Proxmox always pre-allocates when migrating? (LVM to LVM)

Moonrocks

Hello,

I have a 4-node cluster; each node has an LVM-thin pool with 4 drives.
Whenever I create a new VM, the space is not pre-allocated (which is great), but when I migrate a VM from one node to another, the full disk size gets used up on the target node. Is pre-allocation done by default when migrating, and if so, how do I avoid it?

TIA! :)
 


Thank you for pointing these out. The issue I have is that the QEMU guest agent isn't installed on the VMs, and these are customer VMs, so I can't really go in and run the trim. I have set up a new node where I want to migrate these VMs (and avoid any downtime). I can set up a different type of storage on this new node (ZFS, etc.). I need a way to live migrate without having to run trim on the guest machines and without pre-allocating; I am open to using a different type of storage on the new node. Do I have any options?
 
Why not use Ceph?
Ceph sounds really attractive and I'm seriously considering it. However, I am a bit concerned about its performance compared to raw NVMe, which is what I'm currently using. How does Ceph compare to that?

I have ConnectX4 and X3 (40G) NICs on these servers.
 
The performance is largely dependent on the network; with 4 DC NVMe drives per host you can definitely saturate 100-200G, let alone your 40G. Do you need more than that?

The consideration is mostly cost vs. performance vs. safety: if a host dies, how will you recover your current environment? Is the data loss or downtime you can currently experience acceptable? If so, Ceph with unsafe caching and 2 copies will perform just as well; you'll get the same data security, but with the benefit of a shorter data-loss window and recovery time frame.

If you want the synchronous 3-way data safety of a default Ceph install, you obviously have to sacrifice some speed, but your data loss will be near zero and the recovery time frame measured in seconds.

Most of the performance horror stories here involve desktop-grade equipment.
 
The performance is largely dependent on the network, with 4 DC NVMe per host you can definitely saturate up to 100-200G if you have 40G, do you need more than that? [...]
Thank you for the insight. Regarding hardware, I'm running R740s for compute with 2x Platinum 8268. How would things look if I populate each storage node with 1x Xeon E5-2683 v4?
 
That processor is about 10 years old. You may still be able to saturate 40G, but a single E5 with DDR4 will add latency and won't be able to saturate your NVMe. It's good enough for testing or a spinning-disk backup/archive target, but I wouldn't rely on anything that old for production.
 
Space will be pre-allocated (that is, thin provisioning will be lost) on any non-shared storage if you live migrate the VM, because QEMU needs to put the source disk into "mirror" state so that every write to the source disk is written synchronously to the destination disk too. That even copies "zeros" instead of preserving sparse areas of the source disk. If you migrate the VM while it's off, QEMU uses a disk clone, which preserves sparse areas.
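The difference is easy to reproduce with plain files: copying zeros block-by-block (what the mirror does) fully allocates the target, while a sparse-aware copy (what an offline clone does) preserves the holes. A rough illustration with ordinary files rather than VM disks, using GNU `cp`'s `--sparse` flag:

```shell
#!/bin/sh
# Illustration only: what happens to sparse areas during a live
# migration (zeros mirrored as real data) vs an offline clone
# (holes preserved). Uses plain files, not real VM disks.
set -e
dir="${TMPDIR:-/tmp}/sparse-demo"
mkdir -p "$dir"

# A 64 MiB sparse file: apparent size 64 MiB, almost no blocks allocated.
truncate -s 64M "$dir/thin.img"

# "Live migration" behaviour: zeros get written out as real data.
cp --sparse=never "$dir/thin.img" "$dir/preallocated.img"

# "Offline clone" behaviour: holes are preserved.
cp --sparse=always "$dir/thin.img" "$dir/still-thin.img"

# preallocated.img now occupies ~64 MiB on disk, the others almost nothing.
du -h "$dir"/*.img
```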

You should definitely tell the VM administrator to install the QEMU guest agent (on Windows, the full VirtIO drivers plus the balloon driver are a must too). Besides the option to automatically run fstrim after a disk clone or disk migration, the QEMU agent helps make backups consistent at the filesystem level.
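On the Proxmox side both of those are per-VM options set with `qm`. A sketch (the VM ID is a placeholder; outside a PVE node the script just reports and does nothing):

```shell
#!/bin/sh
VMID=100   # placeholder: substitute the real VM ID

if command -v qm >/dev/null 2>&1; then
    # Enable the guest agent and have QEMU request an fstrim inside
    # the guest after a disk move or clone completes.
    qm set "$VMID" --agent enabled=1,fstrim_cloned_disks=1
else
    echo "qm not found - run this on a Proxmox VE node"
fi
```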

IMHO, local storage can only be justified if you both require very, very low disk latency and can tolerate a high RTO/RPO in case of host failure. I would use Ceph and dimension the storage accordingly.

to a directory backed store
If you want snapshots, you have to use the qcow2 format for disks, and creating/deleting snapshots with it is quite a bit slower than with LVM/ZFS/Ceph.
 
The issue is that we are considering setting up Ceph for non-latency-sensitive VMs and migrating VMs from our current LVM-thin storage over to it. From what I understand so far, regardless of which storage backend we migrate to, pre-allocation will happen if we live-migrate off our current LVM-thin storage. What we're considering now is taking a backup of the VM, spinning up the backup on the new storage backend, and shutting down the existing VM once the new one is live (though we are trying to avoid this, since it has its own problems with data consistency on the VMs).

I also tried running fstrim inside a test VM but that didn’t release the unused space.
 
considering setting up CEPH for non latency intensive VMs
Do you have numbers on what the performance should be? Without them, you can't decide which VMs are "non latency intensive". Ceph isn't slow by any means, but of course you have the added latency and capacity limit of the network. How much that affects real-world performance depends on many factors, like write patterns or sync vs. async writes. Take data from your current storage to find out your current performance profile (netdata and the Proxmox-integrated metric server + Grafana will help you here).
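Beyond the monitoring tools mentioned above, one quick way to get a baseline latency number to compare against Ceph later is a small synchronous-write test with `fio` (not mentioned in the thread; file path, size and job parameters below are illustrative, and the script skips itself if fio isn't installed):

```shell
#!/bin/sh
# Hypothetical baseline test: 4k random writes with an fsync per write,
# queue depth 1 - roughly the worst case for a latency-sensitive VM.
if command -v fio >/dev/null 2>&1; then
    fio --name=sync-write-latency --filename=/tmp/fio-test.dat \
        --rw=randwrite --bs=4k --size=64M --fsync=1 \
        --iodepth=1 --numjobs=1 --group_reporting
    rm -f /tmp/fio-test.dat
else
    echo "fio not installed (e.g. apt install fio)"
fi
```

Run it on the current LVM-thin storage and again on a Ceph RBD test volume; the `clat` percentiles are the numbers to compare.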

the preallocation will be made if we want to live-migrate from our current Thin-LVM storage
Yes, it will. That's why you need the QEMU agent in every VM. Or make customers pay for the whole drive they provision, so you can buy more disks and allocate the full space ;)

What we’re considering now is that we can take a backup of the VM
You will lose minutes, maybe even hours, of data in those VMs. How important that is is for you to decide.

I also tried running fstrim inside a test VM but that didn’t release the unused space.
You are doing it wrong. It's detailed in the post linked before [1]. With the proper VM configuration it works 100% of the time, provided the VM's OS supports trimming.

[1] https://forum.proxmox.com/threads/v...fter-migrating-it-to-lvm-thin-storage.142070/
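For reference, the recipe from that thread boils down to a discard-capable disk (`discard=on` on a VirtIO SCSI disk) plus a trim issued inside the guest, which the agent can do for you. A sketch with the VM ID and disk volume name as placeholders (it only acts when run on a PVE node):

```shell
#!/bin/sh
VMID=100   # placeholder VM ID

if command -v qm >/dev/null 2>&1; then
    # Discard-capable controller and disk (volume name is a placeholder):
    qm set "$VMID" --scsihw virtio-scsi-single
    qm set "$VMID" --scsi0 "local-lvm:vm-$VMID-disk-0,discard=on"
    # Trim via the guest agent, no guest login required:
    qm guest cmd "$VMID" fstrim
else
    echo "qm not found - run this on a Proxmox VE node"
fi
```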
 
Do you have numbers on what the performance should be? Without them, you can't decide which VMs are "non latency intensive". [...]
The prerequisite for us to set up Ceph is being able to keep the thin provisioning, so we are trying to figure that out before gathering the data. Once we have a clear roadmap, we will work on aggregating the data.
You will lose minutes, maybe even hours of data in those VMs. How important that is for you to decide.
This would likely be the last resort for us.
You are doing it wrong. It's detailed in the post linked before [1]. [...]
You're right; I followed the post to the T and was able to get the space released. Now, would it be possible to:

1) Install the QEMU guest agent via cloud-init? We have cloud-init set up on these VMs. I'm thinking something like this:
-> Set up a cloud-init snippet: /var/lib/vz/snippets/qemu-ga.yaml

Code:
#cloud-config
packages:
  - qemu-guest-agent

runcmd:
  - systemctl enable --now qemu-guest-agent


2) Enable the QEMU guest agent trim on these VMs (is enabling this possible without a power cycle?)


TIA :-)
 
Note that most modern database systems use some form of network replication before considering data to be "stable". If Ceph adding a few hundred microseconds is a problem, you must consider the entire VM/SDN network architecture as well.

As for the trim: depending on the OS, if you have cloud-init you can just install/enable qemu-guest-agent, make sure you have VirtIO disks, and continuous trim should happen automatically on most mainstream modern Linux distributions.
 
Note that most modern database systems use some form of network replication before considering data to be "stable". If Ceph adding a few hundred microseconds is a problem, you must consider the entire VM/SDN network architecture as well.
We do have archival backups for that purpose, but you're right, it does make the RTO much longer.
As far as the trim, depending on the OS, if you have cloudinit, then you can just install/enable qemu-agent,
Perfect. I think this would be the way to go, since we need the trim even if we move over to Ceph.

make sure you have VirtIO disks and continuous trim should happen automatically on most mainstream modern Linux.
We do have VirtIO disks and discard is on (please find attached the screen grab). Does this mean that if we do a phased migration and wait it out, the trims will happen automatically? We are trying to avoid rebooting the VMs (for cloud-init to do its thing and for us to enable the guest agent on the Proxmox side for each VM).
 

Attachments: SS.png (28.9 KB)
Again, it depends on your environment, but default Ubuntu 22.04/24.04 and RHEL 9 run fstrim on a timer (see systemctl list-timers). Windows 10 / Server 2016 and later do as well.
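To check whether a given Linux guest already does this, look at the timer and at the mount options. A sketch (it silently skips on systems without systemd; the filesystem list is just the common cases):

```shell
#!/bin/sh
if command -v systemctl >/dev/null 2>&1; then
    # Is periodic trim scheduled?
    systemctl list-timers fstrim.timer --no-pager 2>/dev/null
    # Are filesystems mounted with continuous discard, or with nodiscard?
    command -v findmnt >/dev/null 2>&1 && \
        findmnt -t ext4,xfs,btrfs -o TARGET,FSTYPE,OPTIONS
fi
checked=done
```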

Now, if you use ZFS or FAT32 partitions, or you disabled discard on XFS, or set nodiscard in the mount options, then you need to check what's required for trim to work.

You don't need to reboot to enable qemu-guest-agent: you should be able to start it on the guest manually, through Ansible deployments, or however you manage your guests, and trim will work immediately, provided discard is set in the VM configuration as well.
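Assuming the agent device was already enabled in the VM config when the guest booted (if not, the virtio channel only appears on the next power cycle), the steps above can be sketched as follows; the VM ID is a placeholder, and the script only acts on a PVE node:

```shell
#!/bin/sh
VMID=100   # placeholder VM ID

# Inside the guest (no reboot needed once the package is installed):
#   systemctl enable --now qemu-guest-agent

if command -v qm >/dev/null 2>&1; then
    # From the PVE node, check that the agent answers:
    qm agent "$VMID" ping && echo "guest agent is responding" \
        || echo "agent not responding yet"
else
    echo "qm not found - run this on a Proxmox VE node"
fi
```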
 