Snapshots as Volume-Chain Creates Large Snapshot Volumes

CraigHolyoak

New Member
Nov 10, 2025
I've been playing with the new Snapshots as Volume-Chain feature. It works fine, but I've noticed that for each snapshot it creates a snapshot volume the same size as the source volume. I suppose this makes a certain sense, and may even be a hard requirement, but it does pose a problem for large VM disks, since it implies that the underlying VG needs free space at least as large as the largest volume, no matter how little writing will actually happen while the snapshot is present. For large VMs that can be very large indeed.

Imagine a large 10TB disk. A snapshot that only lasts an hour might see barely 100GB written to it, yet the full 10TB would be allocated for each snapshot.

If the underlying storage (iSCSI) is thin-provisioned you can just massively over-provision each volume, though that may look ugly on the storage side if everything is over-provisioned by a factor of two. Might I suggest two options that may go some way towards reducing this problem:

  1. Allow a second, large VG to be specified to be used for these snapshot-chain volumes.
  2. Allow a size (% or absolute) to be specified for the snapshot volume that gets created. Granted, there would be a risk here of filling the snapshot if this was set too low, or the snapshot kept too long.
I don't know whether either of these options is technically feasible, but I think they would help keep things tidy for large VM disks.
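
For anyone who wants to check this on their own setup, comparing the LV sizes and the VG free space before and after taking a snapshot shows it immediately (the VG name below is just a placeholder):
Code:
# each snapshot shows up as an extra LV of the same size as the source disk
lvs -o lv_name,lv_size,vg_name vg_san
# and the VG needs at least that much free space again for the next snapshot
vgs -o vg_name,vg_size,vg_free vg_san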
 
I have tested this and observed the same behavior. I am also interested in this, as it is not practical in a SAN environment to overprovision the LUNs just so these large drives can get a snapshot. Will there be an alternative improvement to allow delta-like functionality, and is there already a feature request being considered? I appreciate the work done so far to bring legacy environments into the fold, but achieving parity in this space (along with more support options in the US) would break the dam holding some of us back.
 
We are seeing the same behaviour. Our testing setup:
Proxmox VE 9.1-1
Dell PowerStore 500T SAN
iSCSI with multipathing between them
  1. Create an LVM storage object on top of the PV/VG created on the iSCSI LUN of 100GB.
  2. Create a VM with 20GB disk. Real usage inside is about 5GB.
  3. Take a snapshot.
  4. The Proxmox-side storage shows 40GB used; the SAN side shows 5GB used.
We have the additional problem of storage not being reclaimed when snapshots are deleted while 'wipe removed volumes' is enabled. Deleting the snapshot created above makes the Proxmox storage side show 20GB used again, but the SAN now reports 25GB consumed, because the snapshot space has been 'filled'. So after 4 snapshots the SAN LUN will be 100% consumed, and so far we haven't found a way to resolve this.
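
For completeness, the CLI equivalent of the steps above is roughly the following (storage name, VG name and VMID are only examples; the SAN-side numbers come from the PowerStore UI):
Code:
# 1. LVM storage on top of the VG created on the 100GB iSCSI LUN
#    (plus the new snapshot/volume-chain option enabled on the storage)
pvesm add lvm san-lvm --vgname vg_san --content images --shared 1

# 2./3. after creating VM 101 with a 20GB disk on that storage, snapshot it
qm snapshot 101 test-snap

# 4. what PVE reports vs. what the SAN reports
pvesm status --storage san-lvm
lvs vg_san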

With ESXi we can use snapshots for application upgrades, and then roll back, or proceed and delete the snapshot later. Consider a VM with a 2TB disk attached, where the upgrade will change maybe 500MB of on-disk content: we'd need 4+TB of space available to take that snapshot, and deleting the snapshot afterwards will not release that space back to us.

Understandably, the feature is a tech preview, and it's great to see progress here. This would be one of our main sticking points with fully converting from VMware to Proxmox.
 
  1. Allow a second, large VG to be specified to be used for these snapshot-chain volumes.
this doesn't work (for technical reasons)
  2. Allow a size (% or absolute) to be specified for the snapshot volume that gets created. Granted, there would be a risk here of filling the snapshot if this was set too low, or the snapshot kept too long.
if we wanted these semantics, we could just use regular LVM snapshots, with all their issues. PVE snapshots are not meant to be only ephemeral.
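
For context, these are roughly the semantics being referred to: a classic LVM snapshot gets a fixed CoW area up front and becomes invalid once that area fills up (the names here are purely illustrative):
Code:
# old-style LVM snapshot with a 20G CoW area; if more than 20G of the
# origin changes while it exists, the snapshot is dropped/invalidated
lvcreate --snapshot --size 20G --name before-upgrade /dev/vg_san/vm-101-disk-0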
 
Will there be an alternative improvement to allow delta-like functionality, and is there already a feature request being considered?

@spirit and another contributor are evaluating dynamically expanding the LVs, but it is tricky - compared to snapshot operations (which can wait a bit if there is contention) expanding a volume before it runs out of space is very time critical, yet in a cluster/shared storage context, we need to ensure consistency.
 
We have the additional problem of storage not being reclaimed when snapshots are deleted while 'wipe removed volumes' is enabled. So after 4 snapshots the SAN LUN will be 100% consumed, and so far we haven't found a way to resolve this.

this should work, but you need to ensure that the whole layered storage stack supports discarding data on deletion. please ensure you are on the latest PVE package versions, and verify that discard is enabled and supported across the whole stack.
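
As a rough checklist, that means something like the following (VM ID, storage and volume names are only examples, adjust to your setup):
Code:
# guest-facing layer: the virtual disk needs discard=on so the guest's
# TRIM/UNMAP requests are passed down at all
qm config 101 | grep scsi0
qm set 101 --scsi0 san-lvm:vm-101-disk-0,discard=on

# inside the guest: fstrim -av (or the fstrim.timer) is what actually
# issues the discards for freed filesystem blocks

# host layer: the multipath device and the LVs should report a discard
# granularity, otherwise nothing below will ever see an unmap
lsblk --discard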
 
@spirit and another contributor are evaluating dynamically expanding the LVs, but it is tricky - compared to snapshot operations (which can wait a bit if there is contention) expanding a volume before it runs out of space is very time critical, yet in a cluster/shared storage context, we need to ensure consistency.
I'm still working on it ;)
 
this should work, but you need to ensure that the whole layered storage stack supports discarding data on deletion. please ensure you are on the latest PVE package versions, and verify that discard is enabled and supported across the whole stack.
We first tested on the ISO release of 9.0-1 and could see it doing the cstream wipe, and today updated to 9.1-1 ISO release and saw it do the blkdiscard wipe. Both cases have the same behaviour on the SAN volume usage.

From what we can see, the SAN supports it, and the LUN presented to the host appears to show discard support:

Code:
root@edupve103:~# lsblk  --discard
NAME                                DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                        0        4K      16M         0
└─368ccf09800e4d31cf36e8069af88f6bc        0        4K      16M         0

When we create the PV/VG, do we need to do anything specific?
 
lvm.conf also has a few knobs that might be relevant.. but you can test this without LVM as well:

- create a LUN
- write some data onto it from PVE (directly to the block device)
- verify usage went up
- discard from PVE
- verify usage went down

if that works as expected, the next step is to verify that it works with LVM layered on top ;)
 
Cool, I have just done this and the SAN reports the space reclaimed on the test volume.

Write junk data to half the 1G test volume (no LVM)
Code:
root@edupve103:~# dd if=/dev/urandom of=/dev/mapper/368ccf098000cb98e97490a1316bc7aa5 bs=10M count=50
50+0 records in
50+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 1.3476 s, 389 MB/s

The SAN reports 500MB logically used.

Use `blkdiscard` to discard 512M (it complains about 500M not being 512k sector aligned)
Code:
root@edupve103:~# blkdiscard -l 512MB /dev/mapper/368ccf098000cb98e97490a1316bc7aa5

The SAN reports 11MB used, so looks like the discard works on that level :)
 
yes, it should work. The only "problem" is that LVM reserves the block address space; the creation of the snapshot itself does not write zeroes.

I wonder if it would be possible to declare a LUN with a virtual size bigger than your SAN's real storage space? (I think it depends on the SAN implementation.)
 
We first tested on the ISO release of 9.0-1 and could see it doing the cstream wipe, and today updated to 9.1-1 ISO release and saw it do the blkdiscard wipe. Both cases have the same behaviour on the SAN volume usage.
for secure delete, the new blkdiscard call here is not using discard but the zeroing feature (blkdiscard -z). It's a little bit different, because it really writes zeroes (it tells the storage to write zeroes over a range, from a begin sector to an end sector), so it's faster than cstream, where you need to send the zero blocks one by one. it's some kind of zeroing offloading.
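
To illustrate the difference (reusing the test LUN path from above; both commands destroy the data on it):
Code:
# "block by block": every zero byte actually travels over iSCSI to the SAN
dd if=/dev/zero of=/dev/mapper/368ccf098000cb98e97490a1316bc7aa5 bs=1M status=progress

# zero-out (BLKZEROOUT): the device is asked to zero the whole range itself,
# so with offload support the zeroes are not sent block by block
blkdiscard -z /dev/mapper/368ccf098000cb98e97490a1316bc7aa5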
 
Ah right, so the zeroing marks the blocks as 'used', which the SAN reports. And there should be a discard happening at some point to tell the SAN those blocks are unmapped again, but this doesn't appear to happen? (possibly the issue_discards LVM config).
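
If it does turn out to be the LVM layer, the knob in question is presumably issue_discards in /etc/lvm/lvm.conf (off by default); a minimal sketch:
Code:
# /etc/lvm/lvm.conf -- only affects space freed by lvremove/lvreduce,
# not discards coming from inside a running VM
devices {
    issue_discards = 1
}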

So in an example, if a 100GB volume has a 20GB VM disk on it, we take three snapshots and end up with 80GB used. Then delete the snapshots, they are filled with zeroes, the Proxmox VG shows 20GB used (only the VM disk), while the SAN shows 80GB used (20GB VM disk + 60GB of zeroes from the three snapshots).
If this is correct, it shouldn't be a problem except for viewing storage from the SAN side. Proxmox would reallocate those zeroed blocks to real data when needed?

Sorry to hijack the OP's thread about large snapshots! That is still a bit of a problem for us, but at least it seems we can get the space back (or ignore what the SAN reports).
 
Ah right, so the zeroing marks the blocks as 'used', which the SAN reports. And there should be a discard happening at some point to tell the SAN those blocks are unmapped again, but this doesn't appear to happen? (possibly the issue_discards LVM config).
Maybe because if the space is allocated again to a new VM, and that VM does a discard, it should remove the zeroed blocks
So in an example, if a 100GB volume has a 20GB VM disk on it, we take three snapshots and end up with 80GB used. Then delete the snapshots, they are filled with zeroes, the Proxmox VG shows 20GB used (only the VM disk), while the SAN shows 80GB used (20GB VM disk + 60GB of zeroes from the three snapshots).
If this is correct, it shouldn't be a problem except for viewing storage from the SAN side. Proxmox would reallocate those zeroed blocks to real data when needed?
yes, it's zeroing for security, to avoid having old data around when you create a new VM on space previously allocated to another VM.
but of course this zeroed space is allocatable for new VMs.
 
it's some kind of zeroing offloading
IMHO, a final delete/discard should be done too. If no discard is sent, it delegates to the SAN what to do with those zeroes, and depending on SAN capabilities (mainly thin provisioning, but also compression and deduplication) the space may not be freed from the SAN's perspective. And yes, some SANs will also do weird things on discard, and recovering space from the SAN's perspective would still depend on the SAN implementation, but at least it would be attempted in two ways in order to recover that space.

Also, any sane storage should return empty blocks from any discarded address space, even if they haven't been fully deleted from the disks themselves.
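
As a manual illustration of that idea (not something PVE does today, and the LV name is made up): zero the volume for data safety, then discard it before removal so a thin-provisioned SAN can actually unmap the space:
Code:
# 1) zero the old volume so no stale data can leak to the next owner
blkdiscard -z /dev/vg_san/snap_vm-101-disk-0
# 2) discard it so the thin-provisioned SAN can release the space
blkdiscard /dev/vg_san/snap_vm-101-disk-0
# 3) remove the LV
lvremove -y vg_san/snap_vm-101-disk-0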