What happens when I delete a VM snapshot?

rabban

New Member
Sep 11, 2023
Hi all,

We have some VMs with disks stored on LVM-thin. We want to delete old snapshots (some of them are years old), so we would like to understand what happens under the hood when a snapshot is deleted.

In our experience, VMs get frozen while a snapshot is being deleted. How can I be sure that it's safe to delete a snapshot? Why are the VMs frozen? Is there a safer way or procedure to delete these snapshots?

The VM disks are around 400-800 GiB each.

Thanks!
 
Snapshots should only be kept for a few days; this is not a backup. When you create a snapshot, the deltas are written into it. If you delete this, the changes will be merged back into one image.

The sooner a snapshot is removed, the less data has accumulated in it and the faster the deletion goes.

You should also note that each snapshot can grow to the size of the virtual hard drive. If you have a 100 GB disk and you have 3 snapshots, these could take up to 400 GB in total.
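The worst case above can be written as a one-line calculation. This is only a sketch of the arithmetic, assuming every snapshot grows to the full disk size; the function name is made up for illustration.

```python
def worst_case_usage_gb(disk_gb, snapshots):
    """Upper bound: the base image plus one full-size delta per snapshot."""
    return disk_gb * (1 + snapshots)


# The example from the post: a 100 GB disk with 3 snapshots.
print(worst_case_usage_gb(100, 3))  # 400
```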
 
Thanks for your answer, sb-jw. I understand that snapshots should not be kept as a backup; this was done before my time. I'm worried because we have very old, nested snapshots, so I'm not sure how long they will take to delete.

Maybe it's safer to delete them with the VM offline. Some of the VMs run databases, and we're worried about in-flight transactions and filesystem corruption.

If you have any suggestions, I'd appreciate them.

Thanks!
 
You could look at how big the snapshots are, decide based on the size and then delete them one by one. As a rule, not much should happen to the VM, but that largely depends on the storage and its utilization underneath. So even if you stop this VM, others on the node may be affected.
 
If you delete this, the changes will be merged back into one image.
I would say that the reality is more nuanced and depends highly on the underlying storage. Removing a snapshot on ZFS, for example, works almost instantly because it does not have to merge any data back. It just removes the snapshot, which marks any blocks used purely by that snapshot as free to use.
 
I would say that the reality is more nuanced and depends highly on the underlying storage.
That's absolutely true, but the OP had already noted that they are using LVM-thin, so I left out the general aspects and distinctions. But of course you should keep that in mind.

But ZFS also creates snapshots at the file system level, while Ceph and LVM-thin do it at the block level.
 
lvm thin also doesn't need to merge any changes back though. it will potentially discard the no longer needed extents, and depending on what hardware you have that can cause some storage load of course.
 
LVM thin basically does ref-counting on blocks (so removing a snapshot just decreases those refcounts by one, and discards the blocks whose count reaches zero) - classical LVM had the merging problem, which made its snapshots basically unusable for anything other than short-lived "ensure consistency for backup access" cases, and even that was often painfully expensive/slow.
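The ref-counting idea can be illustrated with a toy model. This is not how LVM thin is implemented internally; `ToyThinPool` and all of its methods are hypothetical names, and the model only tracks which volumes reference which blocks.

```python
from collections import Counter


class ToyThinPool:
    """Toy model only: volumes map logical blocks to shared physical blocks."""

    def __init__(self):
        self.volumes = {}          # volume name -> {logical block: physical block}
        self.refcount = Counter()  # physical block -> number of volumes using it
        self.next_block = 0

    def create(self, name):
        self.volumes[name] = {}

    def write(self, name, logical):
        mapping = self.volumes[name]
        old = mapping.get(logical)
        if old is not None and self.refcount[old] == 1:
            return  # exclusively owned block: overwrite in place
        if old is not None:
            self.refcount[old] -= 1  # shared block stays for the other volumes
        mapping[logical] = self.next_block
        self.refcount[self.next_block] = 1
        self.next_block += 1

    def snapshot(self, origin, name):
        # A snapshot copies only the mapping and bumps the refcounts.
        self.volumes[name] = dict(self.volumes[origin])
        for block in self.volumes[name].values():
            self.refcount[block] += 1

    def remove(self, name):
        # Removal decrements refcounts and frees blocks that reach zero.
        freed = 0
        for block in self.volumes.pop(name).values():
            self.refcount[block] -= 1
            if self.refcount[block] == 0:
                del self.refcount[block]
                freed += 1
        return freed

    def used_blocks(self):
        return len(self.refcount)


pool = ToyThinPool()
pool.create("vm-disk")
for logical in range(4):
    pool.write("vm-disk", logical)
pool.snapshot("vm-disk", "snap1")   # no new blocks allocated
pool.write("vm-disk", 0)            # shared block: a new block is allocated
print(pool.used_blocks())           # 5
print(pool.remove("snap1"))         # 1 (only the block unique to the snapshot)
print(pool.used_blocks())           # 4
```

Nothing is copied or merged when the snapshot is removed in this model; the only work is bookkeeping plus discarding the blocks that nobody references any more.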
 
So could deleting a snapshot via lvremove on LVM thin affect the VM?

The guest is offline and the LVM storage is not accessible in the GUI, due to accidental overprovisioning and my inexperience with Proxmox/LVM.

Two of the snapshots I took show <2.6 TB on a 1.8 TB VM.

Thanks
 
deleting a snapshot should not affect the VM. but if you overprovisioned your storage and exceeded its capacity, that is a problem on its own, irrespective of removing a snapshot.
 
Thank you for replying.

I overprovisioned, but only have 1 TB in the VM. I think it's because I didn't select discard and shut off defrag on the Windows 10 VM. I also left sleep/hibernate enabled with the overprovisioned size, and I'm wildly guessing that's why it got persistently worse on each reboot, with 100% disk usage, until Proxmox protected the LVM by shutting it down.

I've been researching tirelessly for a few days now to figure a way out of the hole I've dug myself :).

If you have a link to someone with a similar problem, I would greatly appreciate it.

I have installed a blank 200 GB disk and added it to the VG. I'm waiting for my day off to attempt lvextend, to see if I can force the LVM to initialize and get my data off of it. If I can get the Windows VM to boot with discard checked and run defrag, I think it may be okay.
 
it really depends what happened whether recovery is possible or not, hard to tell from a distance. if you don't have other backups, doing a full disk image of the disk backing your LVM would be prudent in any case, so that you can retry potentially destructive recovery operations multiple times.
 
Back to the LVM snapshot discussion.

My current understanding:

Data on the LVM (VMs or other things) continues to be written, and that is where the data physically lives.

Snapshots just attach a flag of some sort, or catalog everything written afterwards, so that the data written after the snapshot can be removed... returning the data to the specified date/time.

Snapshots contain no data of their own, aside from whatever (addressing?) is needed to find and delete the underlying data.

Is this correct?
 
no, snapshots in LVM thin work rather differently. there's a metadata volume that keeps track of which extents are used by which volume, and a data volume that holds the actual data extents. a snapshot is just like any other volume, and volumes can share extents (writing to a shared extent then requires allocating a new extent so that the other volumes still have the old data). the most common issue with LVM thin is running out of metadata space. if the metadata is corrupt, the mapping of where the data is and which volume (or snapshot) it belongs to is gone.
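Since metadata exhaustion is the failure mode to watch for, here is a rough sketch of a check for it. It assumes the report was produced by something like `lvs --reportformat json -o lv_name,data_percent,metadata_percent` and that the JSON has the shape shown in the sample; the sample values, the function name, and the 80% threshold are all made up for illustration.

```python
import json


def pools_near_full(lvs_json, threshold=80.0):
    """Return (lv_name, field, percent) for every usage value >= threshold."""
    warnings = []
    for section in json.loads(lvs_json).get("report", []):
        for lv in section.get("lv", []):
            for field in ("data_percent", "metadata_percent"):
                value = lv.get(field, "")
                # Assumption: fields that do not apply to a volume are empty strings.
                if value and float(value) >= threshold:
                    warnings.append((lv["lv_name"], field, float(value)))
    return warnings


# Hypothetical sample, not real output from any system in this thread.
SAMPLE = """
{"report": [{"lv": [
  {"lv_name": "data", "data_percent": "91.20", "metadata_percent": "99.95"},
  {"lv_name": "vm-100-disk-0", "data_percent": "45.00", "metadata_percent": ""}
]}]}
"""

for name, field, percent in pools_near_full(SAMPLE):
    print(f"{name}: {field} at {percent}%")
```

Running something like this periodically would flag a pool well before either the data or the metadata volume reaches 100%.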
 
Thank you for that explanation.

So it turns out that both my disk and my metadata are full (15.88 GB for the metadata). Is there a person or service you know of that could reliably recover something? Any Hail Mary would be greatly appreciated.

Luckily I haven't tried any of the repair options yet while researching.

I might be able to try deactivating the LVM and deleting the two largest snapshots (2.9 TB each). That could free up enough space to let a broken VM boot. At this point the LVM just gives errors whenever I attempt anything.
 
Is there a person or service you know of that could reliably recover something? Any Hail Mary would be greatly appreciated.

not really - but asking whether they've recovered an LVM thin pool that ran out of metadata space should be easy to do if you attempt to contract somebody!
 
