What happens when I delete a VM snapshot?

rabban · Feb 3, 2024

Hi all,

We have some VMs with disks stored on LVM-thin. We want to delete old snapshots (some of them are years old), so we want to understand what happens under the hood when a snapshot is deleted.

In our experience, VMs freeze while a snapshot is being deleted. How can I be sure that it's safe to delete a snapshot? Why do the VMs freeze? Is there a safer way or procedure to delete these snapshots?

The VMs' disks are around 400-800 GiB in size.

Thanks!

sb-jw · Feb 4, 2024

Snapshots should only be kept for a few days; they are not a backup. When you create a snapshot, the deltas are written into it. If you delete it, the changes are merged back into one image.

The sooner a snapshot is removed, the faster the deletion goes.

You should also note that each snapshot can grow to the size of the virtual hard drive. If you have a 100 GB disk and 3 snapshots, the disk and its snapshots could take up to 400 GB in total.
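A minimal sketch of that worst-case arithmetic, assuming every snapshot ends up holding a full disk's worth of data that no other volume shares (for example, because the whole disk was rewritten after each snapshot was taken). The function name is hypothetical.

```python
def worst_case_usage_gb(disk_gb: float, snapshots: int) -> float:
    """Upper bound on the space used by one virtual disk plus its snapshots.

    Assumption: each snapshot can grow to the full size of the disk, so the
    bound is the live disk itself plus one full disk per snapshot.
    """
    if disk_gb < 0 or snapshots < 0:
        raise ValueError("disk size and snapshot count must be non-negative")
    return disk_gb * (1 + snapshots)


print(worst_case_usage_gb(100, 3))  # 400
```

Real usage is usually far lower, because most blocks stay shared between the disk and its snapshots; this only bounds the extreme case.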

rabban · Feb 4, 2024

Thanks for your answer, sb-jw. I understand that they should not be kept as a backup; that was done in the past. I'm worried because we have very old, nested snapshots, so I'm not sure how long they will take to delete.

Maybe it's safer to delete them with the VM offline: some of the VMs run databases, and we're worried about in-flight transactions and filesystem corruption.

If you have any suggestions, I'd appreciate them.

Thanks!

sb-jw · Feb 4, 2024

You could look at how big the snapshots are, decide based on their size, and then delete them one by one. As a rule, not much should happen to the VM, but that largely depends on the underlying storage and its utilization. So even if you stop this VM, others on the node may be affected.

aaron · Feb 4, 2024

sb-jw said:
If you delete this, the changes will be merged back into one image.

I would say that the reality is more nuanced and depends highly on the underlying storage. Removing a snapshot on ZFS, for example, is almost instant because it does not have to merge any data back; it just removes the snapshot, which marks any blocks used only by that snapshot as free.
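A toy model of why this is cheap, assuming each dataset or snapshot can be represented as a plain set of block IDs. ZFS actually tracks this with block birth times and per-snapshot dead lists rather than sets, so treat this only as an illustration; the function name and block numbers are hypothetical. Deleting a snapshot only works out which blocks nothing else references; no data is copied.

```python
from __future__ import annotations


def blocks_freed_by_deleting(snapshot: str, refs: dict[str, set[int]]) -> set[int]:
    """Return the block IDs that become free when `snapshot` is deleted.

    `refs` maps every live dataset and snapshot name to the blocks it
    references. A block is freed only if no other entry still uses it.
    """
    still_used: set[int] = set()
    for name, blocks in refs.items():
        if name != snapshot:
            still_used |= blocks
    return refs[snapshot] - still_used


refs = {
    "snap1": {1, 2, 3},  # older snapshot
    "snap2": {1, 3, 4},  # newer snapshot: block 2 was overwritten by 4
    "live": {1, 4, 5},   # current disk: block 3 was overwritten by 5
}
print(blocks_freed_by_deleting("snap1", refs))  # {2}
print(blocks_freed_by_deleting("snap2", refs))  # set()
```

Deleting `snap2` frees nothing, because each of its blocks is still referenced by `snap1` or the live disk; the cost of deletion is bookkeeping, not copying.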

sb-jw · Feb 5, 2024

aaron said:
I would say that the reality is more nuanced and depends highly on the underlying storage.

That's absolutely true, but the OP had already mentioned using LVM Thin, so I left out the general aspects and distinctions. But of course you should keep that in mind.

But ZFS also creates snapshots at the file system level, while Ceph and LVM Thin do it at the block level.

fabian · Feb 5, 2024

LVM thin doesn't need to merge any changes back either, though. It will potentially discard the extents that are no longer needed, and depending on your hardware, that can of course cause some storage load.

sb-jw · Feb 5, 2024

@fabian If merging is the wrong term for you, how would you describe it?

fabian · Feb 7, 2024

LVM thin basically does reference counting on blocks (so removing a snapshot just decreases those refcounts by one and discards a block once its count reaches zero). Classic LVM had the merging problem, which made its snapshots basically unusable for anything other than short-lived "ensure consistency for backup access" cases, and even that was often painfully expensive/slow.
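A toy model of the ref-counting behaviour described above. It is not LVM's actual implementation or on-disk format, and the class and method names are hypothetical. Each volume (origin or snapshot) maps logical blocks to physical blocks, and removing a volume only decrements counters.

```python
from __future__ import annotations

from collections import Counter


class RefcountedPool:
    """Hypothetical toy model of ref-counted block sharing (not LVM's code)."""

    def __init__(self) -> None:
        self.volumes: dict[str, dict[int, int]] = {}  # volume -> {logical: physical}
        self.refcount: Counter[int] = Counter()       # physical block -> users

    def add_volume(self, name: str, mapping: dict[int, int]) -> None:
        """Register a volume; each physical block it maps gains a reference."""
        self.volumes[name] = dict(mapping)
        for phys in mapping.values():
            self.refcount[phys] += 1

    def remove(self, name: str) -> list[int]:
        """Drop a volume and return the physical blocks whose count hit zero.

        No data is copied or merged: removal only decrements counters, and
        blocks that nobody references any more become candidates for discard.
        """
        freed = []
        for phys in self.volumes.pop(name).values():
            self.refcount[phys] -= 1
            if self.refcount[phys] == 0:
                del self.refcount[phys]
                freed.append(phys)
        return sorted(freed)


pool = RefcountedPool()
# Snapshot taken when logical blocks 0 and 1 lived in physical blocks 100/101.
pool.add_volume("snap", {0: 100, 1: 101})
# Since then the live disk rewrote logical block 1 into physical block 102.
pool.add_volume("vm-disk", {0: 100, 1: 102})
print(pool.remove("snap"))  # [101]
```

Block 100 survives because the live disk still uses it; only block 101, which belonged purely to the snapshot, is released.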

Bobsmyuncle · Jul 2, 2024

So could deleting a snapshot via lvremove with LVM thin affect the VM?

The guest is offline, and the LVM is not accessible in the GUI due to accidental overprovisioning and my inexperience with Proxmox/LVM.

Two of the snapshots I took show <2.6 TB on a 1.8 TB VM.

Thanks

fabian · Jul 2, 2024

Deleting a snapshot should not affect the VM. But if you overprovisioned your storage and exceeded its capacity, that is a problem on its own, irrespective of removing a snapshot.
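A minimal sketch of a sanity check that can catch this early, assuming you have already read the pool size, the provisioned sizes of its volumes and snapshots, and the pool's data and metadata usage (for example from the `Data%` and `Meta%` columns that `lvs` prints for a thin pool). The 80% threshold, the class and the function are hypothetical choices, not Proxmox or LVM defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThinPoolUsage:
    """Hypothetical snapshot of a thin pool's state (all sizes in GB)."""
    pool_size_gb: float            # physical size of the pool's data area
    virtual_sizes_gb: list[float]  # provisioned size of every thin volume and snapshot
    data_used_pct: float           # e.g. the Data% column from `lvs`
    metadata_used_pct: float       # e.g. the Meta% column from `lvs`


def pool_warnings(u: ThinPoolUsage, threshold_pct: float = 80.0) -> list[str]:
    """Return human-readable warnings; an empty list means nothing stood out."""
    if u.pool_size_gb <= 0:
        raise ValueError("pool size must be positive")
    warnings = []
    ratio = sum(u.virtual_sizes_gb) / u.pool_size_gb
    if ratio > 1:
        # Overprovisioning is allowed, but the volumes can now outgrow the pool.
        warnings.append(f"overprovisioned {ratio:.1f}x")
    if u.data_used_pct >= threshold_pct:
        warnings.append(f"data space {u.data_used_pct:.0f}% used")
    if u.metadata_used_pct >= threshold_pct:
        # Running out of metadata space is a separate failure from running out of data space.
        warnings.append(f"metadata space {u.metadata_used_pct:.0f}% used")
    return warnings


# Hypothetical numbers: a 1000 GB pool holding an 800 GB disk plus one
# snapshot of it, with both data and metadata space nearly exhausted.
print(pool_warnings(ThinPoolUsage(1000, [800, 800], 95.0, 90.0)))
```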

Bobsmyuncle · Jul 3, 2024

fabian said:
deleting a snapshot should not affect the VM. but if you overprovisioned your storage and exceeded its capacity, that is a problem on its own, irrespective of removing a snapshot.

Thank you for replying.

I overprovisioned, but only have 1 TB in the VM. I think it's because I didn't enable discard and turned off defrag on the Windows 10 VM. I also left sleep/hibernate enabled with the overprovisioned size, and I'm wildly guessing that's why it kept getting worse on each reboot, with 100% disk usage, until Proxmox protected the LVM by shutting it down.

I've been researching tirelessly for a few days now to find a way out of the hole I've dug myself.

If you have a link to someone with a similar problem, I would greatly appreciate it.

I have installed a blank 200 GB disk and added it to the VG. I'm waiting for my day off to attempt lvextend and see if I can force the LVM to initialize and get my data off of it. If I can get the Windows VM to boot with discard checked and run defrag, I think it may be okay.

fabian · Jul 3, 2024

Whether recovery is possible really depends on what happened; that's hard to tell from a distance. If you don't have other backups, taking a full disk image of the disk backing your LVM would be prudent in any case, so that you can retry potentially destructive recovery operations multiple times.
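The usual tools for this are `dd` or, for a disk that may have read errors, GNU `ddrescue`. As a minimal illustration of the idea, here is a Python sketch that copies a device or file in chunks and verifies the copy with SHA-256. The function names and the `/dev/sdX` and `/mnt/backup/...` paths are placeholders, and unlike `ddrescue` it simply stops at the first read error instead of working around bad sectors.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a file or block device by reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def image_and_verify(src_path: str, dst_path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy src to dst chunk by chunk, then re-read dst to confirm it matches.

    Raises OSError on the first read error (no bad-sector handling) and
    RuntimeError if the written image does not hash to the same value.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            digest.update(chunk)
    expected = digest.hexdigest()
    if sha256_of(dst_path, chunk_size) != expected:
        raise RuntimeError("image does not match the source")
    return expected


# Placeholder paths: run as root, with the volume group deactivated, and
# write the image to a *different* disk than the one being imaged.
# image_and_verify("/dev/sdX", "/mnt/backup/lvm-disk.img")
```

Recovery attempts can then be run against a copy of the image, so a failed attempt never touches the only remaining copy of the data.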

Bobsmyuncle · Jul 3, 2024

Yeah, it's been a learning journey. I had never used LVM or Proxmox, and it's been 10 years since I messed with KVM/Ubuntu. I just thought snapshots sounded great.

Thanks for the advice. I'll find another drive and dd the disk before I do anything crazy.

If you want a chaotic read with more info on my situation:
https://forum.proxmox.com/threads/need-help-lvmthin-volume-will-not-load.150093/

Bobsmyuncle · Jul 3, 2024

fabian said:
it really depends what happened whether recovery is possible or not, hard to tell from a distance. if you don't have other backups, doing a full disk image of the disk backing your LVM would be prudent in any case, so that you can retry potentially destructive recovery operations multiple times.

Back to the LVM snapshot discussion.

My current understanding:

Data on the LVM (VM or other things) continues to be written there, and that is where the data physically lives.

Snapshots just attach a flag of some sort, or catalog everything written afterwards, so that the data written after the snapshot can be removed, returning the data to the specified date/time.

Snapshots contain no data of their own aside from whatever (addressing?) is needed to find and delete the underlying data.

Is this correct?

fabian · Jul 4, 2024

No, snapshots in LVM thin work rather differently. There's a metadata volume that keeps track of which extents are used by which volume, and a data volume that holds the actual data extents. A snapshot is just like any other volume, and volumes can share extents (updating/writing a shared extent then requires allocating a new extent, so that the other volumes still have the old data). The most common issue with LVM thin is running out of metadata space; if the metadata is corrupt, the mapping of where the data is and which volume (or snapshot) it belongs to is gone.
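A toy sketch of the layout described above, under simplifying assumptions: one `bytes` value per extent, and Python dicts standing in for the metadata and data volumes. None of the names come from LVM. It shows a snapshot copying only the mapping, and a write to a shared extent allocating a new one. It also hints at why metadata loss is so damaging: without `metadata`, `data` still holds the bytes, but nothing says which volume or block they belong to.

```python
from __future__ import annotations

import itertools


class ThinPool:
    """Hypothetical toy pool: a metadata map plus a shared data area."""

    def __init__(self) -> None:
        self.data: dict[int, bytes] = {}               # "data volume": extent -> contents
        self.metadata: dict[str, dict[int, int]] = {}  # "metadata volume": volume -> {block: extent}
        self._next_extent = itertools.count()

    def _users(self, extent: int) -> int:
        # How many volumes (origins or snapshots) currently map this extent.
        return sum(extent in m.values() for m in self.metadata.values())

    def create(self, name: str) -> None:
        self.metadata[name] = {}

    def snapshot(self, origin: str, name: str) -> None:
        # Only the mapping is copied; the snapshot shares every extent.
        self.metadata[name] = dict(self.metadata[origin])

    def write(self, volume: str, block: int, payload: bytes) -> None:
        mapping = self.metadata[volume]
        extent = mapping.get(block)
        if extent is None or self._users(extent) > 1:
            # Unmapped, or shared with another volume: allocate a fresh extent
            # so the other volumes keep seeing the old contents.
            extent = next(self._next_extent)
            mapping[block] = extent
        self.data[extent] = payload

    def read(self, volume: str, block: int) -> bytes | None:
        extent = self.metadata[volume].get(block)
        return None if extent is None else self.data[extent]


pool = ThinPool()
pool.create("vm-disk")
pool.write("vm-disk", 0, b"old")
pool.snapshot("vm-disk", "snap1")
pool.write("vm-disk", 0, b"new")  # block 0 is shared, so a new extent is allocated
print(pool.read("vm-disk", 0), pool.read("snap1", 0))  # b'new' b'old'
print(len(pool.data))  # 2
```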

Bobsmyuncle · Jul 4, 2024

fabian said:
no, snapshots in lvm thin work rather different. there's a metadata volume that keeps track of which extents are used by which volume, and a data volume that has the actual data extents. a snapshot is just like any other volume, and volumes can share extents (updating/writing to an extent then requires allocating a new extent so that the other volumes still have the old data). the most common issue with LVM thin is running out of metadata space, if the metadata is corrupt the mapping where data is and which volume (or snapshot) it belongs to is gone

Thank you for that explanation.

So it turns out that both my disk and the metadata are full: 15.88 GB for the metadata. Is there a person or service you know of that could reliably recover something? Any Hail Mary would be greatly appreciated.

Luckily I didn't try any of the repair options yet while I was researching.

I might be able to deactivate the LVM and delete the two largest snapshots (2.9 TB each). That could free up enough space to let a broken VM boot. The LVM just gives errors when I attempt anything at this point.

fabian · Jul 4, 2024

Bobsmyuncle said:
Is there a person or service you know of that could reliably recover something? Any hail marry would be greatly appreciated.

Not really, but asking whether they've recovered an LVM thin pool that ran out of metadata space should be easy to do if you attempt to contract somebody!

Winet.maier · Jul 29, 2024

I had a problem here when deleting a snapshot of a Windows server. The VMs are on a Jovian NFS share with SSDs and a 40 Gbit Metronet. The server was locked for at least 3 minutes. The storage still has enough space, so that can't have been the reason.

fabian · Jul 30, 2024

Winet.maier said:
I had here a problem when deleting a snapshot of a windows server. VM's are on a jovian NFS share with SSD and a 40 Gbit Metronet. The Server was locked at least 3 min. The Storage as still enough space, so this could not be the reason.

That sounds like a different issue? How are you using that storage box on the PVE side?
