Best way to remove old snapshots

DanielK

New Member
Feb 22, 2023
When setting up a new server, I created a few snapshots after installing and configuring each piece of the required software, so that if anything went wrong I could easily revert to the previous step. I know they should have been deleted once everything was working properly, but they weren't.

A couple of weeks ago, when I noticed this, I decided to start deleting them with the server online. I selected the most recent one (about 2 weeks old, OS update) and deleted it. About 20 minutes in, every service on the server stopped responding and we had to abort. After that almost everything was OK: we had a couple of corrupt files, but we were able to recover them from a recent backup.
I expected the server to become slow, or very slow, but not to stop responding altogether.

The server has now been in production for almost 5 months, the disk holds over 1.5 TB of data, and I have 9 nested snapshots that I would like to delete. The most recent is 4 months old.

We had a 4-hour maintenance window recently, so I shut down the VM and tried deleting the most recent snapshot. After 4 hours it still hadn't finished, so we had to stop it to bring the server back online.
When we stopped the snapshot deletion process, this message came up:

Code:
TASK ERROR: command '/usr/bin/qemu-img snapshot -d snap_XX_XX_XXXX /var/lib/vz/images/101/vm-101-disk-2.qcow2' failed: received interrupt

Because of that, we expected a lot of (or at least some) corrupted data, and we were ready to restore it from an external backup tool.
To our surprise, we had no data loss at all. Well, at least we couldn't find any. The server has been running for a few days with no issues, and since no customer has complained about data loss, it looks like everything is all right.

Now I don't know whether the deletion actually completed internally when we tried it, and only the interface showed it as still running.
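One way to find out might be to ask the qcow2 file itself which internal snapshots it still records, and to run a read-only consistency check on it. This is only a rough sketch: `inspect_image` and the `QEMU_IMG` variable are made-up names for illustration, the disk path is the one from the task error above, and it assumes the commands are run on the PVE host with the VM shut down.

```shell
# Assumption: run on the PVE host, ideally while the VM is shut down.
# QEMU_IMG is a hypothetical override for the binary; it defaults to qemu-img.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# inspect_image is a hypothetical helper name, not a PVE tool.
inspect_image() {
    # List the internal snapshots still recorded in the qcow2 file.
    "$QEMU_IMG" snapshot -l "$1" || return 1
    # Read-only consistency check of the qcow2 metadata (no repair flag given).
    "$QEMU_IMG" check "$1"
}

# Example call, using the path from the task error above:
# inspect_image /var/lib/vz/images/101/vm-101-disk-2.qcow2
```

If the interrupted snapshot no longer shows up in the list, the deletion presumably finished inside the file; if it is still listed, it did not.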

I need to remove these snapshots, but I can't take the server offline for too long.
1) What should I do to delete those snapshots with the least downtime possible?
2) What problems could happen if I just keep them for more time? 1 year, 2 years... Would that be a real threat to the server? If so, how?

I'm running PVE 7.2-1, LVM file-system.
 
IMO, managing snapshots on a 1 TB+ qcow2 image will take many hours, if not days; the older the snapshot, the longer it takes to delete. I wouldn't trust a qcow2 with snapshots larger than 1 TB, and I doubt other users will recommend large qcow2 images.
lvmthin is the recommended and default storage for VM virtual disks, with better snapshot management.
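To see which storage type the disk actually lives on, something like the following could help. It is a minimal sketch: `show_storage_types` and the `PVESM` variable are hypothetical names, and it assumes a PVE host where the `pvesm` tool is available.

```shell
# PVESM is a hypothetical override for the binary; it defaults to pvesm.
PVESM="${PVESM:-pvesm}"

# show_storage_types is a hypothetical helper name.
show_storage_types() {
    # Prints each configured storage with its type (for example dir or lvmthin).
    "$PVESM" status
}

# Example call:
# show_storage_types
```

A qcow2 file under /var/lib/vz normally sits on a directory-type storage, not on lvmthin.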
 