Q: Snapshots in qcow2 - sanity check

fortechitsolutions

Renowned Member
Jun 4, 2008
Hi,

I am not super familiar with how this is implemented and meant to work, and I couldn't find any hints when searching the forum, so I wanted to briefly ask if anyone can comment.

This is on a Proxmox VE 8.0.4 host (latest version).
The VM uses the qcow2 disk format.
The qcow2 is stored on a Proxmox VM storage pool: a local NFS storage host with Gigabit Ethernet connectivity (i.e., modest but generally acceptable performance).

Basic scenario:
- snapshot taken yesterday to make sure it worked as expected
- snapshot was deleted / changes merged a short while later
- another snap was taken, and some work was done inside the VM
- at this point, looking at the VM's 'Snapshots' view in the Proxmox GUI, we could see one snapshot present
- then this morning, approx. 12 hrs after the snap was created, I tried to do the snap delete/merge again
- by good (bad) luck, the NFS server was running a RAID5 parity check when the snap merge process was kicked off. After a while it returned timeout errors because NFS performance was terrible
- made sure last night's backups to the PBS server are good, so we have a rollback in case of 'dammit' problems
- gently poked things over the following hour; had to unlock the VM, which was still flagged as locked for snap deletion. There was no extra snap data file visible in the dir where the VM qcow2 files live. In hindsight, I am not sure whether we should expect new files to exist at all, or whether the snaps are logically purely internal to the qcow2 files themselves
- endgame: manually intervened and, in the Proxmox SSH console, edited the VM config file under /etc/pve/qemu-server and removed the snapshot stanza:
  - the top half of the file is content I recognize
  - the lower half was content related to the snap state
  - kept a copy of the conf file prior to the change/edit
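For reference, roughly the same cleanup can be done from the Proxmox CLI instead of hand-editing the config. This is a minimal sketch, not a tested procedure: `qm unlock` clears the leftover lock, and `qm delsnapshot ... --force 1` removes the snapshot from the config even if removing the disk-level snapshot fails. The helper names `list_conf_snapshots` and `cleanup_stale_snapshot` are hypothetical, and the VMID and snapshot name are placeholders you supply.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; VMID and snapshot name are placeholders.

# Print snapshot section names from a Proxmox VE VM config file.
# The current config is the "top half"; each snapshot follows as a "[name]" section.
# Non-snapshot sections such as [PENDING] or [special:...] are skipped.
list_conf_snapshots() {
    sed -n -e '/^\[PENDING]$/d' -e '/^\[special:/d' -e 's/^\[\(.*\)]$/\1/p' "$1"
}

# CLI route instead of hand-editing /etc/pve/qemu-server/<vmid>.conf:
# clear the leftover snapshot-delete lock, then drop the snapshot from the
# config even if removing the disk-level snapshot fails (--force).
cleanup_stale_snapshot() {
    vmid="$1"
    snap="$2"
    qm unlock "$vmid" || return 1
    qm delsnapshot "$vmid" "$snap" --force 1
}
```

Usage would be along the lines of `list_conf_snapshots /etc/pve/qemu-server/115.conf` to see which snapshot sections exist, then `cleanup_stale_snapshot 115 <snapname>`.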

Once that was done, we no longer saw any snapshot listed in the Proxmox GUI for this VM.

However, under the hood, in the VM's 'Monitor' view, we can see this:

Code:
Type 'help' for help.
# info snapshots
List of snapshots present on all disks:
None
List of partial (non-loadable) snapshots on 'drive-virtio1':
ID  TAG                   VM SIZE  DATE                 VM CLOCK      ICOUNT
1   BeforeForceReplicate  0 B      2023-11-03 17:31:47  48:41:43.178
2   Before_____014Rejoin  0 B      2023-11-04 09:50:28  15:01:01.588


So there is clearly a trace of snap activity having taken place in the qcow2, but I think the 'VM SIZE' column is telling me we've got zero bytes in the snaps.

I am curious whether anyone is familiar with what we should expect to see here in the monitor's 'info snapshots' output. Is it normal that:
- after a qcow2 has had a snapshot taken and then deleted,
- some placeholder persists forever, even if there is no more data associated with the snap (i.e., we can assume the snap is not active)?

Or

would I be better off doing something to 'export' the VM and 'import' it again, in order to get back to a clean slate with no unwanted lingering trace of the snaps?

I can't even tell whether I should expect a PBS backup to preserve the internal snap metadata. I am assuming yes, because I am guessing PBS is going to back up the qcow2 file as it stands, and if there is internal snap metadata, that is part of the bundle.

Anyhoo. It may be that this is all good and fine.

In hindsight, I now know not to play with snapshots while my NFS performance is poor, as it creates drama with timeouts.

Ideally, my goal here is to (a) better understand this for future reference, and (b) be relatively confident that the VM in question is good and is not going to blow up with infinite copy-on-write data growth over time.

Right now the VM in question is booted, working normally, and looks good.

So this is a 'sanity check' sort of query.

Thank you for the help and for reading this far.


Tim
 
Unfortunately, qcow2 + NFS can be quite slow when removing snapshots. If you face such issues, it is recommended to delete the snapshot while the VM is shut down instead.
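A minimal sketch of that advice, assuming the VM can be shut down cleanly; the helper names `is_stopped` and `delete_snapshot_offline` are hypothetical, and the VMID and snapshot name are supplied by the caller. It only uses the standard `qm shutdown`, `qm status`, `qm delsnapshot` and `qm start` commands, and refuses to delete anything unless the VM really reports as stopped.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; pass your own VMID and snapshot name.

# Succeeds if 'qm status' output read on stdin reports a stopped VM.
is_stopped() {
    grep -q '^status: stopped$'
}

# Delete a snapshot with the VM shut down, which avoids long-running
# online qcow2 snapshot removal on slow (e.g. NFS) storage.
delete_snapshot_offline() {
    vmid="$1"
    snap="$2"
    qm shutdown "$vmid" || return 1
    if ! qm status "$vmid" | is_stopped; then
        echo "VM $vmid is not stopped; not deleting snapshot" >&2
        return 1
    fi
    qm delsnapshot "$vmid" "$snap" || return 1
    qm start "$vmid"
}
```

For example, `delete_snapshot_offline 115 <snapname>` during a maintenance window.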

Regarding the snapshot disappearing from the GUI after the config edit: yes, the Proxmox UI relies on the presence of a snapshot section in the VM config.

The monitor output tells you that these are partial snapshots. You can remove them, while the VM is turned off, with:

Code:
qemu-img snapshot -d BeforeForceReplicate /path/to/your/image.qcow2
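A slightly more defensive variant of that command, as a sketch: it lists the image's internal snapshots first and only deletes the tag if the image actually has it. The helper names are hypothetical and the image path is a placeholder; it assumes snapshot tags contain no spaces and that the VM using the image is powered off (qemu-img may refuse to open an image that a running VM holds locked).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes tags contain no spaces
# and that the VM using the image is powered off.

# Print the TAG column from 'qemu-img snapshot -l' output read on stdin.
# Line 1 is "Snapshot list:" and line 2 is the column header.
snapshot_tags() {
    awk 'NR > 2 { print $2 }'
}

# Delete an internal snapshot only if the image actually lists it.
delete_internal_snapshot() {
    image="$1"
    tag="$2"
    if qemu-img snapshot -l "$image" | snapshot_tags | grep -qxF -- "$tag"; then
        qemu-img snapshot -d "$tag" "$image"
    else
        echo "snapshot '$tag' not listed in $image" >&2
        return 1
    fi
}
```

For example, `delete_internal_snapshot /path/to/your/image.qcow2 BeforeForceReplicate`.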

As for whether such placeholders are expected to persist: these partial snapshots are leftovers, because something went wrong or was aborted in the middle.
 
Thank you very much for the reply / this information! Greatly appreciated.

Part of the drama around this whole thing was that the first-pass test snap/remove on Friday went smoothly and seemed OK, but on Saturday the NFS server had the added background load of a RAID metadata scrub (which we didn't realize was running), and that is what made things 'more exciting'.

So, many thanks for the clarification on what I am seeing.

My next steps will be:

- wait for a downtime window on the server, and make sure I have a good backup on PBS
- power down the VM
- use the command you indicated to delete the partial/bad/unwanted snaps that show up in the KVM monitor
- once they are scrubbed and no longer visible, power on the VM and do a review/sanity test to make sure all is well
- happy days ensue!

Thank you,

Tim
 
A slow footnote on this thread: I gave this a go this morning, and it did not quite go as expected. I think this means I just have to ignore it and not worry, but I'm posting just in case it's of interest.

I tried to follow the hint to remove the snaps after powering down the VM.

The command line tells me "no".

A query to the qcow2 file tells me "no snaps, have a nice day" (i.e., listing the snaps gives no output),

as per:

Code:
root@proxmox-1:/var/lib/vz/images/115# ls -la
total 20957716
drwxr----- 2 root root         4096 Nov  5 09:45 .
drwxr-xr-x 9 root root         4096 Nov  5 09:45 ..
-rw-r----- 1 root root 268476678144 Nov  9 06:28 vm-115-disk-0.qcow2

root@proxmox-1:/var/lib/vz/images/115# qemu-img snapshot -d BeforeForceReplicate vm-115-disk-0.qcow2
qemu-img: Could not delete snapshot 'BeforeForceReplicate': snapshot not found

root@proxmox-1:/var/lib/vz/images/115# qemu-img snapshot -d BeforeWgs014Rejoin vm-115-disk-0.qcow2
qemu-img: Could not delete snapshot 'BeforeWgs014Rejoin': snapshot not found
 
root@proxmox-1:/var/lib/vz/images/115# qemu-img snapshot -l vm-115-disk-0.qcow2
root@proxmox-1:/var/lib/vz/images/115#

So, I am a bit baffled as to how the monitor manages to see traces of snaps while qemu-img is telling me otherwise. The snap placeholders don't appear to have changed in the last few days of uptime, so I am not too worried that they are doing much, but it is a bit of a mystery.
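One thing that may be worth checking, offered as an assumption rather than a diagnosis: the monitor reports the partial snapshots on 'drive-virtio1', and "partial" means a snapshot exists on that disk but not on every disk of the VM. The qemu-img commands above were run against vm-115-disk-0.qcow2 in local storage, while the original post says the qcow2 lives on NFS, so the virtio1 image may be a different file. The sketch below resolves the image behind a disk key via `qm config` and `pvesm path`, then lists its internal snapshots; the helper names are hypothetical. (Relatedly, the 'VM SIZE' column in 'info snapshots' is the size of saved VM/RAM state, so 0 B there does not by itself say how much disk data a snapshot holds.)

```shell
#!/bin/sh
# Sketch under an assumption: the partial snapshots live on the image behind
# the 'virtio1' disk key, not necessarily vm-115-disk-0.qcow2. Names are hypothetical.

# Print the volume ID for a disk key from 'qm config' output read on stdin.
# Matching lines look like "virtio1: <storage>:<vmid>/<file>,size=...".
volid_for_disk() {
    sed -n "s/^$1: \([^,]*\).*/\1/p"
}

# Resolve the disk key to a file path and list its internal snapshots.
# Run with the VM powered off, since qemu-img may refuse a locked image.
list_disk_snapshots() {
    vmid="$1"
    disk="$2"
    volid=$(qm config "$vmid" | volid_for_disk "$disk")
    if [ -z "$volid" ]; then
        echo "no '$disk' entry in the config of VM $vmid" >&2
        return 1
    fi
    path=$(pvesm path "$volid") || return 1
    echo "$disk -> $path"
    qemu-img snapshot -l "$path"
}
```

With the VM powered off, `list_disk_snapshots 115 virtio1` would show which file backs virtio1 and whether the two tags are listed there.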


Tim
 
Are they still visible in the monitor after starting the VM again?

Otherwise, if you haven't created a new snapshot in the meantime, you can move the disk to a different storage (or to the same storage with a different format, if you have enough space) and then back again.
 
Yes, they are visible as usual once the VM was restarted. So this is great to know: doing a disk move will effectively give me a clean-slate new disk instance and should banish them. I am pretty sure I've got enough space on the local SSD Proxmox storage pool to simply do a quick downtime window: a local copy to a different format, then a local copy back to the original format (i.e., qcow2 > raw > qcow2). And if I don't have enough local space on the SSD, I have definitely got tons of space on the NFS storage pool, but that one is slower, so hopefully I will just get it done locally. Anyhoo. Details. I will schedule a bit of downtime and poke this forward in the next few days, and then follow up on this thread to confirm status. Thank you!
 
You should be able to do the move while the VM is running too. QEMU has a drive-mirror feature which Proxmox VE uses for that. But whichever way you choose, best to have a working backup ready should anything go wrong.
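A minimal sketch of the move-away-and-back approach, assuming a disk key of virtio1 (taken from the 'drive-virtio1' monitor output) and a directory storage named local that accepts both raw and qcow2; `qm disk move` is the CLI counterpart of the GUI's disk move action. The wrapper name `move_roundtrip` is hypothetical. Check free space on the target storage and have a verified backup before running it.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical. Check free space on the target
# storage and have a verified backup before running it.

# Move a VM disk to raw and then back to qcow2 on the given storage.
# 'qm disk move' can do this with the VM running (QEMU drive-mirror);
# '--delete 1' removes the source volume after each successful move.
move_roundtrip() {
    vmid="$1"
    disk="$2"
    storage="$3"
    qm disk move "$vmid" "$disk" "$storage" --format raw --delete 1 || return 1
    qm disk move "$vmid" "$disk" "$storage" --format qcow2 --delete 1
}
```

For example, `move_roundtrip 115 virtio1 local`. Using the disk key rather than a file name means the second move still targets the right disk even though the first move creates a new volume.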
 
