[SOLVED] Snapshot Not Working

mhayhurst

Hello everyone!

On Proxmox (pve-manager/5.4-14/b0e640f7 (running kernel: 4.15.18-28-pve)) my snapshots are not working for one of my VMs. They appear to complete in the Proxmox UI and on the CLI:

Code:
root@proxmox2:~# qm snapshot 110 "Test1" --description "This is a test"
root@proxmox2:~# qm listsnapshot 110
root@proxmox2:~# qm snapshot 110 "Test1" --description "This is a test"
snapshot name 'Test1' already used
root@proxmox2:~#

/var/log/pve/tasks/index shows the snapshot as "OK":
UPID:proxmox2:00004ADE:000AB986:5EB5D4BD:qmsnapshot:110:root@pam: 5EB5D4BE OK

Snapshots do not appear in the Proxmox UI either:

[Screenshot: the VM's Snapshots panel in the Proxmox UI is empty]

You will also notice the "NOW" default snapshot is gone. Does anyone know what's going on and how to correct it? It's the only VM that is experiencing this issue.
 
Can you have a look at the task history of the VM?

Yes, for the snapshot I mentioned above it shows: "TASK OK" for one entry and then another entry shows: "TASK ERROR: snapshot name 'Test1' already used" when I attempted to take another snapshot using the same name. "Test1" still does not show up in the Proxmox UI for that particular VM.
 
Hi,
could you check which snapshots are present in ZFS directly (i.e. zfs list <DATASET>/vm-110-disk-<N> -r -t all)? Could you also share the configuration file /etc/pve/qemu-server/110.conf of the VM?
 

Looks like the snapshots are definitely on the ZFS filesystem:

Code:
root@proxmox2:~# zfs list rpool/data/vm-110-disk-1 -r -t all
NAME                                       USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-110-disk-1                  1.24T  1.43T  1.24T  -
rpool/data/vm-110-disk-1@Pre_Upgrade      2.55G      -  1.23T  -
rpool/data/vm-110-disk-1@Pre_File_Change   151M      -  1.24T  -
rpool/data/vm-110-disk-1@test                0B      -  1.24T  -
rpool/data/vm-110-disk-1@Test1               0B      -  1.24T  -

Code:
root@proxmox2:~# cat /etc/pve/qemu-server/110.conf
#PHP 7.2
agent: 1
balloon: 2048
boot: cn
bootdisk: virtio0
cores: 6
memory: 4096
name: ownCloud
net0: virtio=4A:56:66:AB:E6:D3,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: Test1
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=297aed7a-f439-45c2-8915-ae7f0483778a
sockets: 2
startup: order=2,up=60
vcpus: 6
virtio0: local-zfs:vm-110-disk-1,size=1605286976553

[Pre_File_Change]
agent: 1
balloon: 2048
boot: cn
bootdisk: virtio0
cores: 6
memory: 4096
name: ownCloud
net0: virtio=4A:56:66:AB:E6:D3,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: Pre_Upgrade
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=297aed7a-f439-45c2-8915-ae7f0483778a
snaptime: 1588962906
sockets: 2
startup: order=2,up=60
vcpus: 6
virtio0: local-zfs:vm-110-disk-1,size=1605286976553

[Pre_Upgrade]
#Pre upgrade to ownCloud 10.4.1
agent: 1
balloon: 2048
boot: cn
bootdisk: virtio0
cores: 6
memory: 4096
name: ownCloud
net0: virtio=4A:56:66:AB:E6:D3,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: Pre_Upgrade
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=297aed7a-f439-45c2-8915-ae7f0483778a
snaptime: 1588875257
sockets: 2
startup: order=2,up=60
vcpus: 6
virtio0: local-zfs:vm-110-disk-1,size=1605286976553

[Test1]
#This is a test
agent: 1
balloon: 2048
boot: cn
bootdisk: virtio0
cores: 6
memory: 4096
name: ownCloud
net0: virtio=4A:56:66:AB:E6:D3,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: test
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=297aed7a-f439-45c2-8915-ae7f0483778a
snaptime: 1588974782
sockets: 2
startup: order=2,up=60
vcpus: 6
virtio0: local-zfs:vm-110-disk-1,size=1605286976553

[test]
#You are here!
agent: 1
balloon: 2048
boot: cn
bootdisk: virtio0
cores: 6
memory: 4096
name: ownCloud
net0: virtio=4A:56:66:AB:E6:D3,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: Pre_File_Change
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=297aed7a-f439-45c2-8915-ae7f0483778a
snaptime: 1588972714
sockets: 2
startup: order=2,up=60
vcpus: 6
virtio0: local-zfs:vm-110-disk-1,size=1605286976553
 
Code:
[Pre_Upgrade]
#Pre upgrade to ownCloud 10.4.1
...
parent: Pre_Upgrade
...

The problem is here. This snapshot is marked as its own parent for some reason. Try removing the line parent: Pre_Upgrade. Did you do anything special when creating this snapshot?
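
For clarity, after dropping that line the [Pre_Upgrade] section should look roughly like this (everything else unchanged; the oldest snapshot normally carries no parent line at all):

Code:
[Pre_Upgrade]
#Pre upgrade to ownCloud 10.4.1
...
snaptime: 1588875257
...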
 

Thank you, that fixed it!!! I created the snapshots within the Proxmox UI; nothing special was done on the backend, apart from the "Test1" and "test" snapshots, which were created on the CLI only after I noticed this problem.

[Screenshot: the snapshots now show up in the Proxmox UI]
 

Good to hear! The question remains why that happened. Did you also do rollbacks or removals or did you only create those two snapshots? Were any other changes made to the configuration around that time?
 

Yes, I agree, but the only major thing I've done to that particular node these past few weeks is run an Update --> Refresh and Upgrade, followed by a reboot, through the Proxmox UI. I did not perform any rollbacks, but I did delete one or two snapshots and created those first two snapshots (Pre_Upgrade and Pre_File_Change), all through the Proxmox UI. Proxmox has been very solid on that specific node, so I have not messed with anything on the back-end for a very long time.
 
We experienced the same bug today. A relatively fresh install of 6.3-3 is setting every snapshot's parent to itself. We typically create snapshots of a handful of VMs using a script, which is where we first saw the bug, but any snapshot created using the PVE UI or the pvesh CLI results in the same bug occurring. A reboot of the PVE system did not help.

EDIT: Initially after rebooting, the issue was still occurring. After writing this post, both the UI and CLI appear to be working properly again.
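
For reference, reproducing it from the CLI is just a plain pvesh snapshot call, roughly along these lines (node name, VM ID and snapshot name here are placeholders, not our real ones):

Code:
# placeholder node / VM ID / snapshot name, adjust to your setup
pvesh create /nodes/pve1/qemu/100/snapshot --snapname test-snap --description "snapshot test"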

Our pveversion -v:

Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Hi,
so far, I'm unable to trigger this behavior here. Do you still have the configuration for a VM before the problematic snapshot was made (maybe from a backup)? Did all affected VMs already have some existing snapshot(s)? Would you mind sharing the script (or at least the part for actually making the snapshots)?

Edit: Are there any errors or warnings in the task logs of the affected VMs?
 
I also hit this today on 7.0-13. There were no errors in the log for the snapshot itself, but there were problems before.

The VM had a previous snapshot which I tried to delete via the GUI. It appears the snapshot RAW file was deleted, but the snapshot delete took too long, and gave this error: "TASK ERROR: VM 501 qmp command 'blockdev-snapshot-delete-internal-sync' failed - got timeout", and the VM was unresponsive to networking. I was able to issue a "stop" command and the VM turned off. Restarting the VM restored normal operation.

The conf file still had the old snapshot listed, so I deleted that section. I then started a new snapshot from the GUI which completed and all the required files were in place. However, the snapshot GUI was blank. After I found this thread, I did find the "parent:" line with the snapshot name. Once I removed that line, the snapshot showed up in the GUI.
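
In case it helps anyone else hitting this, a rough way to spot a self-referencing parent line without reading the whole file (hypothetical helper, using my VM ID; adjust the path as needed):

Code:
# print any snapshot section whose 'parent' points to itself
awk '/^\[/ { sect = substr($0, 2, length($0) - 2) }
     /^parent:/ && $2 == sect { print "self-referencing parent in [" sect "]" }' \
    /etc/pve/qemu-server/501.conf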
 
I've had the same issue, and removing the wrong 'parent' line indeed solves the GUI issue.
But the next snapshot I made didn't work.
I took the snapshot and immediately rolled back for testing, and it did NOT roll back to the last snapshot but to the one that was deleted last before the bug happened, which did not appear in the GUI and should not have existed anymore.

I'm going to try creating various snapshots at once and see if the subsequent ones are healthy.

EDIT: To be a bit clearer, here are some more details:
- This VM had 1 snapshot.
- It was deleted, then another one was created.
- The GUI wouldn't show any snapshots after successfully creating the last one (and, thus, the only one for that VM). Note that the GUI didn't show the "NOW" state either; the snapshot panel was completely empty.
- I deleted the faulty line that made the snapshot a parent of itself.
- The snapshot reappeared.
- I deleted it again, to make sure.
- And recreated it again. It worked and showed up in the GUI normally.
- I rolled back to it, but it didn't roll back to this version; it rolled back to a much older snapshot (I'm guessing the one that faulted when deleted and that created the whole issue).

All of these snapshots were taken with the machine OFF ("RAM: No").
Also note all these snapshots used the same name.
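
For anyone wanting to cross-check, comparing what PVE reports with what ZFS actually has can be done along these lines (VM ID and dataset are the examples from earlier in this thread; adjust to your setup):

Code:
# what PVE believes the snapshot tree is
qm listsnapshot 110
# what actually exists on the ZFS side
zfs list -r -t snapshot rpool/data/vm-110-disk-1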
 
After some testing, I think that my particular issue where the snapshot will not update and keep using some old "ghost-snapshot" only occurs when the exact same name is used. Comments seem to have no influence.

Here is how this was tested (actions here start after the actions described in the previous post):
- I deleted the snapshot (which still had the same name as the faulty one, let's call it "Snapshot-F"), because it was faulty.
- I created a snapshot with another name, and restored it.
- It worked.
- I kept this healthy snapshot and recreated a backup named "Snapshot-F", the faulty name.
- I restored "Snapshot-F". It did NOT restore to the state expected, but to some older state that makes no sense (only sense it makes is, as stated, that it is somehow kept in memory since the initial bug).
- I kept both backups and restored to the healthy one. It worked.

I will simply change the name, but I was keeping the same one because of scripts.

Note that, again, all snapshots are done with the VM being OFF here.
 
Hi,
After some testing, I think that my particular issue where the snapshot will not update and keep using some old "ghost-snapshot" only occurs when the exact same name is used. Comments seem to have no influence.
please share the output of pveversion -v and the output of cat /etc/pve/qemu-server/<ID>.conf with the actual VM ID.

Should you be able to reproduce the issue, the configuration would be interesting at each of the points marked "here" below, i.e. after these steps:
Here is how this was tested (actions here start after the actions described in the previous post):
here
- I deleted the snapshot (which still had the same name as the faulty one, let's call it "Snapshot-F"), because it was faulty.
here
- I created a snapshot with another name, and restored it.
- It worked.
- I kept this healthy snapshot and recreated a backup named "Snapshot-F", the faulty name.
here (What exactly does recreated a backup named "Snapshot-F" mean? Or do you just mean you created a new snapshot with that name?)
- I restored "Snapshot-F". It did NOT restore to the state expected, but to some older state that makes no sense (only sense it makes is, as stated, that it is somehow kept in memory since the initial bug).
and here. Nothing should be kept in memory related to snapshots after the operation is finished. The relevant information is rather written to the configuration.
- I kept both backups and restored to the healthy one. It worked.

I will simply change the name, but I was keeping the same one because of scripts.

Note that, again, all snapshots are done with the VM being OFF here.
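
If you can reproduce it, captures along these lines at each of those points would be ideal (placeholders as in my earlier commands, please adjust):

Code:
# VM configuration including all snapshot sections
cat /etc/pve/qemu-server/<ID>.conf
# ZFS snapshots with their creation times, for comparison against the snaptime entries
zfs list -r -t snapshot -o name,creation <DATASET>/vm-<ID>-disk-<N>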
 
