pct snapshot <vmid> <snapname> not working reliably

efinley

Member
Jul 16, 2018
I have a script that snapshots all VMs and containers each night. After upgrading a few nights ago, 'pct snapshot' no longer works reliably. The script will issue the command, it shows up in the GUI log 'CT 119 - Snapshot', but the snapshot doesn't actually happen (sometimes - it's hit and miss). I've been using the script reliably for over a year and it just started doing this after this last upgrade. Is this the right place to post this? Does anyone have any idea as to what's going on?

root@vsys10:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-1-pve)
pve-manager: 6.0-6 (running version: 6.0-6/c71f879f)
pve-kernel-5.0: 6.0-7
pve-kernel-helper: 6.0-7
pve-kernel-4.15: 5.4-8
pve-kernel-5.0.21-1-pve: 5.0.21-1
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
 
Do backups work (when run manually) without the script? If so, check the script syntax. I'm not saying your script is faulty, just suggesting that the upgrade could have broken it.
 

TL;DR
It's not a syntax error in the script.

Longer answer:

The snapshot shows up in the GUI log. It's in the log as if it worked (no errors) and yet the snapshot doesn't actually happen.

Sometimes it works, sometimes it doesn't, but it always shows up in the log as if it worked.

It's the same script and even the same function in the script that does the snapshot of the VMs and containers. The only thing that changes is that for VMs, it uses 'qm' and for containers it uses 'pct'. The VM snapshots are still working reliably.

The line in the script that does the snapshot is:
`$cmd snapshot $id $name`;

Where $cmd is either '/usr/sbin/qm' or '/usr/sbin/pct' depending on whether it's a VM or container respectively.
 
On what storage are the CTs/VMs? Also, can you manually execute a pct snapshot through CLI and post the output here?

The underlying storage is local ZFS. zvols for VMs and zfs datasets for containers.

root@vsys11:~# pct snapshot 120 periodic_running_2019_09_06_11_53_00
root@vsys11:~#

This snapshot worked.
 
hi,

can you run your script manually and post the output? if your script is suppressing stderr, make sure you print it out so we can see possible errors.
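One way to make sure nothing is swallowed is to capture the exit code and stderr explicitly for every command the script issues. A minimal Python sketch follows; the language of the original script is not stated in this thread, and `run_logged` is a hypothetical helper name, not part of any Proxmox tooling.

```python
import subprocess
import sys

def run_logged(argv):
    """Run a command and return (exit code, stdout, stderr).

    Anything written to stderr, or any non-zero exit code, is echoed
    to our own stderr so it cannot go unnoticed.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0 or proc.stderr:
        print(
            f"{' '.join(argv)} exited {proc.returncode}: {proc.stderr.strip()}",
            file=sys.stderr,
        )
    return proc.returncode, proc.stdout, proc.stderr

# Example on a Proxmox host (the snapshot name is made up for illustration):
# run_logged(["/usr/sbin/pct", "snapshot", "120", "nightly_test"])
```

If the command really does succeed silently, this returns an exit code of 0 with empty output, which at least rules out a hidden error.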
 

I have run it manually many times. There is no output because there are no errors.

The pct snapshot command doesn't throw an error, it returns with no output, just like normal.
The entry in the GUI tasks log shows up just like normal and the status is 'OK'.

But (sometimes) the snapshot doesn't show up in the snapshots listing and there is no snapshot on the underlying dataset.

This doesn't happen all the time; it only happens sometimes, and with different containers each time. I run enough containers across enough hosts that I end up with between 3 and 10 containers that don't get a daily snapshot. The next time the script runs, it will usually snapshot the ones it missed last time and miss others; it appears to be completely random.

I've verified that the command is issued. The 'OK' status even shows up in the GUI tasks log, but the snap doesn't happen.

I know it's extremely hard and frustrating to troubleshoot something that isn't consistently broken. If I could get it to fail consistently, the fix would be easy to find, but I haven't been able to. The only data point that may be of value is that the VM snapshots done with the 'qm' command have been very reliable even when the container snapshots done with the 'pct' command have not.

I'm considering putting logic in the script that actually checks for the existence of the snapshot after the command is issued and it returns without error. This obviously isn't the correct fix, but my snapshots/backups are important.
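Such a check could look roughly like the Python sketch below. It assumes only that `listsnapshot` prints each snapshot name as a whitespace-separated token somewhere on a line; `snapshot_exists` and `snapshot_and_verify` are hypothetical helper names, and the `run` parameter exists purely so the logic can be exercised without a Proxmox host.

```python
import subprocess

def snapshot_exists(listing, name):
    """True if `name` appears as a whitespace-delimited token on any line."""
    return any(name in line.split() for line in listing.splitlines())

def snapshot_and_verify(cmd, vmid, name, run=subprocess.run):
    """Take a snapshot, then confirm it shows up in the `listsnapshot` output.

    `cmd` is the path to qm or pct, as in the script described above.
    """
    run([cmd, "snapshot", str(vmid), name], check=True)
    result = run(
        [cmd, "listsnapshot", str(vmid)],
        check=True, capture_output=True, text=True,
    )
    return snapshot_exists(result.stdout, name)

# Example on a Proxmox host:
# ok = snapshot_and_verify("/usr/sbin/pct", 120, "periodic_running_2019_09_06_11_53_00")
```

A check like this only reports whether the snapshot is present at that moment; it would not by itself explain why one went missing later.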
 
I would like to conclude this thread by apologizing for wasting anyone's time. It turns out that 'pct snapshot' is reliable.

Here's what was happening:

The script goes through and snapshots all the VMs with 'qm snapshot', then snapshots all the containers with 'pct snapshot'. The script then goes through and deletes old snapshots (oldest first) until there are only 5 left. It would do this by listing the snapshots with 'qm listsnapshot' and 'pct listsnapshot' and deleting the first one listed.

'qm listsnapshot' lists the snapshots in date order so this worked.

'pct listsnapshot' lists the snapshots in seemingly random order and sometimes it was the newest snapshot that was listed first so it would be deleted.

The script was fixed by sorting the listed results.
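The thread doesn't show how the sorting was done. As an illustration only, here is a Python sketch that orders snapshots by a timestamp embedded in the name instead of trusting the listing order. It assumes every snapshot name ends in `YYYY_MM_DD_HH_MM_SS`, like the `periodic_running_2019_09_06_11_53_00` example earlier in the thread; `snapshots_to_delete` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed naming scheme: the name ends in YYYY_MM_DD_HH_MM_SS.
STAMP = re.compile(r"(\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})$")

def snapshots_to_delete(names, keep=5):
    """Return the names to delete, oldest first, so the `keep` newest remain.

    Names without a trailing timestamp (e.g. "current") are ignored.
    """
    dated = []
    for name in names:
        match = STAMP.search(name)
        if match:
            when = datetime.strptime(match.group(1), "%Y_%m_%d_%H_%M_%S")
            dated.append((when, name))
    dated.sort()  # oldest first, regardless of the order they were listed in
    excess = max(len(dated) - keep, 0)
    return [name for _, name in dated[:excess]]
```

Each returned name would then be passed to `pct delsnapshot <vmid> <snapname>` (or the `qm` equivalent). Because the order comes from the names themselves, the result no longer depends on how `listsnapshot` happens to order its output.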

Again, I apologize for wasting anyone's time with this.
 
hi,

'qm listsnapshot' lists the snapshots in date order so this worked.

'pct listsnapshot' lists the snapshots in seemingly random order and sometimes it was the newest snapshot that was listed first so it would be deleted.

it looks like these are missing parity. thanks for investigating, pct listsnapshot also needs to order by date. we'll fix it.
 
sent patches for this in the mailing list, it should be fixed with new versions after they're applied. thanks!
 
