Backup job stuck when backing up LXC with unavailable NFS share

Dunuin

Hi,

It looks like my backup job gets stuck when backing up my running PBS LXC in snapshot mode (to another PBS):
Code:
INFO: Starting Backup of VM 143 (lxc)
INFO: Backup started at 2022-12-12 12:24:38
INFO: status = running
INFO: CT Name: PBS2
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
trying to acquire lock...
It was stuck for over 90 minutes, and then I stopped the backup job.

Backing it up while the LXC is stopped works fine:
Code:
INFO: Backup started at 2022-12-12 13:37:07
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: PBS2
INFO: including mount point rootfs ('/') in backup
INFO: creating Proxmox Backup Server archive 'ct/143/2022-12-12T12:37:07Z'
INFO: run: /usr/bin/proxmox-backup-client backup --crypt-mode=encrypt --keyfd=16 pct.conf:/var/tmp/vzdumptmp2053366_143/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 143 --backup-time 1670848627 --repository XXXX@pbs@pbs001.tuxis.nl:DBXXXX_PBStuxis --ns MainCluster/Daily
INFO: Starting backup: [MainCluster/Daily]:ct/143/2022-12-12T12:37:07Z
INFO: Client name: j3710
INFO: Starting backup protocol: Mon Dec 12 13:37:07 2022
INFO: Using encryption key from file descriptor..
INFO: Encryption key fingerprint: XX:XX:XX:XX:XX:XX:XX:XX
INFO: Downloading previous manifest (Sat Dec 10 06:55:24 2022)
INFO: Upload config file '/var/tmp/vzdumptmp2053366_143/etc/vzdump/pct.conf' to 'DBXXXX@pbs@pbs001.tuxis.nl:8007:DBXXXX_PBStuxis' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'DBXXXX@pbs@pbs001.tuxis.nl:8007:DBXXXX_PBStuxis' as root.pxar.didx
INFO: root.pxar: had to backup 378.379 MiB of 1.106 GiB (compressed 112.269 MiB) in 44.13s
INFO: root.pxar: average backup speed: 8.575 MiB/s
INFO: root.pxar: backup was done incrementally, reused 753.838 MiB (66.6%)
INFO: Uploaded backup catalog (665.634 KiB)
INFO: Duration: 44.71s
INFO: End Time: Mon Dec 12 13:37:52 2022
INFO: adding notes to backup
INFO: Finished Backup of VM 143 (00:00:46)
INFO: Backup finished at 2022-12-12 13:37:53
INFO: Backup job finished successfully
TASK OK

There is also no problem backing up any of the other LXCs and VMs. Could the snapshot-mode backup be failing because the PBS datastore of that privileged LXC, which is on an NFS share, isn't available? My idea was to keep the PBS LXC running and only enable maintenance mode for the datastore while my NAS, which serves the NFS share, is shut down to save power.

All guests are stored on a fine-working ZFS pool.

Or is there another problem?
 
It's unfortunate that we only output the "snapshot create" error after the cleanup, and in your case the cleanup itself is stuck :/

With the following patch, the actual error is printed before the cleanup starts, so you should see why creating the snapshot fails:
Code:
root@pve701 ~ # cat print-error-early.patch
diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index a0c0bc6..11d5b3e 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -836,7 +836,7 @@ sub snapshot_create {
     }
 
     if ($err) {
-    warn "snapshot create failed: starting cleanup\n";
+    warn "snapshot create failed: $err - starting cleanup\n";
     eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
     warn "$@" if $@;
     die "$err\n";
--
2.30.2
root@pve701 ~ # patch /usr/share/perl5/PVE/AbstractConfig.pm print-error-early.patch
patching file /usr/share/perl5/PVE/AbstractConfig.pm
root@pve701 ~ # systemctl reload-or-restart pvedaemon.service pveproxy.service

Your guess about the unavailable NFS share is a good one. How is the NFS share mounted in the LXC? If the mount was still active when the NFS server went offline, that leads to a "hanging" mount, and those often cause problems. Please share the output of pveversion -v and pct config <ID>.
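One rough way to check for such a hanging mount is to stat the mountpoint with a time limit: a healthy mount answers immediately, while a hung hard mount blocks. This is a minimal sketch, not a Proxmox tool; the function names check_mount_responsive and check_all_nfs_mounts and the 5-second limit are hypothetical choices. It assumes GNU coreutils (timeout, stat) and util-linux (findmnt), and it has to run wherever the share is actually mounted (on the host or inside the container, e.g. after pct enter 143).

```sh
#!/bin/sh
# Sketch (hypothetical helper names): report NFS mounts that do not answer in time.
# Assumption: a hung hard-mounted NFS share makes stat(1) block; the 5 s limit
# is an arbitrary choice, not a Proxmox default.

# check_mount_responsive PATH
# Returns 0 if stat on PATH finishes within 5 s, non-zero if it times out
# or the path does not exist.
check_mount_responsive() {
    # -k 2: if stat is still running 2 s after the initial SIGTERM, send SIGKILL
    timeout -k 2 5 stat -t -- "$1" >/dev/null 2>&1
}

# check_all_nfs_mounts
# Lists every mounted nfs/nfs4 target (raw output, no header line) and
# checks each one. Returns 1 if at least one mount failed the check.
check_all_nfs_mounts() {
    rc=0
    for target in $(findmnt -rn -t nfs,nfs4 -o TARGET); do
        if check_mount_responsive "$target"; then
            echo "OK:   $target"
        else
            echo "HUNG: $target (unavailable or not answering)"
            rc=1
        fi
    done
    return $rc
}
```

Calling check_all_nfs_mounts prints one OK or HUNG line per NFS mount and returns non-zero if any of them failed. A timed-out stat only shows that the mount is not answering; it does not by itself prove that this is what blocks the vzdump snapshot.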
 
