Proxmox Backup task keeps running after backup is done

bgkprmx

I recently set up a new PBS and pointed the PVE host at it for backups.

PBS (3.3.2)
PVE (8.3.2)

Here's the backup log

Code:
INFO: starting new backup job: vzdump 993 --storage bulk-pbs --notification-mode auto --node pve1 --remove 0 --notes-template '{{guestname}}' --mode stop
INFO: Starting Backup of VM 993 (lxc)
INFO: Backup started at 2025-01-08 23:09:07
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: network-test
INFO: including mount point rootfs ('/') in backup
INFO: creating Proxmox Backup Server archive 'ct/993/2025-01-09T04:09:07Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp1604631_993/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 993 --backup-time 1736395747 --entries-max 1048576 --repository pbs-pve@pbs@192.168.101.91:bulk-pbs-pve
INFO: Starting backup: ct/993/2025-01-09T04:09:07Z   
INFO: Client name: pve1   
INFO: Starting backup protocol: Wed Jan  8 23:09:07 2025   
INFO: No previous manifest available.   
INFO: Upload config file '/var/tmp/vzdumptmp1604631_993/etc/vzdump/pct.conf' to 'pbs-pve@pbs@192.168.101.91:8007:bulk-pbs-pve' as pct.conf.blob   
INFO: Upload directory '/mnt/vzsnap0' to 'pbs-pve@pbs@192.168.101.91:8007:bulk-pbs-pve' as root.pxar.didx   
INFO: root.pxar: had to backup 716.363 MiB of 716.363 MiB (compressed 250.354 MiB) in 7.73 s (average 92.731 MiB/s)
INFO: Uploaded backup catalog (478.271 KiB)
INFO: Duration: 8.08s   
INFO: End Time: Wed Jan  8 23:09:15 2025   
INFO: adding notes to backup
INFO: Finished Backup of VM 993 (00:00:08)
INFO: Backup finished at 2025-01-08 23:09:15
INFO: Backup job finished successfully


I can see that the backup was created on the PBS too, but the backup task on the PVE is still running (stuck?), even after 12 hours.


[Attached image: 1736442626721.png]

Any idea how to fix it, or where I should look?


Thanks,
 
Is your datastore on an HDD, and is the "verify after backup" option enabled? Verification is known to take a very long time on spinning disks.
 
Yes, the datastore is on 3x16TB HDDs in RAIDZ1. But the LXC disk I am backing up is only 8G. Is that really supposed to take 13+ hours (and it is still running)?

This is what's set on the PBS datastore (default values)


[Attached image: 1736445135403.png]
 
That sounds extreme, but RAIDZ in general and HDDs are each known for performance issues, even more so in combination. It shouldn't be that bad for eight GB, though.
If you have an SSD available on the PBS, could you please try creating a temporary datastore on it to see whether that makes any difference?

Another possible cause is the network, but eight GB isn't enough data to cause trouble even on a slow line.

The notify options only affect the cases in which a notification is sent, not the verification itself. The interesting option would be "verify New snapshots", which is already disabled and thus not the culprit.

Could you please copy-paste the whole output of the job?
 
So, I created a test datastore on an SSD and stopped the old backup task, which was still running.

The same disk got backed up on the SSD datastore within 4 seconds.

This is the output of the backup job (on SSD datastore)

Code:
INFO: starting new backup job: vzdump 993 --node pve1 --storage pbs-pve-test-ssd --notification-mode auto --notes-template '{{guestname}}' --mode stop --remove 0
INFO: Starting Backup of VM 993 (lxc)
INFO: Backup started at 2025-01-09 13:36:18
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: network-test
INFO: including mount point rootfs ('/') in backup
INFO: creating Proxmox Backup Server archive 'ct/993/2025-01-09T18:36:18Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp2601291_993/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 993 --backup-time 1736447778 --entries-max 1048576 --repository pbs-pve@pbs@192.168.101.91:test-pbs-ssd
INFO: Starting backup: ct/993/2025-01-09T18:36:18Z  
INFO: Client name: pve1  
INFO: Starting backup protocol: Thu Jan  9 13:36:18 2025  
INFO: No previous manifest available.  
INFO: Upload config file '/var/tmp/vzdumptmp2601291_993/etc/vzdump/pct.conf' to 'pbs-pve@pbs@192.168.101.91:8007:test-pbs-ssd' as pct.conf.blob  
INFO: Upload directory '/mnt/vzsnap0' to 'pbs-pve@pbs@192.168.101.91:8007:test-pbs-ssd' as root.pxar.didx  
INFO: root.pxar: had to backup 716.363 MiB of 716.363 MiB (compressed 250.354 MiB) in 2.24 s (average 320.145 MiB/s)
INFO: Uploaded backup catalog (478.271 KiB)
INFO: Duration: 2.41s  
INFO: End Time: Thu Jan  9 13:36:21 2025  
INFO: adding notes to backup
INFO: Finished Backup of VM 993 (00:00:03)
INFO: Backup finished at 2025-01-09 13:36:21
INFO: Backup job finished successfully
TASK OK


After that, I ran the same LXC backup against the 3x16 TB RAIDZ pool again, with the same result: it shows "Backup job finished successfully", but the backup task is still running.

Backup on HDD datastore output:

Code:
INFO: starting new backup job: vzdump 993 --notes-template '{{guestname}}' --mode stop --remove 0 --node pve1 --storage bulk-pbs --notification-mode auto
INFO: Starting Backup of VM 993 (lxc)
INFO: Backup started at 2025-01-09 13:39:03
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: network-test
INFO: including mount point rootfs ('/') in backup
INFO: creating Proxmox Backup Server archive 'ct/993/2025-01-09T18:39:03Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp2605767_993/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 993 --backup-time 1736447943 --entries-max 1048576 --repository pbs-pve@pbs@192.168.101.91:bulk-pbs-pve
INFO: Starting backup: ct/993/2025-01-09T18:39:03Z  
INFO: Client name: pve1  
INFO: Starting backup protocol: Thu Jan  9 13:39:03 2025  
INFO: Downloading previous manifest (Thu Jan  9 13:37:04 2025)  
INFO: Upload config file '/var/tmp/vzdumptmp2605767_993/etc/vzdump/pct.conf' to 'pbs-pve@pbs@192.168.101.91:8007:bulk-pbs-pve' as pct.conf.blob  
INFO: Upload directory '/mnt/vzsnap0' to 'pbs-pve@pbs@192.168.101.91:8007:bulk-pbs-pve' as root.pxar.didx  
INFO: root.pxar: had to backup 0 B of 716.363 MiB (compressed 0 B) in 1.47 s (average 0 B/s)
INFO: root.pxar: backup was done incrementally, reused 716.363 MiB (100.0%)
INFO: Uploaded backup catalog (478.271 KiB)
INFO: Duration: 1.55s  
INFO: End Time: Thu Jan  9 13:39:05 2025  
INFO: adding notes to backup
INFO: Finished Backup of VM 993 (00:00:02)
INFO: Backup finished at 2025-01-09 13:39:05
INFO: Backup job finished successfully
 
Could you please open the host console, run the following commands, and paste the results here?
Bash:
ps ax|grep proxmox-backup-client
Bash:
ps ax|grep vzdump
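A small side note on the two commands above: a plain `ps ax|grep vzdump` usually lists the grep process itself as well, which can be mistaken for a leftover task. The sketch below wraps the common bracket trick to avoid that. The helper names `bracket_pattern` and `find_procs` are made up for illustration and are not part of any Proxmox tooling.

```shell
# Turn "vzdump" into "[v]zdump": the regex still matches "vzdump",
# but grep's own command line (which contains "[v]zdump") no longer matches it.
bracket_pattern() {
    first=${1%"${1#?}"}   # first character of the name
    rest=${1#?}           # everything after the first character
    printf '[%s]%s\n' "$first" "$rest"
}

# List matching processes without the grep process showing up in its own output.
find_procs() {
    ps ax | grep -- "$(bracket_pattern "$1")"
}

# Usage on the PVE host:
#   find_procs vzdump
#   find_procs proxmox-backup-client
```

If both calls print nothing while the task still shows as running in the GUI, the worker is probably no longer a vzdump or proxmox-backup-client process at that point.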
 
Please post the output of cat /var/run/vzdump.pid while the task is seemingly hanging. Does that file exist at that point? Do you have notifications set up for the backup task?
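For repeated checks, the question above can be wrapped in a small helper. This is only a sketch: the default path is the one named in this post, and the function name `pidfile_status` is hypothetical.

```shell
# Report whether the vzdump pid file exists and, if it does, what it contains.
pidfile_status() {
    f=${1:-/var/run/vzdump.pid}
    if [ -s "$f" ]; then
        # File exists and is non-empty: print its content (expected to be a UPID).
        printf 'present: %s\n' "$(cat "$f")"
    elif [ -e "$f" ]; then
        printf 'empty: %s\n' "$f"
    else
        printf 'missing: %s\n' "$f"
    fi
}

# Usage on the PVE host while the task appears to hang:
#   pidfile_status
```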
 
I had to kill the old running task. However, I started a new backup this morning, and it has been running for about 2 hours.
This is the output of cat /var/run/vzdump.pid:

UPID:pve1:00035A23:02F4A347:67814FA1:vzdump:216:root@pam:

This backup task is currently running

Code:
INFO: starting new backup job: vzdump 216 --notes-template '{{guestname}}' --mode stop --remove 0 --node pve1 --storage bulk-pbs --notification-mode auto
INFO: Starting Backup of VM 216 (lxc)
INFO: Backup started at 2025-01-10 11:49:37
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: controller
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating Proxmox Backup Server archive 'ct/216/2025-01-10T16:49:37Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp219683_216/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 216 --backup-time 1736527777 --entries-max 1048576 --repository pbs-pve@pbs@192.168.101.91:bulk-pbs-pve
INFO: Starting backup: ct/216/2025-01-10T16:49:37Z   
INFO: Client name: pve1   
INFO: Starting backup protocol: Fri Jan 10 11:49:42 2025   
INFO: No previous manifest available.   
INFO: Upload config file '/var/tmp/vzdumptmp219683_216/etc/vzdump/pct.conf' to 'pbs-pve@pbs@192.168.101.91:8007:bulk-pbs-pve' as pct.conf.blob   
INFO: Upload directory '/mnt/vzsnap0' to 'pbs-pve@pbs@192.168.101.91:8007:bulk-pbs-pve' as root.pxar.didx   
INFO: root.pxar: had to backup 2.686 GiB of 3.204 GiB (compressed 1.218 GiB) in 8.90 s (average 309.087 MiB/s)
INFO: root.pxar: backup was done incrementally, reused 530.6 MiB (16.2%)
INFO: Uploaded backup catalog (701.371 KiB)
INFO: Duration: 9.54s   
INFO: End Time: Fri Jan 10 11:49:52 2025   
INFO: adding notes to backup
INFO: restarting vm
INFO: guest is online again after 17 seconds
INFO: Finished Backup of VM 216 (00:00:17)
INFO: Backup finished at 2025-01-10 11:49:54
INFO: Backup job finished successfully
 
UPID:pve1:00035A23:02F4A347:67814FA1:vzdump:216:root@pam:
I assume this matches the UPID for the currently running job? Is that the case?
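One way to cross-check this without the GUI is to decode the UPID itself. The sketch below assumes the usual PVE field layout `UPID:node:pid:pstart:starttime:type:id:user:` with pid, pstart and starttime in hexadecimal; the function name `upid_decode` is made up for illustration.

```shell
# Split a PVE task UPID on ':' and print the decoded fields.
# Assumed layout: UPID:node:pid(hex):pstart(hex):starttime(hex):type:id:user:
upid_decode() {
    old_ifs=$IFS
    IFS=:
    # Word-split the UPID on ':' into this function's positional parameters.
    set -- $1
    IFS=$old_ifs
    # $2=node $3=pid $5=starttime $6=type $7=id $8=user
    printf 'node=%s pid=%d starttime=%d type=%s id=%s user=%s\n' \
        "$2" "$((0x$3))" "$((0x$5))" "$6" "$7" "$8"
}

# Usage:
#   upid_decode "$(cat /var/run/vzdump.pid)"
#   kill -0 <pid> && echo "worker process still alive"
```

For the UPID quoted above, this yields pid 219683 and start time 1736527777, which line up with the `vzdumptmp219683_216` directory and the `--backup-time 1736527777` argument in the posted log.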

What about the notifications? Not much else should happen between the "Backup job finished successfully" log message and the removal of the vzdump.pid file, which is still present in your case. Please also check the systemd journal for errors around the time of the "Backup finished" log message.
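To narrow the journal down to the moment in question, a time window around the "Backup finished at ..." timestamp can be built first. This is a rough sketch: the helper name `journal_window` and the default margin of 300 seconds are arbitrary choices, and the timestamp is interpreted in the host's local time zone, as in the task log.

```shell
# Print a journalctl command covering +/- margin seconds around a timestamp.
# $1: timestamp as printed in the task log, e.g. "2025-01-10 11:49:54"
# $2: margin in seconds (default 300)
journal_window() {
    ts=$1
    margin=${2:-300}
    # Go through epoch seconds so the arithmetic is unambiguous.
    epoch=$(date -d "$ts" +%s) || return 1
    since=$(date -d "@$((epoch - margin))" '+%Y-%m-%d %H:%M:%S')
    until_=$(date -d "@$((epoch + margin))" '+%Y-%m-%d %H:%M:%S')
    printf 'journalctl --since "%s" --until "%s"\n' "$since" "$until_"
}

# Usage: print the command, then run it on the PVE host.
#   journal_window "2025-01-10 11:49:54"
```

Appending `-p err` to the printed command limits the output to messages of priority "err" and above, which may help if the window is noisy.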
 
Hi Chris, yes the UPID matches the currently running job.

Notification: I do have SMTP notifications set up for the backup tasks. I only get a notification after "TASK OK". I do not get one when the backup task gets stuck.

One thing to note: I tried backing up a couple of VMs to a locally configured "backup" storage on ZFS. The backup tasks get stuck on the local backup storage too, not only on the PBS. So this does not look like a PBS issue, but rather a PVE issue. The hanging also seems to be intermittent. Sometimes the backup completes, shows "TASK OK", and then sends the notification. Sometimes the task gets stuck after the "INFO: Backup job finished successfully" message.

I will check the systemd journal for errors and post back soon.
 
So I checked the systemd journal and found nothing interesting when comparing "stuck" tasks with finished ones.

However, I removed the SMTP notification and set up webhooks instead. I have run a couple of backups since then and no longer see any stuck tasks. All backups seem to finish completely now. @Chris Thanks for the hint.
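For anyone who wants to keep SMTP: the thread does not establish why the SMTP target made the task hang. One cheap thing to rule out is a mail server that accepts no connection or never answers. The sketch below only tests whether a TCP connection can be opened within a time limit, not whether mail delivery works. The function name `smtp_reachable` is hypothetical, the host and port must be taken from your own notification target, and it relies on bash's `/dev/tcp` feature and the coreutils `timeout` command.

```shell
# Try to open a TCP connection to host:port, giving up after a few seconds.
# $1: host, $2: port (default 25), $3: timeout in seconds (default 5)
smtp_reachable() {
    host=$1
    port=${2:-25}
    secs=${3:-5}
    # bash opens /dev/tcp/<host>/<port> as a TCP socket; ':' does nothing else.
    if timeout "$secs" bash -c ': </dev/tcp/$0/$1' "$host" "$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "unreachable: $host:$port (refused, filtered, or no answer within ${secs}s)"
    fi
}

# Usage (placeholder values, replace with your SMTP target):
#   smtp_reachable mail.example.com 587
```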
 