container backup fails on pbs (locally works, other containers to pbs work too)

pva

New Member
Feb 29, 2024
Hi! I've configured a PVE cluster and PBS. Now I'm trying to back up all containers to PBS, and it works for every container but one, which fails with the following log:

Code:
INFO: starting new backup job: vzdump 192 --notification-mode auto --storage iv-pbs0 --remove 0 --mode snapshot --notes-template '{{guestname}}' --node cf-pve2
INFO: Starting Backup of VM 192 (lxc)
INFO: Backup started at 2024-05-03 00:20:01
INFO: status = running
INFO: CT Name: cf-pve2-ct
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating Proxmox Backup Server archive 'ct/192/2024-05-02T21:20:01Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp805226_192/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 192 --backup-time 1714684801 --entries-max 1048576 --repository cf@pbs@***.***.**.***:backups0 --ns cf
INFO: Starting backup: [cf]:ct/192/2024-05-02T21:20:01Z
INFO: Client name: cf-pve2
INFO: Starting backup protocol: Fri May  3 00:20:06 2024
INFO: No previous manifest available.
INFO: Upload config file '/var/tmp/vzdumptmp805226_192/etc/vzdump/pct.conf' to 'cf@pbs@***.***.**.***:8007:backups0' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'cf@pbs@***.***.**.***:8007:backups0' as root.pxar.didx
INFO: HTTP/2.0 connection failed
INFO: catalog upload error - channel closed
INFO: Error: connection reset
INFO: cleanup temporary 'vzdump' snapshot
ERROR: Backup of VM 192 failed - command 'lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup '--crypt-mode=none' pct.conf:/var/tmp/vzdumptmp805226_192/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' --backup-type ct --backup-id 192 --backup-time 1714684801 --entries-max 1048576 --repository cf@pbs@***.***.**.***:backups0 --ns cf' failed: exit code 255
INFO: Failed at 2024-05-03 00:33:12
INFO: Backup job finished with errors
INFO: notified via target `mail-to-root`
TASK ERROR: job errors

Although the error is quite clear about the immediate cause, I don't understand why it happens. I've backed up a CT three times larger to the same PBS without any problems, and all the other containers were backed up without problems too. I've also tried backing up this container locally, and that works. On the PBS side I can see the tasks, and there is no visible problem with this one. The last entries in the log:
Code:
2024-05-03T00:33:06+03:00: POST /dynamic_chunk
2024-05-03T00:33:06+03:00: upload_chunk done: 16777216 bytes, eed4e211094975fe5780b0cafac187c02d227aad01f658751fe17062f348443a
2024-05-03T00:33:06+03:00: upload_chunk done: 5768111 bytes, 5b510c689a94cddd79648e258a3270a14ccc77a8304100a8eaac31ca8dfd5e21
2024-05-03T00:33:06+03:00: POST /dynamic_chunk

These are the latest PVE/PBS versions.

# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)

So my question is: how can I debug this problem? Is it possible to configure a retry policy? Can I limit the upload bandwidth, maybe? I'd like to try different network options, if there are any, to see whether that helps. Thanks in advance for any help.
 
do you see anything on the PBS side for that backup attempt (journal, PBS task log, PBS access log)?
 
Yes, I do. The POST /dynamic_chunk lines above are from the PBS side. The whole log is very long, so those are only the last messages.

I took that log from the longest-running tasks:

[Attached screenshot: 1714809683558.png]

And the numbers look sane here. The backup job for CT 191 took that long because it is the largest container. The two long backup jobs for 192 were the first attempt and the second attempt, which I ran after removing the backup leftovers on the PBS side. The other jobs, which took about 13 minutes each, are further backup attempts.
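The "about 13 minutes" figure can be cross-checked against the PVE task log from the first post. This is only a minimal sketch: the two log lines are copied from that log, while the helper name `run_duration` and the regular expression are made up for illustration.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the PVE task log in the first post.
PVE_LOG = """\
INFO: Backup started at 2024-05-03 00:20:01
INFO: Failed at 2024-05-03 00:33:12
"""

# Matches the start and failure lines that vzdump printed in that log.
STAMP = re.compile(
    r"(Backup started|Failed) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def run_duration(log_text):
    """Return the time between 'Backup started at' and 'Failed at'."""
    stamps = {}
    for match in STAMP.finditer(log_text):
        stamps[match.group(1)] = datetime.strptime(
            match.group(2), "%Y-%m-%d %H:%M:%S"
        )
    return stamps["Failed"] - stamps["Backup started"]


elapsed = run_duration(PVE_LOG)
print(elapsed)  # 0:13:11
```

If every failed attempt ends after nearly the same interval, that would point at something time-based on the path rather than at the container's data, but that is a guess to verify, not a conclusion.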
 
well, something seems to be interrupting the connection between PVE and PBS - and according to the log, the PBS side still received data a few seconds before that, so it isn't a timeout in our code. what does your network setup look like? is there any component that could be responsible for killing long-running connections?
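The "few seconds" gap can be estimated from the two logs quoted earlier in the thread. A rough sketch, with one assumption stated loudly: the PVE task log prints local time without an offset, and UTC+3 is inferred here from the archive name (`2024-05-02T21:20:01Z`) versus the logged start time (`00:20:01`).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the PVE node logs local time at UTC+3 (inferred from the
# archive name 2024-05-02T21:20:01Z vs. "Backup started at ... 00:20:01").
PVE_TZ = timezone(timedelta(hours=3))

# Last PBS task log entry quoted in the thread (carries its own offset).
last_pbs_entry = datetime.fromisoformat("2024-05-03T00:33:06+03:00")

# "Failed at" line from the PVE task log, made timezone-aware.
pve_failure = datetime.strptime(
    "2024-05-03 00:33:12", "%Y-%m-%d %H:%M:%S"
).replace(tzinfo=PVE_TZ)

gap = pve_failure - last_pbs_entry
print(gap.total_seconds())  # 6.0
```

The PVE "Failed at" line is written after the temporary snapshot has been cleaned up, so 6 seconds is an upper bound on how long the connection was silent before the reset, not an exact measurement.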
 
