container backup fails on pbs (locally works, other containers to pbs work too)

pva · May 3, 2024

Hi! I've configured a PVE cluster and PBS. Now I'm trying to back up all containers to PBS, and it works for all but one container, which fails with the following log:

Code:

INFO: starting new backup job: vzdump 192 --notification-mode auto --storage iv-pbs0 --remove 0 --mode snapshot --notes-template '{{guestname}}' --node cf-pve2
INFO: Starting Backup of VM 192 (lxc)
INFO: Backup started at 2024-05-03 00:20:01
INFO: status = running
INFO: CT Name: cf-pve2-ct
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating Proxmox Backup Server archive 'ct/192/2024-05-02T21:20:01Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp805226_192/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 192 --backup-time 1714684801 --entries-max 1048576 --repository cf@pbs@***.***.**.***:backups0 --ns cf
INFO: Starting backup: [cf]:ct/192/2024-05-02T21:20:01Z
INFO: Client name: cf-pve2
INFO: Starting backup protocol: Fri May  3 00:20:06 2024
INFO: No previous manifest available.
INFO: Upload config file '/var/tmp/vzdumptmp805226_192/etc/vzdump/pct.conf' to 'cf@pbs@***.***.**.***:8007:backups0' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'cf@pbs@***.***.**.***:8007:backups0' as root.pxar.didx
INFO: HTTP/2.0 connection failed
INFO: catalog upload error - channel closed
INFO: Error: connection reset
INFO: cleanup temporary 'vzdump' snapshot
ERROR: Backup of VM 192 failed - command 'lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup '--crypt-mode=none' pct.conf:/var/tmp/vzdumptmp805226_192/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' --backup-type ct --backup-id 192 --backup-time 1714684801 --entries-max 1048576 --repository cf@pbs@***.***.**.***:backups0 --ns cf' failed: exit code 255
INFO: Failed at 2024-05-03 00:33:12
INFO: Backup job finished with errors
INFO: notified via target `mail-to-root`
TASK ERROR: job errors

Although the error itself is quite clear (the connection was reset), I don't understand why it happens. I've backed up a CT three times larger to the same PBS without any problems, and all the other containers were backed up without problems too. I've also tried backing up this container locally, and that works. On the PBS side I can see the tasks, and there is no visible problem with this one. The last entries in the log:

Code:

2024-05-03T00:33:06+03:00: POST /dynamic_chunk
2024-05-03T00:33:06+03:00: upload_chunk done: 16777216 bytes, eed4e211094975fe5780b0cafac187c02d227aad01f658751fe17062f348443a
2024-05-03T00:33:06+03:00: upload_chunk done: 5768111 bytes, 5b510c689a94cddd79648e258a3270a14ccc77a8304100a8eaac31ca8dfd5e21
2024-05-03T00:33:06+03:00: POST /dynamic_chunk

These are the latest PVE/PBS versions.

# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)

So, my question is: how can I debug this problem? Is it possible to configure a retry policy? Can I limit the upload bandwidth, maybe? I would like to try different network options, if there are any, to see whether that helps. Thanks in advance for any help.

fabian · May 3, 2024

do you see anything on the PBS side for that backup attempt (journal, PBS task log, PBS access log)?

pva · May 4, 2024

fabian said:
do you see anything on the PBS side for that backup attempt (journal, PBS task log, PBS access log)?

Yes, I do. The POST /dynamic_chunk lines above are from the PBS side. The whole log is very long, so those are only the last messages.

I took that log from the longest tasks:

The numbers look sane here. The backup job for CT 191 took that long because it is the largest container. The two long backup jobs for 192 are the first attempt and the second attempt, which I ran after removing the backup leftovers on the PBS side. Finally, the other jobs that took about 13 minutes are further backup attempts.
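Since the full PBS task log is very long, a small script can condense it to a few numbers that are easier to compare between attempts. The following is only a sketch: it assumes the task log lines look exactly like the excerpt from the first post (ISO timestamp, then `upload_chunk done: <bytes> bytes, <digest>`), and the function name `summarize_task_log` is made up for illustration.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt in the first post:
# 2024-05-03T00:33:06+03:00: upload_chunk done: 16777216 bytes, <sha256 digest>
CHUNK_RE = re.compile(r"^\S+: upload_chunk done: (\d+) bytes, [0-9a-f]{64}$")

def summarize_task_log(lines):
    """Count uploaded chunks, sum their sizes and find the last timestamp."""
    chunks = 0
    total_bytes = 0
    last_seen = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # The timestamp is everything before the first ": " separator.
        stamp = line.split(": ", 1)[0]
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not a timestamped log line, skip it
        if last_seen is None or ts > last_seen:
            last_seen = ts
        match = CHUNK_RE.match(line)
        if match:
            chunks += 1
            total_bytes += int(match.group(1))
    return {"chunks": chunks, "bytes": total_bytes, "last_seen": last_seen}

# The last four lines of the PBS task log quoted in the first post.
SAMPLE = [
    "2024-05-03T00:33:06+03:00: POST /dynamic_chunk",
    "2024-05-03T00:33:06+03:00: upload_chunk done: 16777216 bytes, "
    "eed4e211094975fe5780b0cafac187c02d227aad01f658751fe17062f348443a",
    "2024-05-03T00:33:06+03:00: upload_chunk done: 5768111 bytes, "
    "5b510c689a94cddd79648e258a3270a14ccc77a8304100a8eaac31ca8dfd5e21",
    "2024-05-03T00:33:06+03:00: POST /dynamic_chunk",
]

if __name__ == "__main__":
    print(summarize_task_log(SAMPLE))
```

Run against the complete log of each failed attempt, the `bytes` and `last_seen` values would show whether the uploads always stop after roughly the same amount of data or the same amount of time.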

fabian · May 6, 2024

well, something seems to be interrupting the connection between PVE and PBS. a few seconds before that, the PBS side still received something according to the log, so it isn't a timeout in our code. what does your network setup look like? is there any component that could be responsible for killing long-running connections?
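One way to check for a component that cuts connections after a fixed time is to compare how long each failed attempt ran. The sketch below assumes the vzdump task log contains the `Backup started at` and `Failed at` lines in the format shown in the first post; the helper name `seconds_until_failure` is hypothetical.

```python
import re
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
STAMP_PATTERN = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"

def seconds_until_failure(log_text):
    """Return seconds between 'Backup started at' and 'Failed at', or None."""
    start = re.search(r"INFO: Backup started at " + STAMP_PATTERN, log_text)
    end = re.search(r"INFO: Failed at " + STAMP_PATTERN, log_text)
    if start is None or end is None:
        return None  # log is incomplete or the job did not fail
    started = datetime.strptime(start.group(1), STAMP_FORMAT)
    failed = datetime.strptime(end.group(1), STAMP_FORMAT)
    return int((failed - started).total_seconds())

# The two relevant lines from the vzdump log in the first post.
LOG = """\
INFO: Backup started at 2024-05-03 00:20:01
INFO: Failed at 2024-05-03 00:33:12
"""

if __name__ == "__main__":
    print(seconds_until_failure(LOG))
```

For the attempt in the first post this gives 791 seconds, a little over 13 minutes, which is in line with the roughly 13 minutes reported for the later attempts. If every failed attempt ends after about the same time, that would be consistent with something on the network path dropping long-running connections, though it does not prove it.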
