Timeout during backups

Einar Stenberg

Recently we have started getting "http upgrade request timed out" errors during backups, causing the jobs to fail.
This affects both PVE jobs and jobs that use the backup client to back up some files from disk.

This started fairly recently and seems to happen randomly,
e.g. the first two VMs in a backup fail but the rest are OK.
For the file transfers, it sometimes fails several times in a row, then suddenly works.

Any ideas? What causes this message?
 
Could you post the client version (and pveversion -v for PVE systems)?
 
root@HV01:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.11-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-4
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1


The client installed on the file server we are copying the files from is 1.0.13-1; I think this is the newest version.
 
Anything special regarding your network setup? Do you see anything in the server-side logs? The timeout for the upgrade request is 120s.
 
The servers running the backup jobs are connected to the same switch as the backup server. Not much is happening on the network in the middle of the night when the jobs are running.

Is there a limit on how many backup jobs can run at once?
 
It's limited by how many resources your PBS system has. If the load gets very high and there is some issue with scheduling tasks, it is possible that the upgrade request runs into the timeout because it never gets handled. Do you have monitoring in place?
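If no monitoring is in place yet, a rough way to check the overload hypothesis is to log the load average on the PBS host while the backup window is running. The following is only a minimal sketch: comparing the 1-minute load average against the CPU count is a crude heuristic chosen for illustration, and the function names and the threshold factor are assumptions, not anything PBS provides.

```python
import os
import time


def is_overloaded(load1, cpus, factor=1.0):
    """Crude heuristic (assumption): load above factor * CPU count means overload."""
    return load1 > cpus * factor


def format_sample(load1, cpus, timestamp):
    """Render one log line for a single load sample."""
    flag = "OVERLOADED" if is_overloaded(load1, cpus) else "ok"
    return f"{timestamp} load1={load1:.2f} cpus={cpus} {flag}"


def monitor(samples, interval=10.0, sleep=time.sleep):
    """Take `samples` readings of the 1-minute load average, `interval` seconds apart."""
    cpus = os.cpu_count() or 1
    lines = []
    for i in range(samples):
        load1, _, _ = os.getloadavg()  # Unix only
        line = format_sample(load1, cpus, time.strftime("%Y-%m-%d %H:%M:%S"))
        print(line, flush=True)
        lines.append(line)
        if i + 1 < samples:
            sleep(interval)
    return lines
```

Calling something like `monitor(samples=360, interval=10)` shortly before the jobs start would cover an hour; timestamps that line up with the failed upgrade requests would support the overload explanation.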
 
I also hit this issue with the https upgrade request timeouts for different VMs during backups. I am running the latest 6.4 updates. My PBS target is quite an old desktop system with a Celeron processor, and I have three Proxmox servers with a backup job based on pool membership. VMs are spread across all servers, so all three servers start their backups at the same time. That makes the overload hypothesis quite likely.
My question is whether there is anything I can do about it. Can I increase this timeout value? Or maybe make the failed jobs retry later? Or make the servers wait until the other server finishes?
For now I can only think of a hook script with different delays for each server at the start phase, but that's not a great solution, and it will impact manual backups.
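For the file-level backups made with the backup client, the "retry later" idea could be approximated on the client side with a small wrapper. This is a sketch under assumptions: it assumes the client exits non-zero and prints the "http upgrade request timed out" message to stdout or stderr, and `run_with_retry`, the attempt count, and the delays are made up for illustration. It does not apply to backup jobs started by PVE itself.

```python
import subprocess
import time

# Message quoted in this thread; where the client prints it is an assumption.
TIMEOUT_MARKER = "http upgrade request timed out"


def run_with_retry(cmd, attempts=3, base_delay=300,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff only on the timeout message.

    Returns the number of the attempt that succeeded; raises RuntimeError
    on any other failure or when all attempts time out.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        output = (result.stdout or "") + (result.stderr or "")
        if TIMEOUT_MARKER not in output:
            # Unrelated failure: retrying would only hide the real problem.
            raise RuntimeError(f"backup failed (not a timeout): {output.strip()}")
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"backup still timing out after {attempts} attempts")
```

It might be called as `run_with_retry(["proxmox-backup-client", "backup", "files.pxar:/srv/files", "--repository", "backup@pbs@pbs.example.com:store1"])`, where the archive name, path, and repository are placeholders. Retrying only on the timeout message keeps genuine errors (authentication, missing paths) visible instead of repeating them.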
 
Right now there is not much you can do besides staggering the starts (a bit, or a lot if that doesn't help) or upgrading your PBS machine to beefier hardware. There is a tracking issue for implementing a server-side queue: https://bugzilla.proxmox.com/show_bug.cgi?id=3086
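If separate schedules per node are not practical, the staggering could also be done with a vzdump hook script that sleeps at the start of the job. The sketch below assumes the hook is invoked with the phase name as its first argument and that `job-start` is the phase at which a job begins, as in the example hook script shipped with PVE. The node names and delays in `DELAYS` are hypothetical placeholders. As noted earlier in the thread, this also delays manual backups, since the script cannot tell them apart from scheduled ones.

```python
#!/usr/bin/env python3
import socket
import sys
import time

# Hypothetical node names and start delays in seconds; adjust to your cluster.
DELAYS = {"pve1": 0, "pve2": 600, "pve3": 1200}


def delay_for(node, delays=DELAYS, default=0):
    """Look up the start delay for a node; unknown nodes are not delayed."""
    return delays.get(node, default)


def main(argv, sleep=time.sleep, node=None):
    """Sleep for this node's delay when called for the job-start phase.

    Returns the number of seconds waited (0 for every other phase).
    """
    phase = argv[1] if len(argv) > 1 else ""
    if phase != "job-start":
        return 0
    node = node or socket.gethostname().split(".")[0]
    wait = delay_for(node)
    if wait > 0:
        print(f"staggering backup start on {node}: sleeping {wait}s", flush=True)
        sleep(wait)
    return wait


if __name__ == "__main__":
    main(sys.argv)
```

The script would be made executable and referenced through the `script` option of the backup job or in `/etc/vzdump.conf`. Fixed delays per node are a blunt tool: they only help if each node's backups reliably finish, or at least get past the upgrade request, within its slot.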
 
Hi Fabian,

Thank you for looking into that. I decided to watch the utilization while backups were running and found that I had the option 'Verify New Snapshots' turned on, which caused additional verification jobs to be created right after the backups. It looked like those verification jobs were creating more load than the actual backups. I turned off that option, and a manual test backup ran fine, so I am hopeful that the issue is fixed for me (it definitely should not hurt).
I just wanted to mention it as a possible workaround for people who get the same issue.

BTW, maybe you could consider throttling the verification jobs on the PBS server?
 