Proxmox Backup Server 4.1.6 Random Hang Issue

singhb998

Apr 10, 2026
Hi Everyone,

I have been using Proxmox Backup for the past few years, but with the latest versions I am facing some issues. Previously, in version 4.0.11, the Proxmox proxy service would hang, and I had to restart it manually to bring it back to normal. This caused all my running backups to fail.

To resolve this, I upgraded to 4.1.6, but now the Proxmox Backup Server itself hangs at random. After about 10–15 minutes it comes back to normal, but during this time the GUI is inaccessible, and afterwards running backups start to fail again at random.

I have around 200 VMs whose backups are stored on this server. I also checked CPU and RAM usage during these events, and both appear to be normal.

Could anyone help me fix this issue? Any guidance would be greatly appreciated.
 
Hi,
do you see any errors in the systemd journal around the time of the hang? Does the PBS host show high IO delay? What operations are being performed when this happens? What datastore backend is used (local, network share, S3, ...)?
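For anyone gathering this information, here is a minimal sketch of how the journal around a hang could be collected. The function name hang_journal is made up for illustration, and the unit names assume a standard PBS install (proxmox-backup-proxy.service and proxmox-backup.service); adjust the time window to your own hang.

```shell
#!/bin/sh
# Sketch only: hang_journal is a hypothetical helper name, not a PBS tool.
# Usage: hang_journal "2026-04-05 20:00:00" "2026-04-05 21:00:00" > hang-journal.txt
hang_journal() {
    win_start=$1
    win_end=$2
    # Warnings and worse from the whole system inside the window.
    journalctl --since "$win_start" --until "$win_end" -p warning..alert --no-pager
    # Everything the two PBS services logged in the same window.
    journalctl --since "$win_start" --until "$win_end" --no-pager \
        -u proxmox-backup-proxy.service -u proxmox-backup.service
}
```

The second call deliberately has no priority filter, since a stalled proxy may only leave informational lines behind.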
 
Let me know if you find the cause of this, because I'm getting the same thing on my Proxmox 4.1.5: it just randomly hangs and I'm not sure why. I have to force a power-off and reboot to get it working again, and it has caused some of my backups to fail.
 
Please also provide the information requested above for your case.
 
PBS investigation summary

Host:

  • Hostname: g2mini-pbs
  • Hardware: HP ProDesk 400 G2 Mini
  • OS: Debian GNU/Linux 13 (trixie)
  • Kernel: 6.17.13-2-pve
  • PBS version: 4.1.6-1
Datastores:

  • pbs-main -> /datastore/pbs-main (local datastore on ext4)
  • omv-archive -> /mnt/omv-pbs/archive (NFSv4.2 mount from 192.168.1.25:/PBS-Archive)
Datastore config:

  • pbs-main GC schedule: 03:00
  • omv-archive GC schedule: 04:00
Journal / kernel findings:

  • Checked journalctl for 2026-04-05 20:00:00 to 21:00:00
  • No warning/alert entries from journalctl -p warning..alert
  • Grepping the journal window only showed repeated pulse-agent HTTP 502 warnings
  • No obvious PBS, ext4, NFS, OOM, hung-task, or lockup messages in the captured journal
  • dmesg did not show SATA reset, ext4 I/O error, OOM, or hung-task messages in the captured output
Current IO status collected on PBS host:

  • iostat -x 1 10 showed %iowait around 0.00–0.25%
  • Local disk sda util stayed very low (~0.2–0.5%)
  • Await times were low (roughly low single-digit ms)
  • vmstat 1 10 showed wa=0
  • /proc/pressure/io:
    • some avg10=0.00 avg60=0.10 avg300=0.05
    • full avg10=0.00 avg60=0.10 avg300=0.05
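Since these numbers were taken while the system was idle, it may help to read the same file during a hang. A rough sketch, assuming the usual PSI line format (some/full followed by avg10=, avg60=, avg300=, total=); the helper names psi_avg10 and psi_stalled are invented here and the threshold is arbitrary.

```shell
#!/bin/sh
# psi_avg10 KIND [FILE]: print the avg10 value of the "some" or "full" line.
psi_avg10() {
    kind=$1
    file=${2:-/proc/pressure/io}
    awk -v kind="$kind" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' "$file"
}

# psi_stalled THRESHOLD [FILE]: succeed if the "full" avg10 value is at or
# above THRESHOLD (a percentage chosen by the caller, not a PBS setting).
psi_stalled() {
    value=$(psi_avg10 full "${2:-/proc/pressure/io}")
    awk -v v="$value" -v t="$1" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# Usage: psi_stalled 10 && echo "$(date -Is) IO pressure is high"
```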
Recent PBS tasks:

  • syncjob from pbs-main to localhost:omv-archive at 02:00 completed OK in 59s
  • many backup tasks to omv-archive completed OK
  • garbage_collection:pbs-main at 03:00 completed OK in 2s
  • garbage_collection:omv-archive at 04:00 completed OK in 16m53s
Current interpretation:

  • I do not currently see evidence of high local disk IO delay or a local filesystem/storage problem on the PBS host
  • The local datastore (pbs-main) appears healthy
  • The more suspicious component is the remote NFS-backed datastore (omv-archive), especially if the perceived hang happens during sync/GC/archive activity on that datastore
  • Because omv-archive is mounted over NFSv4.2 with a hard mount, remote storage path stalls may be relevant even without obvious local disk errors
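To test the NFS theory, the mount could be probed with a time limit while the GUI is unresponsive. This is only a sketch: probe_path is a hypothetical helper, the 5 second default is arbitrary, and a stat answered from the client's attribute cache can succeed even when the server is stalled, so a passing probe does not prove the share is healthy.

```shell
#!/bin/sh
# probe_path PATH [SECONDS]: succeed if stat of PATH returns within SECONDS
# (default 5). Fails on a timeout and also if PATH does not exist.
probe_path() {
    path=$1
    limit=${2:-5}
    # -k 2: send SIGKILL 2 seconds after the initial SIGTERM if stat is still blocked.
    timeout -k 2 "$limit" stat -t "$path" >/dev/null 2>&1
}

# Usage (paths taken from the summary above):
#   findmnt -no OPTIONS /mnt/omv-pbs/archive   # shows hard/soft, timeo, retrans
#   probe_path /mnt/omv-pbs/archive 5 || echo "$(date -Is) NFS datastore not responding"
```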
 
This issue does not look like a CPU or RAM problem. The behavior you described (GUI freezing for 10–15 minutes and then recovering) is usually related to storage I/O delays.

Since you have around 200 VMs, it’s possible that too many backups running at the same time are overloading the storage.

I would suggest:

  • Check logs using journalctl -xe and dmesg -T for any disk or I/O errors
  • Monitor I/O wait (%wa) using top during the issue
  • If using ZFS, check zpool status for any problems
  • Try reducing the number of concurrent backup jobs
Most likely this is due to storage performance or latency issues rather than a bug.

If you can share your storage setup and number of parallel backups, it will help to identify the exact cause.
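Because the hang only lasts 10–15 minutes, it is easy to miss with top. One possible approach is to log I/O wait continuously and look at the log afterwards. The sketch below computes the iowait share between two samples of the aggregate cpu line in /proc/stat; the function name iowait_pct and the log path are assumptions for illustration.

```shell
#!/bin/sh
# iowait_pct "CPU_LINE_1" "CPU_LINE_2": percent of CPU time spent in iowait
# between two samples of the first ("cpu") line of /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal ...
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user..steal
            t[NR] = total
            w[NR] = $6                                        # iowait
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else print "0.0"
        }'
}

# Usage: one sample every 10 seconds, appended to a log file.
#   while sleep 10; do
#       a=$(head -n1 /proc/stat); sleep 1; b=$(head -n1 /proc/stat)
#       echo "$(date -Is) iowait=$(iowait_pct "$a" "$b")%"
#   done >> /root/pbs-iowait.log
```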
 
I assume it would be the same for me, because I have all my backups running at the same time. The last time it hung was during a backup, but the time before that it was at a random point in time. Most of the time the backups have worked pretty well though, to be fair.
 
I spent the whole day debugging what I thought was an S3 issue in the PBS 4.1.6-1 + S3 bundle, not understanding why it was hanging... Turns out the problem wasn't in S3 at all.
I downgraded to 4.1.5-2 and everything worked fine. That's the way it goes.
 
Please install gdb and the PBS debug symbols via apt install gdb proxmox-backup-server-dbgsym, and the next time PBS hangs run gdb --batch --ex 't a a bt' -p $(pidof proxmox-backup-proxy) > proxy.backtrace. Then attach the backtrace here, thanks!
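If the hang tends to happen unattended, the capture could be automated. The following is a sketch under assumptions, not an official procedure: gui_responds and capture_backtrace are made-up helper names, the URL assumes the default PBS web interface port 8007 on the local host, and 'thread apply all bt' is simply the long form of 't a a bt'.

```shell
#!/bin/sh
# gui_responds [URL] [SECONDS]: succeed if the PBS web interface answers in time.
gui_responds() {
    url=${1:-https://localhost:8007/}
    limit=${2:-10}
    # -k: accept a self-signed certificate; -m: overall timeout in seconds.
    curl -k -s -o /dev/null -m "$limit" "$url"
}

# capture_backtrace [OUTDIR]: write backtraces of all proxy threads to a
# timestamped file in OUTDIR and print the file name.
capture_backtrace() {
    out=${1:-.}/proxy-$(date +%Y%m%d-%H%M%S).backtrace
    gdb --batch --ex 'thread apply all bt' -p "$(pidof proxmox-backup-proxy)" > "$out" 2>&1
    echo "$out"
}

# Usage: poll once a minute and capture a backtrace on the first failed probe.
#   while gui_responds; do sleep 60; done; capture_backtrace /root
```

Run it as root, since gdb needs permission to attach to the proxy process.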
 
Hi Chris,

At the moment I’m unfortunately no longer able to reproduce the issue, because the initial full backup has now completed successfully, so subsequent runs no longer hit the same condition.

What we were clearly able to observe before that:

- the issue happened on an S3-backed datastore (MEGA S4),
- it was reproducible at nearly the same position during the first full backup of one large VM,
- the backup consistently stalled around 1% / ~6.2 GiB of 603.5 GiB,
- changing rate limits / slowing the upload only changed how long it took to reach that point, but did not change the point where it stalled,
- during the stall, proxmox-backup-proxy / PBS GUI became unresponsive as well,
- the last visible task log lines were around chunk handling, i.e. repeated "Skip upload of already encountered chunk ..." messages, and then it stopped making progress.