PBS 4.1: Push Sync Jobs Fail, No Output in Task Viewer & Connection Error

PeterZwgatPX

New Member
Sep 6, 2024
Hello everyone,


I am currently facing several issues with my local Proxmox Backup Server which are most likely related, and I am hoping for some help in identifying the root cause.


When I start a Push Sync Job from my local PBS, it fails, especially when syncing larger VMs. If I then open the failed task in the Task Viewer, no output is shown anymore, so I cannot tell why the task failed. This does not only affect Sync Jobs: Verify Jobs show no output in the Task Viewer either. Prune Jobs, however, still show their log output as expected.


In addition, I receive a Connection Error as soon as I open the Status tab of a specific task. At the same time, the log of the local PBS is constantly being filled with very similar messages, for example:
Code:
Dec 23 05:05:54 pbs proxmox-backup-proxy[788]: processed 10.867 GiB in 3d 15h 11m 8s, uploaded 7.289 GiB
Dec 23 05:05:59 pbs proxmox-backup-proxy[788]: processed 8.492 GiB in 3d 20m 5s, uploaded 6.59 GiB
Dec 23 05:06:01 pbs proxmox-backup-proxy[788]: processed 19 GiB in 1h 6m 0s, uploaded 16.379 GiB
Dec 23 05:06:04 pbs proxmox-backup-proxy[788]: processed 3.586 GiB in 2d 1h 6m 4s, uploaded 1.91 GiB
Dec 23 05:06:05 pbs proxmox-backup-proxy[788]: processed 1.089 TiB in 1d 1h 6m 2s, uploaded 127.473 GiB
Dec 23 05:06:06 pbs proxmox-backup-proxy[788]: processed 15.52 GiB in 3d 1h 6m 5s, uploaded 13.223 GiB
Dec 23 05:06:07 pbs proxmox-backup-proxy[788]: processed 1.419 GiB in 4d 1h 6m 7s, uploaded 1.101 GiB
Dec 23 05:06:07 pbs proxmox-backup-proxy[788]: processed 7.793 GiB in 3d 18h 30m 6s, uploaded 3.395 GiB
Dec 23 05:06:13 pbs proxmox-backup-proxy[788]: processed 4.813 GiB in 3d 18h 6m 9s, uploaded 2.711 GiB
Dec 23 05:06:27 pbs proxmox-backup-proxy[788]: processed 2.641 GiB in 3d 20h 45m 6s, uploaded 1.844 GiB

These log entries repeat continuously, even when no job is actively running.
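These look like progress messages from backup or sync sessions that the server still considers open. A rough way to check for such lingering workers from the shell (standard commands; the exact output may differ by version):
Code:
# follow the proxy log to confirm the progress lines keep coming in
journalctl -u proxmox-backup-proxy.service -f

# list the worker tasks the server still considers running
proxmox-backup-manager task list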


Regarding my setup: locally, I am running Proxmox Backup Server version 4.1.0. The system uses a 12 TB Seagate IronWolf, a 4 TB WD Red, and an older 1 TB WD Blue. Each disk is configured as its own datastore, and all datastores use ZFS.
The offsite PBS also runs version 4.1.0, uses two 2 TB WD disks, each with its own ext4 datastore, and is connected via an IPSec site-to-site tunnel. This system does not show any issues.


As part of my troubleshooting, I recreated the datastores on the local PBS, which were originally ext4, as new ZFS datastores. I also completely reinstalled the local PBS, and restarted both the PBS host and the proxmox-backup-proxy.service multiple times. I also checked disk utilization: none of the HDDs is more than 50% utilized. Unfortunately, the very first sync job after the reinstallation failed again, and the described behavior reappeared.
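For reference, the pool health and space checks on the ZFS datastores boil down to the standard commands:
Code:
# report only pools with problems
zpool status -x

# pool capacity, fragmentation and health
zpool list

# per-dataset space usage of the datastores
zfs list -o name,used,avail,mountpoint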


Thank you very much in advance for your support.
 
could you try updating both ends to the 6.17.4-2 kernel from pbs-test? there is a known issue with older 6.17 kernels that can lead to TCP connection stalls. if that doesn't help, you could try the 6.14 kernel next.

although the symptoms don't match exactly, so there might be a different issue (or the mentioned issue *and* a different issue) on your PBS system.
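for reference, switching to pbs-test and the opt-in kernel is roughly the following on PBS 4.x - the repository line and the exact kernel package name are assumptions here, so please verify them against the docs and `apt search`:
Code:
# enable the pbs-test repository (PBS 4.x is based on Debian 13 "trixie")
echo "deb http://download.proxmox.com/debian/pbs trixie pbs-test" \
    > /etc/apt/sources.list.d/pbs-test.list

apt update
apt search proxmox-kernel-6.17                  # check the exact package name first
apt install proxmox-kernel-6.17.4-2-pve-signed  # assumed name of the signed kernel package
reboot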
 
I have now activated the pbs-test repository on both systems, updated the kernel to 6.17.4-2-pve, and restarted the sync job. It is currently running, but I still cannot see any logs in the Task Viewer.
 
The sync job also failed with the new kernel. Unfortunately, I still cannot see any logs from the sync task on the local PBS. On the remote PBS, I get the following output:
Code:
2025-12-23T12:03:48+01:00: starting new backup on datastore 'basti-datastore' from ::ffff:192.168.201.11: "vm/100/2025-12-18T22:47:03Z"
2025-12-23T12:03:48+01:00: add blob "/mnt/datastore/basti-datastore/vm/100/2025-12-18T22:47:03Z/qemu-server.conf.blob" (379 bytes, comp: 379)
2025-12-23T12:03:48+01:00: created new fixed index 1 ("vm/100/2025-12-18T22:47:03Z/drive-scsi0.img.fidx")
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: backup failed: connection error
2025-12-23T12:47:09+01:00: removing failed backup
2025-12-23T12:47:09+01:00: removing backup snapshot "/mnt/datastore/basti-datastore/vm/100/2025-12-18T22:47:03Z"
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: TASK ERROR: connection error: connection reset
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
2025-12-23T12:47:09+01:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection
 
could you try the 6.14 kernel as well? if that also doesn't work, it really looks like a network/load issue on your system(s).
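since the sync runs over an IPSec tunnel, one thing worth ruling out on the network side is an MTU/MSS problem - fragmentation or broken path-MTU discovery over the tunnel can stall large transfers mid-stream. a rough check (the addresses are placeholders for your tunnel endpoints):
Code:
# path MTU test towards the remote PBS: 1472 bytes payload + 28 bytes headers = 1500,
# lower -s until the ping succeeds without fragmentation
ping -M do -s 1472 192.0.2.10

# sustained throughput over the tunnel
iperf3 -s                    # on the remote PBS
iperf3 -c 192.0.2.10 -t 60   # on the local PBS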

regarding the task logs in the UI - could you try accessing them via the CLI and/or with a fresh browser profile? are there any errors in the browser dev console?
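for example:
Code:
# dump the full log of the failed task directly on the PBS host,
# bypassing the web UI and anything sitting in front of it;
# <UPID> is the task ID as printed by `proxmox-backup-manager task list`
proxmox-backup-manager task log <UPID>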
 
I think I have found the cause of both problems.
Regarding the empty Task Viewer:
I access the GUIs of my Proxmox VE and Proxmox Backup Server via a domain that sits behind a Traefik reverse proxy. When I access the GUI directly via its IP address, the logs show up in the Task Viewer again. Since this setup worked before, I will take a closer look at how I can access the Task Viewer via the domain again.
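One way to narrow this down further is to compare the raw task-log API response with and without the proxy in between. The endpoint path and the token header below follow the PBS API; the hostnames, the token and <UPID> are placeholders:
Code:
# task log fetched directly from the PBS host
curl -ks -H 'Authorization: PBSAPIToken=root@pam!mytoken:SECRET' \
    'https://192.168.1.10:8007/api2/json/nodes/localhost/tasks/<UPID>/log'

# the same request routed through the Traefik-served domain
curl -ks -H 'Authorization: PBSAPIToken=root@pam!mytoken:SECRET' \
    'https://pbs.example.com/api2/json/nodes/localhost/tasks/<UPID>/log'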

Regarding the failed sync jobs:
The sync job also failed with kernel 6.14.11-5-pve, so I looked for other possible causes. Since the problem only occurs in one direction and with one datastore, and therefore only with one HDD, I took a closer look at the SMART data for this HDD. Although the HDD passed all automatic SMART tests and has a SMART status of “passed” in the GUI, it shows many command timeouts. I therefore assume that the hard drive is damaged and have already ordered a new one. Due to the holidays, I will not be able to install it until the beginning of next year. I will then provide a new update on whether the hard drive was the cause of the problem.
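For reference, the counter in question is visible in the raw SMART attribute table (the device path is an example):
Code:
# full SMART attribute table; the interesting lines here are
# Command_Timeout (ID 188) and UDMA_CRC_Error_Count (ID 199)
smartctl -A /dev/sdb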

Thank you very much for your support, and I wish you a Merry Christmas.
 
Because of the missing logs in the Task Viewer, I have now downgraded my Traefik reverse proxy from version 3.6.5 back to version 3.5.0. With this version, all logs are displayed as expected again. I will continue investigating over the holidays and see whether I can find a solution that lets me keep using the latest Traefik version.
Once I have found a solution, I will update the thread.