PBS: every 3 days PBS becomes unstable

aabye

New Member
Nov 19, 2024
Hi guys.

I have a problem with my primary PBS in a production environment.

Approximately every third day the server becomes unresponsive and crashes. There is nothing useful in the error log; here is an excerpt from yesterday:

Jan 09 23:17:37 backup1 proxmox-backup-proxy[1560]: Upload backup log to datastore 'BACKUP1', root namespace vm/169/2026-01-09T22:17:27Z/client.log.blob
Jan 09 23:25:11 backup1 proxmox-backup-proxy[1560]: Upload backup log to datastore 'BACKUP1', root namespace vm/170/2026-01-09T22:17:37Z/client.log.blob
Jan 09 23:25:25 backup1 proxmox-backup-proxy[1560]: Upload backup log to datastore 'BACKUP1', root namespace vm/171/2026-01-09T22:25:11Z/client.log.blob
Jan 09 23:25:33 backup1 proxmox-backup-proxy[1560]: Upload backup log to datastore 'BACKUP1', root namespace vm/176/2026-01-09T22:25:25Z/client.log.blob
Jan 09 23:46:03 backup1 proxmox-backup-proxy[1560]: rrd journal successfully committed (52 files in 0.069 seconds)
Jan 10 00:00:01 backup1 proxmox-backup-proxy[1560]: rotated access log, telling daemons to re-open log file
Jan 10 00:00:01 backup1 proxmox-backup-proxy[1560]: rotated auth log, telling daemons to re-open log file
Jan 10 00:00:01 backup1 proxmox-backup-proxy[1560]: re-opening access-log file
Jan 10 00:00:01 backup1 proxmox-backup-api[1406]: re-opening access-log file
Jan 10 00:00:01 backup1 proxmox-backup-proxy[1560]: re-opening auth-log file
Jan 10 00:00:01 backup1 proxmox-backup-api[1406]: re-opening auth-log file
Jan 10 00:00:19 backup1 proxmox-backup-proxy[1560]: GET /api2/json/nodes/localhost/tasks?running=1&limit=100: 400 Bad Request: [client [::ffff:]:6012] Unable to acquire lock "/var/log/proxmox-backup/tasks/.acti>
Jan 10 00:00:20 backup1 proxmox-backup-proxy[1560]: error writing task result to the tasklog
Jan 10 00:00:22 backup1 proxmox-backup-proxy[1560]: error writing task result to the tasklog
Jan 10 00:00:25 backup1 proxmox-backup-proxy[1560]: GET /api2/json/nodes/localhost/tasks?limit=0&since=1765405318: 400 Bad Request: [client [::ffff:]:47818] Unable to acquire lock "/var/log/proxmox-backup/tasks>

Jan 10 00:00:25 backup1 systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Jan 10 00:00:26 backup1 systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Jan 10 00:00:26 backup1 systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Jan 10 00:00:35 backup1 proxmox-backup-proxy[1560]: GET //api2/json/backup?backup-id=128&backup-time=1767999604&backup-type=vm&benchmark=false&debug=false&store=BACKUP1: 400 Bad Request: [client [::ffff:]:3569>
Jan 10 00:00:37 backup1 proxmox-backup-proxy[1560]: GET //api2/json/backup?backup-id=147&backup-time=1767999602&backup-type=vm&benchmark=false&debug=false&store=BACKUP2: 400 Bad Request: [client [::ffff:]:5154>
Jan 10 00:01:37 backup1 proxmox-backup-proxy[1560]: GET /api2/json/nodes/localhost/tasks?running=1&limit=100: 400 Bad Request: [client [::ffff:]:19404] Unable to acquire lock "/var/log/proxmox-backup/tasks/.act>
Jan 10 00:01:41 backup1 proxmox-backup-proxy[1560]: GET /api2/json/nodes/localhost/tasks?limit=0&since=1765405318: 400 Bad Request: [client [::ffff:]:5751] Unable to acquire lock "/var/log/proxmox-backup/tasks/>
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: server error: Transport endpoint is not connected (os error 107)
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: request_shutdown
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: daemon shutting down...
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: daemon shut down.

Jan 10 00:02:43 backup1 proxmox-backup-proxy[1560]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:]:34634] client error (SendRequest)
Jan 10 00:02:43 backup1 proxmox-backup-proxy[1560]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:]:41306] client error (SendRequest)
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: server shutting down, waiting for active workers to complete
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: done - exit server
Jan 10 00:02:43 backup1 proxmox-backup-proxy[1560]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:]:43060] client error (SendRequest)
Jan 10 00:02:43 backup1 proxmox-backup-proxy[1560]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:]:37144] client error (SendRequest)
Jan 10 00:02:43 backup1 proxmox-backup-proxy[1560]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:]:38492] client error (SendRequest)



Each time I have to manually reboot the server (it is a physical machine), and after the reboot everything works well for the next three days.

On the PVE side, the following errors appear while PBS is unresponsive:

TASK ERROR: could not activate storage 'backup2': backup2: error fetching datastores - 400 Bad Request

I believe there is something wrong with the API service, but I cannot find anything in the logs.

Please help.

The environment is up to date: PBS 4.1.1 with all patches. We do not have a subscription yet, but we will in the future.

Thank you for help.

Aabye
 
Are the datastores local or remote? Every 3 days sounds like a prune or verify job which then runs into a timeout/lock.
 
The datastores are local. How can I check whether something is wrong with the prune jobs, or whether there are any timeouts?
 
Within the console:

Code:
proxmox-backup-manager task list | egrep "verify|prune|gc|sync"

For system logs:

Code:
journalctl -u proxmox-backup -u proxmox-backup-api -b -1 --no-pager

Kernel logs (I/O errors):

Code:
journalctl -k -b -1 --no-pager

How is your local storage configured? ZFS RAID? ext4? SSD/NVMe or HDD? The jobs themselves should not be a problem; prune, verify and the other jobs are standard tasks. I suspect a problem with the underlying storage system (a faulty or slow drive, controller, cable, etc.). Another possibility, if ZFS is used, is an ARC/RAM problem (ARC size set too high, for example).
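If ZFS is in play, the ARC cap can be sized roughly like this (a sketch; the 8 GiB figure and the `arc_gib` variable are assumptions for illustration, while `zfs_arc_max` is the standard ZFS module parameter):

```shell
# Persistent setting goes into /etc/modprobe.d/zfs.conf as:
#   options zfs zfs_arc_max=<bytes>
# (then run `update-initramfs -u` and reboot, or write the same value
# to /sys/module/zfs/parameters/zfs_arc_max to apply it live).
# Compute the byte value for an 8 GiB cap (assumed size):
arc_gib=8
echo $((arc_gib * 1024 * 1024 * 1024))   # 8589934592 bytes
```

A too-large ARC on a backup host can leave the PBS daemons short of memory under load, so capping it is a common first mitigation.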
 
Hi.

Thank you.

The task list is empty:

root@backup1:/var/log/proxmox-backup/tasks# proxmox-backup-manager task list | egrep "verify|prune|gc|sync"
root@backup1:/var/log/proxmox-backup/tasks#

journalctl shows output similar to the logs in my previous post:

...
root@backup1:/var/log/proxmox-backup/tasks# journalctl -u proxmox-backup -u proxmox-backup-api -b -1 --no-pager
...
Jan 09 21:17:42 backup1 proxmox-backup-api[1406]: notified via target `mail-to-root`
Jan 10 00:00:01 backup1 proxmox-backup-api[1406]: re-opening access-log file
Jan 10 00:00:01 backup1 proxmox-backup-api[1406]: re-opening auth-log file
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: server error: Transport endpoint is not connected (os error 107)
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: request_shutdown
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: daemon shutting down...
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: daemon shut down.
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: server shutting down, waiting for active workers to complete
Jan 10 00:02:43 backup1 proxmox-backup-api[1406]: done - exit server
Jan 10 00:02:43 backup1 systemd[1]: proxmox-backup.service: Deactivated successfully.

journalctl:
....
Jan 08 00:13:06 backup1 kernel: hpsa 0000:03:00.0: Using 48-bit DMA addresses
Jan 10 00:21:39 backup1 kernel: XFS (sdb1): Unmounting Filesystem 1d8f08ac-1189-49ad-bc86-611d253ecbb8
Jan 10 00:21:39 backup1 systemd-shutdown[1]: Syncing filesystems and block devices.
Jan 10 00:21:40 backup1 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
....


Local storage (there are two stores): the first is ext4, the second XFS, 8 TB in size. Behind them is hardware RAID1, with no errors reported there. The disks are HDDs.

The same configuration was working on the previous PBS version (3.*); since the upgrade to 4.* I have been having problems.
 
We have found the problem. It was the S3 remote datastore; we have now disabled it and the server is working fine again.
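For anyone hitting the same thing, the configured datastores and their backends can be inspected roughly like this (a sketch; `/etc/proxmox-backup/datastore.cfg` is where PBS keeps datastore definitions, and the exact S3-related option names may differ between 4.x releases):

```shell
# List all configured datastores:
proxmox-backup-manager datastore list
# Inspect the raw datastore config for S3/backend-related entries:
cat /etc/proxmox-backup/datastore.cfg
```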
 
Hi.

It happened again. Please help if anyone has had a similar problem.

Thank you.

Best, Aabye
 
We have the same issue with PBS 4, and we found a workaround: restarting the PBS proxy service instead of rebooting.
It seems like a network-related issue, but I cannot reproduce it.
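For reference, the restart without a reboot looks roughly like this (a sketch; `proxmox-backup.service` is the API daemon and `proxmox-backup-proxy.service` serves the GUI/client connections on the affected host):

```shell
# Restart both PBS daemons instead of rebooting the host:
systemctl restart proxmox-backup.service proxmox-backup-proxy.service
# Confirm both came back up:
systemctl is-active proxmox-backup.service proxmox-backup-proxy.service
```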
 
What version of the kernel are you running? Please post output of proxmox-backup-manager version --verbose
 
What version of the kernel are you running? Please post output of proxmox-backup-manager version --verbose
proxmox-backup 4.0.0 running kernel: 6.17.13-2-pve
proxmox-backup-server 4.1.4-1 running version: 4.1.4
proxmox-kernel-helper 9.0.4
proxmox-kernel-6.17 6.17.13-2
proxmox-kernel-6.17.13-2-pve-signed 6.17.13-2
proxmox-kernel-6.17.13-1-pve-signed 6.17.13-1
ifupdown2 unknown
libjs-extjs 7.0.0-5
proxmox-backup-docs 4.1.4-1
proxmox-backup-client 4.1.4-1
proxmox-mail-forward unknown
proxmox-mini-journalreader 1.6
proxmox-offline-mirror-helper unknown
proxmox-widget-toolkit 5.1.8
pve-xtermjs 5.5.0-3
smartmontools 7.4-pve1
zfsutils-linux 2.4.0-pve1
 
Can you describe the exact symptoms you are seeing? And please post an excerpt of the journal from the PBS around the time the server becomes unresponsive:

Code:
journalctl --since <DATETIME> --until <DATETIME> > journal.txt
 
The log shows a generic:
Mar 15 00:02:10 hzpbs1 proxmox-backup-proxy[57233]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.98]:60146] client error (Connect)
Mar 15 00:02:11 hzpbs1 proxmox-backup-proxy[57233]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.98]:60150] client error (Connect)

Clients and also the GUI are unable to log in to the system.
 

What realm are you using for login? Are clients being authenticated via OpenID? Do you have any datastores backed by S3 configured?
 
Are there any errors in the systemd journal before the client connection errors start appearing?
 
Looking at the log, I see this message:

Code:
2026-03-17T09:12:24.294602+01:00 hzpbs1 proxmox-backup-api[1039]: server error: Transport endpoint is not connected (os error 107)
2026-03-17T09:12:24.294740+01:00 hzpbs1 proxmox-backup-api[1039]: request_shutdown
2026-03-17T09:12:24.294776+01:00 hzpbs1 proxmox-backup-api[1039]: daemon shutting down...
2026-03-17T09:12:24.294824+01:00 hzpbs1 proxmox-backup-api[1039]: daemon shut down.
2026-03-17T09:12:24.301950+01:00 hzpbs1 proxmox-backup-api[1039]: server shutting down, waiting for active workers to complete
2026-03-17T09:12:24.350812+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.200]:38138] client error (SendRequest)
2026-03-17T09:12:24.350928+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.102]:41246] client error (SendRequest)
2026-03-17T09:12:24.350970+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.99]:41908] client error (SendRequest)
2026-03-17T09:12:24.351003+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.98]:52826] client error (SendRequest)
2026-03-17T09:12:24.351038+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.197]:47482] client error (SendRequest)
2026-03-17T09:12:24.351078+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.199]:45662] client error (SendRequest)
2026-03-17T09:12:24.351116+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.101]:35330] client error (SendRequest)
2026-03-17T09:12:24.351150+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.100]:58544] client error (SendRequest)
2026-03-17T09:12:24.380590+01:00 hzpbs1 proxmox-backup-api[1039]: done - exit server
2026-03-17T09:12:24.560501+01:00 hzpbs1 systemd[1]: proxmox-backup.service: Deactivated successfully.
2026-03-17T09:12:24.560637+01:00 hzpbs1 systemd[1]: proxmox-backup.service: Consumed 3h 13min 6.848s CPU time, 239.7M memory peak.
2026-03-17T09:12:24.608531+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.198]:55044] client error (Connect)
2026-03-17T09:12:24.608754+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.102]:41250] client error (Connect)
2026-03-17T09:12:24.707965+01:00 hzpbs1 minio[1045]: node(127.0.0.1:9000): Read/Write/Delete successful, bringing drive /mnt/minio-data online
2026-03-17T09:12:24.742240+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.198]:55046] client error (Connect)
2026-03-17T09:12:24.802582+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.197]:47488] client error (Connect)
2026-03-17T09:12:24.847397+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.200]:52422] client error (Connect)
2026-03-17T09:12:24.911981+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.100]:39994] client error (Connect)
2026-03-17T09:12:25.072588+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.100]:40004] client error (Connect)
2026-03-17T09:12:25.088357+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.99]:41912] client error (Connect)
2026-03-17T09:12:25.133976+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.200]:52432] client error (Connect)
2026-03-17T09:12:25.161202+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.101]:35342] client error (Connect)
2026-03-17T09:12:25.229433+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.99]:41922] client error (Connect)
2026-03-17T09:12:25.246361+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.199]:42048] client error (Connect)
2026-03-17T09:12:25.303409+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.101]:35350] client error (Connect)
2026-03-17T09:12:25.375197+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.199]:42056] client error (Connect)
2026-03-17T09:12:25.416559+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.197]:47490] client error (Connect)
2026-03-17T09:12:25.579816+01:00 hzpbs1 proxmox-backup-proxy[1200]: POST /api2/json/access/ticket: 400 Bad Request: [client [::ffff:10.10.12.98]:57372] client error (Connect)
 
2026-03-17T09:12:24.294602+01:00 hzpbs1 proxmox-backup-api[1039]: server error: Transport endpoint is not connected (os error 107)
2026-03-17T09:12:24.294740+01:00 hzpbs1 proxmox-backup-api[1039]: request_shutdown
2026-03-17T09:12:24.294776+01:00 hzpbs1 proxmox-backup-api[1039]: daemon shutting down...
2026-03-17T09:12:24.294824+01:00 hzpbs1 proxmox-backup-api[1039]: daemon shut down.
Is this shutdown manually triggered, e.g. by stopping the systemd service? What is the current status of the services?

Code:
systemctl status proxmox-backup.service proxmox-backup-proxy.service

Edit: Also, since which upgrade does this happen? Can you identify the previously working version, e.g. by the apt history?
 
Analyzing the issue, it seems related to very high I/O pressure. Our backup server has mechanical HDDs, and during those high-stress moments I presume many timeouts expire:

2026-03-18T23:52:32.431669+01:00 hzpbs1 kernel: INFO: task tokio-runtime-w:20224 blocked for more than 122 seconds.
2026-03-18T23:52:32.431681+01:00 hzpbs1 kernel: Tainted: P O 6.17.13-2-pve #1
2026-03-18T23:52:32.431682+01:00 hzpbs1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2026-03-18T23:52:32.431682+01:00 hzpbs1 kernel: task:tokio-runtime-w state:D stack:0 pid:20224 tgid:10934 ppid:1 task_flags:0x400040 flags:0x00004002
2026-03-18T23:52:32.431683+01:00 hzpbs1 kernel: Call Trace:
2026-03-18T23:52:32.431683+01:00 hzpbs1 kernel: <TASK>
2026-03-18T23:52:32.431683+01:00 hzpbs1 kernel: __schedule+0x468/0x1310
2026-03-18T23:52:32.431684+01:00 hzpbs1 kernel: ? __queue_delayed_work+0x112/0x170
2026-03-18T23:52:32.431684+01:00 hzpbs1 kernel: ? mod_delayed_work_on+0xae/0xc0
2026-03-18T23:52:32.432721+01:00 hzpbs1 kernel: schedule+0x27/0xf0
2026-03-18T23:52:32.432728+01:00 hzpbs1 kernel: wb_wait_for_completion+0x91/0xd0
2026-03-18T23:52:32.432728+01:00 hzpbs1 kernel: ? __pfx_autoremove_wake_function+0x10/0x10
2026-03-18T23:52:32.432729+01:00 hzpbs1 kernel: sync_inodes_sb+0xd3/0x2b0
2026-03-18T23:52:32.432729+01:00 hzpbs1 kernel: sync_filesystem+0x70/0xb0
2026-03-18T23:52:32.432730+01:00 hzpbs1 kernel: __x64_sys_syncfs+0x4a/0xe0
2026-03-18T23:52:32.432730+01:00 hzpbs1 kernel: x64_sys_call+0x14f9/0x2330
2026-03-18T23:52:32.432731+01:00 hzpbs1 kernel: do_syscall_64+0x80/0x8f0
2026-03-18T23:52:32.432731+01:00 hzpbs1 kernel: ? srso_return_thunk+0x5/0x5f
2026-03-18T23:52:32.432732+01:00 hzpbs1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
2026-03-18T23:52:32.433767+01:00 hzpbs1 kernel: RIP: 0033:0x70a06d11a777
2026-03-18T23:52:32.433773+01:00 hzpbs1 kernel: RSP: 002b:0000709f1611fa18 EFLAGS: 00000206 ORIG_RAX: 0000000000000132
2026-03-18T23:52:32.433774+01:00 hzpbs1 kernel: RAX: ffffffffffffffda RBX: 000000000000001c RCX: 000070a06d11a777
2026-03-18T23:52:32.433774+01:00 hzpbs1 kernel: RDX: 000070a028015fd0 RSI: 0000000000000006 RDI: 000000000000001c
2026-03-18T23:52:32.433775+01:00 hzpbs1 kernel: RBP: 0000000000000000 R08: 0000000000000020 R09: 0000000000000000
2026-03-18T23:52:32.433775+01:00 hzpbs1 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
2026-03-18T23:52:32.433776+01:00 hzpbs1 kernel: R13: 00005eb6e10b5548 R14: 0000709f1611fa60 R15: 000070a02800a990
2026-03-18T23:52:32.433776+01:00 hzpbs1 kernel: </TASK>
We are looking at avoiding too many backups at the same time, to see whether the issue occurs less frequently.
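To back up the hung-task theory, I/O stall time can be watched via the Linux PSI interface at /proc/pressure/io while the backups run (a sketch; the sample line below is made up for illustration, on a live host read the real file):

```shell
# On a live host: cat /proc/pressure/io
# Sample PSI line (made up) and extraction of the 10-second average
# stall percentage:
line='some avg10=12.34 avg60=8.00 avg300=3.21 total=123456789'
avg10=$(printf '%s\n' "$line" | sed -n 's/.*avg10=\([0-9.]*\).*/\1/p')
echo "$avg10"   # 12.34
```

Sustained high avg values during the backup window would confirm the storage is saturated rather than a PBS bug.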
 
Is this shutdown manually triggered, e.g. by stopping the systemd service? What is the current status of the services? systemctl status proxmox-backup.service proxmox-backup-proxy.service

Edit: Also, since which upgrade does this happen? Can you identify the previously working version, e.g. by the apt history?
This issue appeared after upgrading from the latest 3.4 to version 4.