Hi! Thank you for PBS, it is a great piece of software. I've been testing it for a month on ~60TB-scale backup tasks, and it has been holding up very well. I'm running off the community repository -- apologies, I'll get a subscription as soon as I can carve out the budget. I'm currently on PBS 3.3.1 (the latest in the repo as of today; more version info below). Since one or two point releases ago, I've been experiencing an issue that is quite a blocker, so please allow me to report it:
My backup tasks start out just fine and upload all of their chunks, but at the very end they hang and never return. Subsequent tasks then fail because they can't acquire a lock on the datastore. Even killing the associated proxmox-backup-client process doesn't make the hanging task return. The only way I can get rid of such tasks is by restarting proxmox-backup-proxy. Interestingly, the backup snapshot itself shows up just fine in the datastore, so the task hangs after having done all the work!
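Concretely, clearing a stuck task looks roughly like this for me (the PID is a placeholder; treat this as a sketch rather than my exact commands):
Code:
# killing the hung client does NOT make the server-side task return
pgrep -af proxmox-backup-client
kill <PID>                                     # placeholder PID of the hung client
# the only thing that clears the hung task and frees the datastore lock:
systemctl restart proxmox-backup-proxy.service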
My setup:
- OS: Proxmox PVE. No VMs or containers running.
Output of pveversion:
pve-manager/8.3.1/fb48e850ef9dde27 (running kernel: 6.8.12-4-pve)
- PBS on same host as PVE, installed in host OS (not a VM or CT).
Output of proxmox-backup-manager versions:
proxmox-backup-server 3.3.1-1 running version: 3.3.1
- backup sources mounted as NFS on same host
- proxmox-backup-client (3.3.1-1) runs on the same host, from a root cronjob (a rough sketch of the cron entry follows right after this list)
- datastore is a ZFS dataset on same host
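The cron entry looks roughly like the following; the schedule, log path, and the PBS_PASSWORD handling are illustrative placeholders rather than my exact crontab:
Code:
# /etc/cron.d/pbs-backup -- illustrative sketch, schedule and paths are placeholders
# PBS_PASSWORD carries the API token secret so the client can run non-interactively
30 2 * * * root PBS_PASSWORD='<token-secret>' /usr/bin/proxmox-backup-client backup vmd.pxar:/mnt/vmd --repository 'root@pam!root-XXX@localhost:pbs_v-storagex' --ns XXX_data --change-detection-mode=metadata >>/var/log/pbs-client-backup.log 2>&1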
Command:
proxmox-backup-client backup vmd.pxar:/mnt/vmd --repository 'root@pam!root-XXX@localhost:pbs_v-storagex' --ns XXX_data --change-detection-mode=metadata
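The verbose client-side log below was captured with the client's debug logging turned up, roughly like this (I'm going from memory on the PBS_LOG variable and level, so treat it as an approximation):
Code:
# same backup command as above, with debug-level client logging enabled
PBS_LOG=debug proxmox-backup-client backup vmd.pxar:/mnt/vmd --repository 'root@pam!root-XXX@localhost:pbs_v-storagex' --ns XXX_data --change-detection-mode=metadata
The tail end of that client output looks like this: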
Code:
Dec 08 09:42:54 append chunks list len (512)
Dec 08 09:42:54 Connection: send frame=Headers { stream_id: StreamId(21351), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: send frame=Data { stream_id: StreamId(21351), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 Connection: received frame=Headers { stream_id: StreamId(21351), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: received frame=Data { stream_id: StreamId(21351), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 append chunks list len (512)
Dec 08 09:42:54 Connection: send frame=Headers { stream_id: StreamId(21353), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: send frame=Data { stream_id: StreamId(21353), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 Connection: received frame=Headers { stream_id: StreamId(21353), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: received frame=Data { stream_id: StreamId(21353), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 append chunks list len (293)
Dec 08 09:42:54 Connection: send frame=Headers { stream_id: StreamId(21355), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: send frame=Data { stream_id: StreamId(21355), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 Connection: received frame=Headers { stream_id: StreamId(21355), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: received frame=Data { stream_id: StreamId(21355), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 v-data2.ppxar.didx: reused 20.233 TiB from previous snapshot for unchanged files (5459800 chunks)
Dec 08 09:42:54 v-data2.ppxar.didx: had to backup 0 B of 20.233 TiB (compressed 0 B) in 786.20 s (average 0 B/s)
Dec 08 09:42:54 v-data2.ppxar.didx: backup was done incrementally, reused 20.233 TiB (100.0%)
Dec 08 09:42:54 v-data2.ppxar.didx: Reused 6 from 5459806 chunks.
Dec 08 09:42:54 v-data2.ppxar.didx: Average chunk size was 3.886 MiB.
Dec 08 09:42:54 v-data2.ppxar.didx: Average time per request: 143 microseconds.
Dec 08 09:42:54 Connection: send frame=Headers { stream_id: StreamId(21357), flags: (0x5: END_HEADERS | END_STREAM) } peer=Client
Dec 08 09:42:54 Connection: received frame=Headers { stream_id: StreamId(21357), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: received frame=Data { stream_id: StreamId(21357), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 Upload index.json to 'root@pam!root-v-storagex-backups@localhost:8007:pbs_v-storagex'
Dec 08 09:42:54 Connection: send frame=Headers { stream_id: StreamId(21359), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: send frame=Data { stream_id: StreamId(21359), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 Connection: received frame=Headers { stream_id: StreamId(21359), flags: (0x4: END_HEADERS) } peer=Client
Dec 08 09:42:54 Connection: received frame=Data { stream_id: StreamId(21359), flags: (0x1: END_STREAM) } peer=Client
Dec 08 09:42:54 Connection: send frame=Headers { stream_id: StreamId(21361), flags: (0x5: END_HEADERS | END_STREAM) } peer=Client
Examining the syslog on the host, I see that the last few chunks get added successfully, and then the log entries from this task just end. I see no other relevant error messages whatsoever. I'm not sure where the 27-second time offset between the client log and the syslog comes from, but this is the same task.
Code:
Dec 08 09:42:27 v-storage3 proxmox-backup-proxy[494488]: successfully added chunk 1260d83bbcf100b2f4f962f4d2bf938dd55490004be4450aa1a61169daa3ebc9 to dynamic index 2 (offset 15658790343380, size 2230064)
Dec 08 09:42:27 v-storage3 proxmox-backup-proxy[494488]: successfully added chunk 54544d336337a3efcfbf5a14c36602b7ef8b2da3afaa04e52c3102d752e059ae to dynamic index 2 (offset 15658792573444, size 3208618)
Dec 08 09:42:27 v-storage3 proxmox-backup-proxy[494488]: successfully added chunk 179ce43db5c3ff9365552f2309524a44c5985cda98e81d36af84bec085e13c62 to dynamic index 2 (offset 15658795782062, size 6690249)
Dec 08 09:42:27 v-storage3 proxmox-backup-proxy[494488]: successfully added chunk 8c9777e3418783d06a7633ceb6601e797ed97682691af806008fdc22a1c4eb5a to dynamic index 2 (offset 15658802472311, size 6600153)
Dec 08 09:42:27 v-storage3 proxmox-backup-proxy[494488]: successfully added chunk 27c651d7e1e78cd18da485c1643e95866b6a8c851938ecdfbb2e8e64ee220c5a to dynamic index 2 (offset 15658809072464, size 3233596)
Dec 08 09:42:27 v-storage3 proxmox-backup-proxy[494488]: successfully added chunk 86e39f5965096997b6f85f546921c2721bcc5df0d298d90339482c6f005c43bb to dynamic index 2 (offset 15658812306060, size 3237114)
Dec 08 09:50:53 v-storage3 smartd[7314]: Device: /dev/sda [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
Dec 08 09:51:03 v-storage3 smartd[7314]: Device: /dev/sdc [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
...
Dec 08 09:59:12 v-storage3 proxmox-backup-proxy[494488]: rrd journal successfully committed (25 files in 0.017 seconds)
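(For reference, the excerpt above was pulled from the journal along these lines; the time window is just an example:)
Code:
journalctl --since "09:40" --until "10:00" | grep -E 'proxmox-backup-proxy|smartd'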
The server-side task log contains the same entries as the syslog and ends in the same abrupt way. Note that the reader task that gets launched alongside every backup task in change-detection-mode=metadata returns just fine.
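In case it matters, I can also pull that server-side task log via the CLI, roughly like this (the UPID is a placeholder taken from the task list):
Code:
proxmox-backup-manager task list
proxmox-backup-manager task log 'UPID:...'     # placeholder UPID of the hung backup task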
I'd be happy to provide further debug info or run additional tests; please let me know what would be of interest. Any hints appreciated!