PBS Tape Backup job just stops in the middle - no error

WORK-Microwave · Jun 30, 2025

Since the last two PBS updates, the tape backup job stops randomly partway through, without any report or error. It looks as if something kills the job.

Has anyone else seen a similar issue?

Version:
proxmox-backup-server 3.4.1-1
proxmox-kernel-6.8.12-11-pve-signed 6.8.12-11

fabian · Jun 30, 2025

Which version previously worked? Is there anything visible in the logs at all? What about monitoring?

WORK-Microwave · Jun 30, 2025

Hi @fabian,

It worked in 8.3.x, but the write speed was only around 30 Mb/s. Now it is almost 10x faster, but this issue appears.

My main problem is that there is nothing in the logs: nothing in dmesg, syslog, or the job log (at some point there are simply no more entries, and the last entry still looks normal). Even in monitoring I cannot find anything useful.
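
A job log that "just stops" can be told apart from one that finished by checking whether its last line is a final status line. The sketch below assumes each line has the form `<ISO-8601 timestamp>: <message>` (as in the log posted later in this thread) and that a finished task ends with `TASK OK` or `TASK ERROR: ...`; the prefix list is an assumption and may need extending for other final states.

```python
import re
from typing import Optional

# Assumption: each task-log line looks like "<ISO-8601 timestamp>: <message>",
# and a task that finished normally ends with "TASK OK" or "TASK ERROR: ...".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+?: ")
TERMINAL_PREFIXES = ("TASK OK", "TASK ERROR")


def terminal_status(log_text: str) -> Optional[str]:
    """Return the final status message of a task log, or None when the
    log simply stops without one (e.g. the process was killed)."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    if not lines:
        return None
    # Drop the leading timestamp so only the message is compared.
    message = TIMESTAMP.sub("", lines[-1], count=1)
    return message if message.startswith(TERMINAL_PREFIXES) else None


# Hypothetical excerpt of a log that just stops, as described above.
stopped = (
    "2025-07-07T20:54:06+02:00: percentage done: 100.00% (124/124 groups)\n"
    "2025-07-07T20:54:08+02:00: append media catalog\n"
)
print(terminal_status(stopped))  # None
```

A `None` result matches the symptom reported here: the last entry is an ordinary progress line and no final status was ever written.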

I now also see that a verification job was killed with status "Unknown", but not at the same time, so I can't say there is a correlation.

I will test whether it behaves better when I make sure the tape backup does not run at the same time as the verification job. (This will take a few days.)
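
To keep the two jobs apart, it helps to check the planned windows against typical run times before changing the schedules. This is a minimal sketch with hypothetical start times and durations (real values would come from the task history); it only compares windows and does not touch the PBS schedule configuration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class JobWindow(NamedTuple):
    """A scheduled job and how long it is expected to run (hypothetical values)."""
    name: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def overlaps(a: JobWindow, b: JobWindow) -> bool:
    # Treat windows as half-open [start, end): a job that starts exactly
    # when the other one ends does not count as overlapping.
    return a.start < b.end and b.start < a.end


tape = JobWindow("tape-backup", datetime(2025, 7, 7, 20, 0), timedelta(hours=12))
verify_night = JobWindow("verify", datetime(2025, 7, 8, 2, 0), timedelta(hours=3))
verify_morning = JobWindow("verify", datetime(2025, 7, 8, 9, 0), timedelta(hours=3))

print(overlaps(tape, verify_night))    # True
print(overlaps(tape, verify_morning))  # False
```

A safety margin could be added to each duration, since a tape job that runs longer than usual would otherwise drift into the verification window.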

fabian · Jun 30, 2025

Were there any service stops during that time period that would explain this? How is PBS deployed: bare metal or as a VM? Could you provide journal output covering the day of the tape job?
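
The journal for a single day can be exported with `journalctl --since ... --until ...`. The helper below is only a sketch that builds that command for a given date (from midnight to the following midnight, local time); the date used is an example, and the printed command would be run on the PBS host with its output redirected to a file.

```python
import shlex
from datetime import date, timedelta
from typing import List


def journal_command(day: date) -> List[str]:
    """Build a journalctl call covering one full calendar day (local time)."""
    since = f"{day.isoformat()} 00:00:00"
    until = f"{(day + timedelta(days=1)).isoformat()} 00:00:00"
    # --no-pager prints everything at once; -o short-iso gives ISO timestamps
    # that line up with the task log format.
    return ["journalctl", "--since", since, "--until", until, "--no-pager", "-o", "short-iso"]


# Example date; substitute the day of the failed tape job.
print(shlex.join(journal_command(date(2025, 7, 7))))
# journalctl --since '2025-07-07 00:00:00' --until '2025-07-08 00:00:00' --no-pager -o short-iso
```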

WORK-Microwave · Jul 8, 2025

I have now tested the case where only the tape backup was running, with no verification job at the same time. It looks much better: the job did not crash, but it did not complete either. At least this time there is an error, and it looks like a tape timeout: the job did not wait long enough for the tape to rewind. As a result, the tape gets unloaded later than expected.

Code:

2025-07-07T20:54:06+02:00: end backup PBS2-STORE:"vm/111111060/2025-07-06T21:55:24Z"
2025-07-07T20:54:06+02:00: percentage done: 100.00% (124/124 groups)
2025-07-07T20:54:08+02:00: append media catalog
2025-07-07T20:54:08+02:00: rewind media
2025-07-07T21:01:08+02:00: queued notification (id=91f2c4ce-299a-49e5-9751-f72dcfb95bd4)
2025-07-07T21:01:08+02:00: TASK ERROR: unload drive failed - scsi command failed: transport error
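
In this log the unload failure is recorded 7 minutes after `rewind media`. The sketch below measures that gap from the timestamps; comparing the gap across several failed runs could show whether they always fail after the same duration. Whether PBS applies a fixed rewind or unload timeout here is not established in this thread, so this only quantifies the observation.

```python
from datetime import datetime
from typing import List, Tuple

# Lines copied from the task log above.
LOG = """\
2025-07-07T20:54:08+02:00: rewind media
2025-07-07T21:01:08+02:00: queued notification (id=91f2c4ce-299a-49e5-9751-f72dcfb95bd4)
2025-07-07T21:01:08+02:00: TASK ERROR: unload drive failed - scsi command failed: transport error
"""


def parse(log: str) -> List[Tuple[datetime, str]]:
    """Split each "<timestamp>: <message>" line into (datetime, message)."""
    entries = []
    for line in log.splitlines():
        if not line.strip():
            continue
        stamp, _, message = line.partition(": ")
        entries.append((datetime.fromisoformat(stamp), message))
    return entries


def seconds_since(log: str, marker: str) -> float:
    """Seconds from the first line whose message equals `marker` to the last line."""
    entries = parse(log)
    start = next(ts for ts, msg in entries if msg == marker)
    return (entries[-1][0] - start).total_seconds()


print(seconds_since(LOG, "rewind media"))  # 420.0
```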

I will keep testing without running verification in parallel for a few weeks, and later try running the jobs in parallel again.

I will post an update if the tests turn up any new information.

LukasInCloud · Jul 9, 2025

Running tape backups and verification jobs in parallel seems to cause timeouts during tape rewind. The transport error on unload suggests a resource or timing conflict. Testing with staggered jobs looks promising so far.
