Tape backup fails when the library cleans the tape.

Jul 24, 2025
3
0
1
Hello,

First of all: I'm not sure whether our current problem is in the right place here in the forum or whether I should open one or even two bug tickets.
If necessary, please send me a quick message...

Problem no. 1:

We are using the latest PBS 3.4.2 with an old IBM TS3100 tape library, which, if necessary, automatically inserts a cleaning tape when changing tapes, before loading the tape actually requested by PBS. This intermediate step takes a few minutes, but that was not a problem with our previous backup solution.

However, when this event occurs with PBS, the tape job aborts with the following error message:

2025-08-01T10:55:47+02:00: wrote 1101 chunks (4296.02 MB at 142.00 MB/s)
2025-08-01T10:56:17+02:00: wrote 1051 chunks (4299.16 MB at 141.93 MB/s)
2025-08-01T10:56:47+02:00: wrote 1377 chunks (4295.49 MB at 141.58 MB/s)
2025-08-01T10:57:00+02:00: wrote 575 chunks (1754.53 MB at 141.60 MB/s)
2025-08-01T10:57:01+02:00: allocated new writable media 'TNT508L5'
2025-08-01T10:57:01+02:00: eject current media
2025-08-01T10:57:31+02:00: trying to load media 'TNT508L5' into drive 'LTO5'
[B]2025-08-01T11:04:00+02:00: could not load tape into drive - error reading element status: read element status (B8h) failed: Unit Attention, Additional sense: Not ready to ready change, medium may have changed[/B]
2025-08-01T11:04:00+02:00: Please insert media 'TNT508L5' into changer 'TS3100'
2025-08-01T11:04:00+02:00: queued notification (id=76adac70-c8fb-42b9-b375-214d31dfdea2)
2025-08-01T11:04:08+02:00: found media label TNT508L5 (acd114d1-f534-4e78-b949-e6b36b5e78e6)
2025-08-01T11:04:08+02:00: writing new media set label (overwrite 'ab849c9d-d203-4c83-8853-d9b4906e68db/2')
2025-08-01T11:04:18+02:00: moving to end of media
2025-08-01T11:04:21+02:00: arrived at end of media
2025-08-01T11:04:21+02:00: write catalog for previous media: f22325eb-4a64-4487-b316-5464b45cfad7
2025-08-01T11:04:21+02:00: queued notification (id=93edfbe9-c600-4a30-bcf5-53d9385ee68b)
[B]2025-08-01T11:04:21+02:00: TASK ERROR: write failed - Not Ready, Additional sense: Logical unit is in process of becoming ready[/B]

The display of the tape library itself shows that after cleaning the drive, the new tape is loaded and written to briefly, before the job is aborted.

What should I do?


Follow-up problem no. 2:


If the tape job is restarted after being aborted, the tape already in the drive is successfully written to again and the backup continues.

At least, that is how I interpret it, because all previously completed snapshots are skipped at the beginning of the log:

2025-08-01T11:38:35+02:00: Starting tape backup job 'hdd:Z12A:LTO5:Z12A'
2025-08-01T11:38:35+02:00: update media online status
2025-08-01T11:38:35+02:00: media set uuid: 70ec1f15-d919-4aee-b897-a6ef5546db13
2025-08-01T11:38:35+02:00: found 20 groups (out of 21 total)
2025-08-01T11:38:35+02:00: latest-only: true (only considering latest snapshots)
2025-08-01T11:38:35+02:00: skip snapshot ns/xx/ct/101/2025-07-31T19:45:03Z
2025-08-01T11:38:35+02:00: skip snapshot ns/yy/ct/109/2025-07-31T19:01:29Z
2025-08-01T11:38:35+02:00: skip snapshot ns/yy/ct/110/2025-07-31T19:01:30Z
2025-08-01T11:38:35+02:00: skip snapshot ns/yy/vm/100/2025-07-31T19:00:02Z
2025-08-01T11:38:35+02:00: backup snapshot "ns/yy/vm/101/2025-07-31T19:00:38Z"
2025-08-01T11:38:46+02:00: allocated new writable media 'TNT508L5'
2025-08-01T11:38:46+02:00: trying to load media 'TNT508L5' into drive 'LTO5'
2025-08-01T11:39:01+02:00: found media label TNT508L5 (acd114d1-f534-4e78-b949-e6b36b5e78e6)
2025-08-01T11:39:01+02:00: moving to end of media
2025-08-01T11:39:01+02:00: arrived at end of media
2025-08-01T11:39:39+02:00: wrote 1299 chunks (4295.23 MB at 113.12 MB/s)
2025-08-01T11:40:10+02:00: wrote 1129 chunks (4295.49 MB at 135.65 MB/s)
2025-08-01T11:40:43+02:00: wrote 1141 chunks (4295.49 MB at 133.28 MB/s)
2025-08-01T11:41:15+02:00: wrote 1205 chunks (4295.23 MB at 134.42 MB/s)
...

What is striking, however, is that this is different from what has been the case with other tape changes to date.

When the first tape change was made, it was written:

2025-08-01T02:13:30+02:00: write catalog for previous media: f22325eb-4a64-4487-b316-5464b45cfad7

During the second tape change, the following was written:

2025-08-01T04:21:17+02:00: write catalog for previous media: f22325eb-4a64-4487-b316-5464b45cfad7
2025-08-01T04:21:18+02:00: write catalog for previous media: fd4ed436-bc75-4a0a-86bc-af053c3af056

for the third tape change

2025-08-01T07:33:40+02:00: write catalog for previous media: f22325eb-4a64-4487-b316-5464b45cfad7
2025-08-01T07:33:40+02:00: write catalog for previous media: fd4ed436-bc75-4a0a-86bc-af053c3af056
2025-08-01T07:33:40+02:00: write catalog for previous media: 630abca1-468e-4e9a-9a51-ea481e16bca9

and during the last tape change (which was not completed successfully due to the clean)

2025-08-01T11:04:21+02:00: write catalog for previous media: f22325eb-4a64-4487-b316-5464b45cfad7
2025-08-01T11:04:21+02:00: queued notification (id=93edfbe9-c600-4a30-bcf5-53d9385ee68b)
2025-08-01T11:04:21+02:00: TASK ERROR: write failed - Not Ready, Additional sense: Logical unit is in process of becoming ready

As you can see at the top, when the job is restarted at 2025-08-01T11:39:39, the actual junk data is immediately written, i.e., unlike the other tapes, TNT508L5 now contains a maximum of media ID 93edfbe9-c600 -4a30-bcf5-53d9385ee68b or possibly no catalog previous media ids at all.

Is this backup complete and consistent?
And what is the best way to test this?

Thank you very much for your suggestions!