Superloader 3 sometimes hangs

Lephisto

Hi,

I have noticed some weird behaviour. Sometimes my tape library (Quantum Superloader 3 / LTO9) just hangs. During a series of backup jobs it changes tapes as it should, and this sometimes works flawlessly for days, but then suddenly the tape changer no longer receives the command to go to the next tape:

So the next backup job in line stays like this:

Code:
2024-03-11T00:00:10+01:00: waiting for drive lock...
2024-03-11T00:00:50+01:00: Starting tape backup job 'px11:px11-maerz:lto9:px11'
2024-03-11T00:00:50+01:00: task triggered by schedule 'daily'
2024-03-11T00:00:50+01:00: update media online status
2024-03-11T00:00:52+01:00: media set uuid: 220727db-edf2-4c9c-9ec8-3428d14a819c
2024-03-11T00:01:44+01:00: found 130 groups
2024-03-11T00:01:44+01:00: latest-only: true (only considering latest snapshots)
2024-03-11T00:01:44+01:00: skip snapshot vm/9000/2023-11-05T22:00:01Z
2024-03-11T00:01:44+01:00: backup snapshot "vm/10100/2024-03-10T21:30:00Z"
2024-03-11T00:01:44+01:00: allocated new writable media 'NSK145L9'
2024-03-11T00:01:44+01:00: trying to load media 'NSK145L9' into drive 'lto9' 
2024-03-11T00:12:16+01:00: could not load tape into drive - error reading element status: read element status (B8h) failed: scsi command failed: transport error
2024-03-11T00:12:16+01:00: Please insert media 'NSK145L9' into changer 'sl3'
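
To put a number on how long the load attempt hung before the error came back, the timestamps in the task log can be compared. This is only a minimal sketch: the function names `parse_line` and `load_stall` are made up for illustration, and it assumes the log lines keep the `<ISO timestamp>: <message>` shape seen above.

```python
from datetime import datetime


def parse_line(line):
    """Split a task-log line into (timestamp, message)."""
    stamp, _, message = line.partition(": ")
    return datetime.fromisoformat(stamp), message


def load_stall(lines):
    """Return the time between the load attempt and the reported failure, or None."""
    started = None
    for line in lines:
        stamp, message = parse_line(line)
        if message.startswith("trying to load media"):
            started = stamp
        elif message.startswith("could not load tape") and started is not None:
            return stamp - started
    return None


# The two relevant lines from the task log above.
log = [
    "2024-03-11T00:01:44+01:00: trying to load media 'NSK145L9' into drive 'lto9' ",
    "2024-03-11T00:12:16+01:00: could not load tape into drive - error reading "
    "element status: read element status (B8h) failed: scsi command failed: transport error",
]
print(load_stall(log))  # 0:10:32
```

So the changer command sat for roughly ten and a half minutes before the transport error was reported.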

Only after issuing an "Autoloader Reset" in the web frontend of the Superloader does this happen:

Code:
2024-03-11T12:59:46+01:00: could not load tape into drive - error reading element status: read element status (B8h) failed: Aborted Command, Additional sense: I_T nexus loss occurred
2024-03-11T12:59:46+01:00: Please insert media 'NSK145L9' into changer 'sl3'
2024-03-11T13:07:06+01:00: found media label NSK145L9 (cea82754-19f3-4248-ba6d-bf065021e81c)
2024-03-11T13:07:15+01:00: moving to end of media

... and the backup continues. Where can I start digging?

The library has a dedicated SAS controller:

01:00.0 Serial Attached SCSI controller: Adaptec Smart Storage PQI SAS (rev 01)
 
failed: scsi command failed: transport error
Do you see any issues in the syslog/journal of the backup server around the same time that indicate some IO problem?
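
One way to narrow this down is to pull the kernel messages for the window around the failure and keep only the lines that look storage-related. The following is a rough sketch, not an official tool: the keyword list is an assumption (`smartpqi` is the Linux driver name commonly used for Adaptec Smart Storage PQI controllers, so adjust it to whatever your HBA driver actually logs), and the helper names `suspicious` and `kernel_log` are made up for illustration.

```python
import subprocess

# Assumed keywords; adjust to what your controller driver really prints.
KEYWORDS = ("scsi", "smartpqi", "transport", "i/o error", "timeout", "reset")


def suspicious(lines, keywords=KEYWORDS):
    """Keep only lines that mention one of the keywords (case-insensitive)."""
    return [line for line in lines if any(k in line.lower() for k in keywords)]


def kernel_log(since, until):
    """Read kernel messages for a time window via journalctl.

    Note that -k only covers the current boot.
    """
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager", "-o", "short-iso",
         "--since", since, "--until", until],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

For the failed load in the first post, something like `suspicious(kernel_log("2024-03-11 00:00:00", "2024-03-11 00:15:00"))` would cover the relevant window.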

I remember a case in our paid support where a customer ran into similar issues: some commands would always fail with an I/O error. In the end, the cause was that the optical connection to the tape library was dirty and needed cleaning by a service technician.

Is the firmware on the SAS controller and the tape library up to date?
 
The firmware on the SAS controller is up to date, and so is the tape library's.

Next I will replace the controller and the SAS cable, but I have the feeling this might not be the issue.
 
It is not totally clear yet; I have now replaced all possible hardware several times.

From what I can tell, checking the "Eject Media" option on the backup job makes a difference. At least, the backup and the library have been running stable for about a week now.
 
So, an update on this issue: the "Eject Media" option does in fact seem to fix the stability issues with the Quantum Superloader 3 (in my case with an LTO9 drive).

The syslog is now free of SCSI errors, and backups run reliably.

This is _not_ the "eject-before-unload" option, which is tied to the changer, but the "eject-media" option, which is tied to the backup job and is not enabled by default.

Maybe it would be good to have a remark on that in the documentation? It took me quite a long time to figure this out. @fiona @Chris @aaron