Superloader 3 sometimes hangs

Lephisto

Hi,

I have noticed a weird behaviour. Sometimes my tape library (Quantum Superloader 3 / LTO9) just hangs. During a series of backup jobs it changes tapes as it should, and this sometimes works flawlessly for days. Then suddenly the tape changer seems to stop receiving the command to go to the next tape.

The next backup job in line then stays stuck like this:

Code:
2024-03-11T00:00:10+01:00: waiting for drive lock...
2024-03-11T00:00:50+01:00: Starting tape backup job 'px11:px11-maerz:lto9:px11'
2024-03-11T00:00:50+01:00: task triggered by schedule 'daily'
2024-03-11T00:00:50+01:00: update media online status
2024-03-11T00:00:52+01:00: media set uuid: 220727db-edf2-4c9c-9ec8-3428d14a819c
2024-03-11T00:01:44+01:00: found 130 groups
2024-03-11T00:01:44+01:00: latest-only: true (only considering latest snapshots)
2024-03-11T00:01:44+01:00: skip snapshot vm/9000/2023-11-05T22:00:01Z
2024-03-11T00:01:44+01:00: backup snapshot "vm/10100/2024-03-10T21:30:00Z"
2024-03-11T00:01:44+01:00: allocated new writable media 'NSK145L9'
2024-03-11T00:01:44+01:00: trying to load media 'NSK145L9' into drive 'lto9' 
2024-03-11T00:12:16+01:00: could not load tape into drive - error reading element status: read element status (B8h) failed: scsi command failed: transport error
2024-03-11T00:12:16+01:00: Please insert media 'NSK145L9' into changer 'sl3'

Only after issuing an "Autoloader Reset" in the web frontend of the Superloader does this happen:

Code:
2024-03-11T12:59:46+01:00: could not load tape into drive - error reading element status: read element status (B8h) failed: Aborted Command, Additional sense: I_T nexus loss occurred
2024-03-11T12:59:46+01:00: Please insert media 'NSK145L9' into changer 'sl3'
2024-03-11T13:07:06+01:00: found media label NSK145L9 (cea82754-19f3-4248-ba6d-bf065021e81c)
2024-03-11T13:07:15+01:00: moving to end of media

... and the backup continues. Where can I start digging?

The Library has a dedicated SAS Controller:

01:00.0 Serial Attached SCSI controller: Adaptec Smart Storage PQI SAS (rev 01)
 
failed: scsi command failed: transport error
Do you see any issues in the syslog/journal of the backup server around the same time that indicate some IO problem?
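
To narrow down where to look, here is a rough Python sketch that finds the longest pause in a task log and builds a matching `journalctl` call for kernel messages in that window. The helper names (`parse_task_log`, `longest_gap`, `journalctl_command`) and the one-minute margin are made up for illustration. It also assumes the backup server's local timezone matches the offset shown in the task log, since `journalctl --since/--until` interprets plain timestamps as local time.

```python
import shlex
from datetime import datetime, timedelta


def parse_task_log(text):
    """Return (timestamp, message) pairs from 'RFC 3339 timestamp: message' lines."""
    entries = []
    for line in text.splitlines():
        stamp, sep, message = line.partition(": ")
        if not sep:
            continue
        try:
            entries.append((datetime.fromisoformat(stamp), message))
        except ValueError:
            continue  # line does not start with a timestamp
    return entries


def longest_gap(entries):
    """Return (start, end) timestamps of the largest pause between two log lines.

    Needs at least two entries; raises ValueError otherwise.
    """
    pairs = zip(entries, entries[1:])
    (start, _), (end, _) = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return start, end


def journalctl_command(start, end, margin=timedelta(minutes=1)):
    """Build a journalctl call for kernel messages around the pause.

    Assumption: the server's local time uses the same offset as the log.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        "journalctl", "-k",
        "--since", (start - margin).strftime(fmt),
        "--until", (end + margin).strftime(fmt),
    ]


# Three lines taken from the task log posted above.
SAMPLE = """\
2024-03-11T00:01:44+01:00: allocated new writable media 'NSK145L9'
2024-03-11T00:01:44+01:00: trying to load media 'NSK145L9' into drive 'lto9'
2024-03-11T00:12:16+01:00: could not load tape into drive - error reading element status: read element status (B8h) failed: scsi command failed: transport error
"""

start, end = longest_gap(parse_task_log(SAMPLE))
cmd = journalctl_command(start, end)
print(shlex.join(cmd))
```

For the log above, the pause sits between the load attempt at 00:01:44 and the transport error at 00:12:16, so that is the window worth checking for SAS or SCSI messages from the kernel.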

I remember a case in our paid support where a customer ran into similar issues: some commands kept failing with an I/O error. In the end, the cause was that the optical connection to the tape library was dirty and needed cleaning by a service technician.

Is the firmware on the SAS controller and the tape library up to date?
 
Firmware on the SAS controller is up to date, and on the tape library as well.

Next I will replace the controller and the SAS cable, but I have the feeling this might not be the issue.
 
