jaff

Hello! We are running into an issue when trying to run a tape backup job of about 9.5TB to our tape library, and it fails shortly after starting with the following error:

TASK ERROR: write chunk failed - write failed - scsi command failed: transport error

Other smaller jobs (~400GB) have completed with no issue so far. I tried looking in /var/log/syslog and dmesg to find a more descriptive error with no luck.
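
For reference, this is roughly how I was searching the logs around the failure time (adjust the time window and filters for your own setup):

Code:
# everything logged in the last hour (run right after a failed job; adjust the window)
journalctl --since "-1h" -o short-precise
# kernel ring buffer with readable timestamps, filtered to the HBA/tape drivers
dmesg -T | grep -iE 'mpt2sas|mpt3sas|scsi|tape'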

Our server is a Dell R730XD with an LSI HBA that connects to our tape library via an SFF-8088 mini-SAS direct connection. Other operations such as labeling or moving tapes complete with no error. It is just when we try backing up the 9.5TB datastore that we get the above error. I was wondering if we could get pointed in the right direction to troubleshoot this error.

Thank you in advance!
 
Hi,
can you post the complete task log of that tape backup job?
 
Sadly the log does not help more... Can you check the cables & drive (maybe also check for firmware updates for the drive/HBA)? Also check the host syslog for any messages during that time (you can also post it here, maybe we can see some anomalies). And lastly, please post your PBS versions with 'proxmox-backup-manager versions --verbose'.
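
To see which firmware you are currently running, something like this should work (from the sg3-utils package; device paths and host numbers will differ on your system, and the driver only exposes the firmware attribute on some controllers):

Code:
# drive firmware (product revision) as reported over SCSI
lsscsi -g
sg_inq /dev/nst0          # check the "Product revision level" line
# HBA firmware as reported by the mpt2sas/mpt3sas driver (may not be present on all HBAs)
cat /sys/class/scsi_host/host*/version_fw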
 
Here is the syslog output at the time of failure. I ran another backup to generate this.

Code:
Sep 22 10:17:11 pbs proxmox-backup-proxy[3486]: write rrd data back to disk
Sep 22 10:17:11 pbs proxmox-backup-proxy[3486]: starting rrd data sync
Sep 22 10:17:11 pbs proxmox-backup-proxy[3486]: rrd journal successfully committed (30 files in 0.018 seconds)
Sep 22 10:47:12 pbs proxmox-backup-proxy[3486]: write rrd data back to disk
Sep 22 10:47:12 pbs proxmox-backup-proxy[3486]: starting rrd data sync
Sep 22 10:47:12 pbs proxmox-backup-proxy[3486]: rrd journal successfully committed (30 files in 0.019 seconds)
Sep 22 10:49:11 pbs kernel: [156769.235780] mpt2sas_cm0: log_info(0x3112010c): originator(PL), code(0x12), sub_code(0x010c)
Sep 22 10:49:12 pbs proxmox-backup-proxy[3486]: could not send chunk to reader thread: sending on a closed channel

The next lines of the log are just Proxmox sending an email to the SMTP server to notify me.

I am in contact with the tape library manufacturer to get updated firmware for the tape drive; I'll report back once I've applied it. I also ordered a new Dell HBA and cable, which should arrive in a few days so I can test the new hardware.

Here is the output of proxmox-backup-manager versions --verbose

Code:
root@pbs:~# proxmox-backup-manager versions --verbose
proxmox-backup             2.2-1        running kernel: 5.15.53-1-pve
proxmox-backup-server      2.2.6-1      running version: 2.2.5      
pve-kernel-helper          7.2-12                                  
pve-kernel-5.15            7.2-10                                  
pve-kernel-5.15.53-1-pve   5.15.53-1                                
pve-kernel-5.15.39-4-pve   5.15.39-4                                
pve-kernel-5.15.39-1-pve   5.15.39-1                                
pve-kernel-5.15.35-1-pve   5.15.35-3                                
ifupdown2                  3.1.0-1+pmx3                            
libjs-extjs                7.0.0-1                                  
proxmox-backup-docs        2.2.6-1                                  
proxmox-backup-client      2.2.6-1                                  
proxmox-mini-journalreader 1.2-1                                    
proxmox-widget-toolkit     3.5.1                                    
pve-xtermjs                4.16.0-1                                
smartmontools              7.2-pve3                                
zfsutils-linux             2.1.5-pve1

Thank you


Update:

I've updated the tape drive firmware and am unfortunately still receiving the same error. I'll report back when the new HBA and mini-SAS cable arrive.
 
I just wanted to follow up on this post to say I got the new Dell HBA and cable in and unfortunately am still receiving the same error. Here's the log of the backup using the new HBA. It looks like it starts off okay and then fails eventually.

Code:
2022-09-27T11:29:30-05:00: update media online status
2022-09-27T11:29:30-05:00: starting new media set - reason: forced
2022-09-27T11:29:30-05:00: media set uuid: b2bd3a11-22e3-45d1-a5d0-2ba1d8804b22
2022-09-27T11:29:30-05:00: found 2 groups
2022-09-27T11:29:30-05:00: datastore 'Media', root namespace, group host/benchmark was empty
2022-09-27T11:29:30-05:00: backup snapshot "vm/105/2022-09-14T19:02:34Z"
2022-09-27T11:29:38-05:00: allocated new writable media 'TSA003L8'
2022-09-27T11:29:38-05:00: loading media 'TSA003L8' into drive 'magstor_drive'
2022-09-27T11:31:19-05:00: found media label TSA003L8 (908f0f89-47fb-4a46-b734-3ec87e552b3d)
2022-09-27T11:31:19-05:00: writing new media set label (overwrite '00000000-0000-0000-0000-000000000000/0')
2022-09-27T11:31:28-05:00: moving to end of media
2022-09-27T11:31:31-05:00: arrived at end of media
2022-09-27T12:09:36-05:00: wrote 1517 chunks (4295.49 MB at 1.88 MB/s)
2022-09-27T12:10:08-05:00: wrote 1038 chunks (4297.59 MB at 132.72 MB/s)
2022-09-27T12:10:40-05:00: wrote 1051 chunks (4298.90 MB at 135.77 MB/s)
2022-09-27T12:11:12-05:00: wrote 1064 chunks (4298.38 MB at 133.24 MB/s)
2022-09-27T12:11:43-05:00: wrote 1058 chunks (4296.54 MB at 138.81 MB/s)
2022-09-27T12:12:16-05:00: wrote 1055 chunks (4297.59 MB at 129.86 MB/s)
2022-09-27T12:12:48-05:00: wrote 1069 chunks (4297.06 MB at 135.26 MB/s)
2022-09-27T12:13:21-05:00: wrote 1065 chunks (4296.02 MB at 130.92 MB/s)
2022-09-27T12:13:54-05:00: wrote 1067 chunks (4297.59 MB at 130.12 MB/s)
2022-09-27T12:14:23-05:00: TASK ERROR: write chunk failed - write failed - scsi command failed: transport error
 
Mhmm... that's weird. A SCSI transport error normally occurs only when the transport of the command/data fails (as the name suggests), e.g. in case of a bad cable/HBA/drive/etc.


Did you check for a firmware upgrade of the drive/changer?
Can you post the output of
Code:
proxmox-tape status --drive <drive_name>
?
 
It's very weird, but I appreciate you bearing with me. Here's the command output:

Code:
┌────────────────┬──────────────────────────┐
│ Name           │ Value                    │
╞════════════════╪══════════════════════════╡
│ blocksize      │ 0                        │
├────────────────┼──────────────────────────┤
│ density        │ LTO8                     │
├────────────────┼──────────────────────────┤
│ compression    │ 1                        │
├────────────────┼──────────────────────────┤
│ buffer-mode    │ 1                        │
├────────────────┼──────────────────────────┤
│ alert-flags    │ (empty)                  │
├────────────────┼──────────────────────────┤
│ file-number    │ 11                       │
├────────────────┼──────────────────────────┤
│ block-number   │ 160190                   │
├────────────────┼──────────────────────────┤
│ manufactured   │ Mon Sep 20 19:00:00 2021 │
├────────────────┼──────────────────────────┤
│ bytes-written  │ 45.257 GiB               │
├────────────────┼──────────────────────────┤
│ bytes-read     │ 1.302 GiB                │
├────────────────┼──────────────────────────┤
│ medium-passes  │ 220                      │
├────────────────┼──────────────────────────┤
│ medium-wearout │ 1.38%                    │
├────────────────┼──────────────────────────┤
│ volume-mounts  │ 12                       │
└────────────────┴──────────────────────────┘

I've also been in contact with the tape library manufacturer and they provided me with a firmware update, which I applied to the drive. It is still giving the error, unfortunately. However, they did mention to make sure our HBA supports TLR (Transport Layer Retries), since the absence of it can cause intermittent communication problems between the card and the library. I will check on that today and report back.
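
For my own notes, this is roughly how I plan to check it; I'm assuming the kernel exposes the SAS end-device TLR attributes in sysfs, so the exact paths may differ:

Code:
# the kernel logs whether TLR was enabled when the tape drive attached
dmesg | grep -i tlr
# the SAS transport class should also expose TLR per end device (paths vary by system)
grep . /sys/class/sas_end_device/end_device-*/tlr_supported \
       /sys/class/sas_end_device/end_device-*/tlr_enabled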
 
In case anyone else has the same issue: after about a month of troubleshooting and trying different things (updating HBA firmware, updating server firmware, seating the HBA card in different PCI slots, changing out cables, cleaning the tape drive, etc.) I was finally able to make a full tape backup of our 9.5TB datastore. For some reason, when I plug in two SFF-8088 cables to make a redundant connection between the dual-port HBA and the tape library, the backup is able to complete without the write error I was receiving. When either cable was tested on its own, the backup would fail.

Hopefully someone can shed some light on why this works, but in the meantime I'm just glad it's backing up now.
 
Weird, I have not encountered such an issue before. Did you also try to replace the HBA itself? Maybe it's broken and/or the firmware has some quirks?

Anyway, glad you can use it now.
 
I tried replacing it with a Dell-branded card (Dell Dual Port 6Gbps SAS Controller), but ran into the same issue with the write error. I ended up putting the LSI 9200-8e card back in, which is what I'm currently using now. When I can, I'll try to do more testing with both cards. Anyway, I appreciate you helping guide me through the process!
 
I have to say, I was really feeling quite hopeless when I too was experiencing this issue!

Here's my backstory:
https://forum.proxmox.com/threads/p...led-pbs-host-disk-failure.119486/#post-518481

TLDR: I am in disaster recovery mode, with my last true hope being the tape backups.


I was exceptionally surprised, and now have a glimmer of hope, to see that @jaff is experiencing this issue too!


Code:
root@hal-backup:~# proxmox-tape changer status tapechanger
┌────────────┬──────────┬────────────┬─────────────┐
│ entry-kind │ entry-id │ label-text │ loaded-slot │
╞════════════╪══════════╪════════════╪═════════════╡
│ drive      │        0 │ JW0128L8   │           9 │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        1 │ JW0129L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        2 │ JES715L4   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        3 │ OBC616L4   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        4 │ JES732L4   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        5 │ JES712L4   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        6 │ JES752L4   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        7 │ JW0127L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        8 │ JW0126L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │        9 │            │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       10 │ JW0164L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       11 │ JW0137L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       12 │ JW0166L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       13 │ JW0134L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       14 │ JW0125L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       15 │ JW0167L8   │             │
├────────────┼──────────┼────────────┼─────────────┤
│ slot       │       16 │ JW0170L8   │             │
└────────────┴──────────┴────────────┴─────────────┘
root@hal-backup:~# proxmox-tape status --drive mytapedrive
┌───────────────┬──────────────────────────┐
│ Name          │ Value                    │
╞═══════════════╪══════════════════════════╡
│ blocksize     │ 0                        │
├───────────────┼──────────────────────────┤
│ density       │ LTO4                     │
├───────────────┼──────────────────────────┤
│ compression   │ 1                        │
├───────────────┼──────────────────────────┤
│ buffer-mode   │ 1                        │
├───────────────┼──────────────────────────┤
│ file-number   │ 0                        │
├───────────────┼──────────────────────────┤
│ block-number  │ 0                        │
├───────────────┼──────────────────────────┤
│ manufactured  │ Sat Sep 27 18:00:00 2008 │
├───────────────┼──────────────────────────┤
│ bytes-written │ 611.793 GiB              │
├───────────────┼──────────────────────────┤
│ bytes-read    │ 659.175 GiB              │
└───────────────┴──────────────────────────┘

Code:
ProxmoxBackup Server 2.3-1
2022-12-16T17:34:38-07:00: found media label: {
  "label": {
    "ctime": 1648038649,
    "label_text": "JW0127L8",
    "uuid": "bfa3b6b0-ec51-45d7-9336-477bb25bc1f4"
  },
  "media_set_label": {
    "ctime": 1659474317,
    "pool": "hal-mediapool",
    "seq_nr": 2,
    "uuid": "92026b9b-213e-4621-a613-ff9308156583"
  }
}
2022-12-16T17:34:38-07:00: found catalog at pos 2
2022-12-16T17:34:38-07:00: successfully restored related catalog eafc699a-3095-44a3-be2a-d78de9d1dbd6
2022-12-16T17:34:38-07:00: found catalog at pos 3
2022-12-16T17:34:38-07:00: successfully restored related catalog e6372ce5-4778-44b9-8aed-bb87a295d32e
2022-12-16T17:34:38-07:00: searching for catalog at EOT (moving to EOT)
2022-12-16T17:35:45-07:00: no catalog found
2022-12-16T17:35:45-07:00: scanning entire media to reconstruct catalog
2022-12-16T17:36:24-07:00: File 2: skip catalog 'eafc699a-3095-44a3-be2a-d78de9d1dbd6'
2022-12-16T17:36:24-07:00: File 3: skip catalog 'e6372ce5-4778-44b9-8aed-bb87a295d32e'
2022-12-16T17:36:25-07:00: File 4: chunk archive for datastore 'officestore'
2022-12-16T17:37:00-07:00: register 1097 chunks
2022-12-16T17:37:00-07:00: File 5: chunk archive for datastore 'officestore'
2022-12-16T17:37:36-07:00: register 1128 chunks
2022-12-16T17:37:36-07:00: File 6: chunk archive for datastore 'officestore'
2022-12-16T17:38:13-07:00: register 1413 chunks
2022-12-16T17:38:13-07:00: File 7: chunk archive for datastore 'officestore'
2022-12-16T17:38:51-07:00: register 1289 chunks
2022-12-16T17:38:51-07:00: File 8: chunk archive for datastore 'officestore'
2022-12-16T17:39:32-07:00: register 1077 chunks
2022-12-16T17:39:32-07:00: File 9: chunk archive for datastore 'officestore'
2022-12-16T17:40:19-07:00: register 1119 chunks
2022-12-16T17:40:19-07:00: File 10: chunk archive for datastore 'officestore'
2022-12-16T17:41:00-07:00: register 1501 chunks
2022-12-16T17:41:00-07:00: File 11: chunk archive for datastore 'officestore'
2022-12-16T17:41:35-07:00: register 1108 chunks
2022-12-16T17:41:35-07:00: File 12: chunk archive for datastore 'officestore'
2022-12-16T17:42:11-07:00: register 1155 chunks
2022-12-16T17:42:11-07:00: File 13: chunk archive for datastore 'officestore'
2022-12-16T17:42:48-07:00: register 1037 chunks
2022-12-16T17:42:48-07:00: File 14: chunk archive for datastore 'officestore'
2022-12-16T17:43:23-07:00: register 1730 chunks
2022-12-16T17:43:23-07:00: File 15: chunk archive for datastore 'officestore'
2022-12-16T17:43:58-07:00: register 1567 chunks
2022-12-16T17:43:58-07:00: File 16: chunk archive for datastore 'officestore'
2022-12-16T17:44:33-07:00: register 2125 chunks
2022-12-16T17:44:33-07:00: File 17: chunk archive for datastore 'officestore'
2022-12-16T17:44:46-07:00: TASK ERROR: read failed - scsi command failed: transport error

Code:
[...]
[    7.821187] bochs-drm 0000:00:01.0: vgaarb: deactivate vga console
[    7.878086] Console: switching to colour dummy device 80x25
[    7.878573] [drm] Found bochs VGA, ID 0xb0c5.
[    7.878580] [drm] Framebuffer size 16384 kB @ 0xfb000000, mmio @ 0xfea14000.
[    7.883577] [drm] Found EDID data blob.
[    7.890386] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:01.0 on minor 0
[    7.913630] fbcon: bochs-drmdrmfb (fb0) is primary device
[    7.928252] Console: switching to colour frame buffer device 160x50
[    7.934297] bochs-drm 0000:00:01.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[    7.957771] RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 10737418240 ms ovfl timer
[    7.990008] cryptd: max_cpu_qlen set to 1000
[    8.038659] AVX version of gcm_enc/dec engaged.
[    8.039722] AES CTR mode by8 optimization enabled
[    8.188512] ZFS: Loaded module v2.1.6-pve1, ZFS pool version 5000, ZFS filesystem version 5
[    8.252189] Adding 4063228k swap on /dev/mapper/pbs-swap.  Priority:-2 extents:1 across:4063228k FS
[    9.344249] EXT4-fs (md127): recovery complete
[    9.347905] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: discard. Quota mode: none.
[ 2354.932411] st 2:0:0:0: device_block, handle(0x0009)
[ 2354.932434] ch 2:0:0:1: device_block, handle(0x0009)
[ 2415.180711] st 2:0:0:0: device_unblock and setting to running, handle(0x0009)
[ 2415.180736] ch 2:0:0:1: device_unblock and setting to running, handle(0x0009)
[ 2415.185718] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x500e09e00e3aa010)
[ 2415.185730] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x500e09e00e3aa010)
[ 2415.185736] mpt2sas_cm0: enclosure logical id(0x500605b00209973a), slot(3)
[ 2684.175609] mpt2sas_cm0: handle(0x9) sas_address(0x500e09e00e3aa010) port_type(0x1)
[ 2684.181434] scsi 2:0:1:0: Sequential-Access IBM      ULTRIUM-TD4      C7QH PQ: 0 ANSI: 3
[ 2684.181454] scsi 2:0:1:0: SSP: handle(0x0009), sas_addr(0x500e09e00e3aa010), phy(3), device_name(0x0000000000000000)
[ 2684.181461] scsi 2:0:1:0: enclosure logical id (0x500605b00209973a), slot(3)
[ 2684.181471] scsi 2:0:1:0: qdepth(254), tagged(1), scsi_level(4), cmd_que(1)
[ 2684.183655] scsi 2:0:1:0: Power-on or device reset occurred
[ 2684.187609] scsi 2:0:1:0: TLR Disabled
[ 2684.189484] st 2:0:1:0: Attached scsi tape st0
[ 2684.189492] st 2:0:1:0: st0: try direct i/o: yes (alignment 4 B)
[ 2684.189751] st 2:0:1:0: Attached scsi generic sg3 type 1
[ 2684.191294]  end_device-2:1: add: handle(0x0009), sas_addr(0x500e09e00e3aa010)
[ 2790.422251] st 2:0:1:0: device_block, handle(0x0009)
[ 2804.922516] st 2:0:1:0: device_unblock and setting to running, handle(0x0009)
 
Code:
(journalctl -f)
[...]
Dec 16 17:21:02 hal-backup systemd[1]: Finished Cleanup of Temporary Directories.
░░ Subject: A start job for unit systemd-tmpfiles-clean.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit systemd-tmpfiles-clean.service has finished successfully.
░░
░░ The job identifier is 503.
Dec 16 17:35:49 hal-backup proxmox-backup-proxy[920]: write rrd data back to disk
Dec 16 17:35:49 hal-backup proxmox-backup-proxy[920]: starting rrd data sync
Dec 16 17:35:50 hal-backup proxmox-backup-proxy[920]: rrd journal successfully committed (25 files in 0.095 seconds)
Dec 16 17:44:46 hal-backup kernel: st 2:0:0:0: device_block, handle(0x0009)
Dec 16 17:44:46 hal-backup kernel: ch 2:0:0:1: device_block, handle(0x0009)
Dec 16 17:45:47 hal-backup kernel: st 2:0:0:0: device_unblock and setting to running, handle(0x0009)
Dec 16 17:45:47 hal-backup kernel: ch 2:0:0:1: device_unblock and setting to running, handle(0x0009)
Dec 16 17:45:47 hal-backup kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x500e09e00e3aa010)
Dec 16 17:45:47 hal-backup kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x500e09e00e3aa010)
Dec 16 17:45:47 hal-backup kernel: mpt2sas_cm0: enclosure logical id(0x500605b00209973a), slot(3)
Dec 16 17:50:16 hal-backup kernel: mpt2sas_cm0: handle(0x9) sas_address(0x500e09e00e3aa010) port_type(0x1)
Dec 16 17:50:16 hal-backup kernel: scsi 2:0:1:0: Sequential-Access IBM      ULTRIUM-TD4      C7QH PQ: 0 ANSI: 3
Dec 16 17:50:16 hal-backup kernel: scsi 2:0:1:0: SSP: handle(0x0009), sas_addr(0x500e09e00e3aa010), phy(3), device_name(0x0000000000000000)
Dec 16 17:50:16 hal-backup kernel: scsi 2:0:1:0: enclosure logical id (0x500605b00209973a), slot(3)
Dec 16 17:50:16 hal-backup kernel: scsi 2:0:1:0: qdepth(254), tagged(1), scsi_level(4), cmd_que(1)
Dec 16 17:50:16 hal-backup kernel: scsi 2:0:1:0: Power-on or device reset occurred
Dec 16 17:50:16 hal-backup kernel: scsi 2:0:1:0: TLR Disabled
Dec 16 17:50:16 hal-backup kernel: st 2:0:1:0: Attached scsi tape st0
Dec 16 17:50:16 hal-backup kernel: st 2:0:1:0: st0: try direct i/o: yes (alignment 4 B)
Dec 16 17:50:16 hal-backup kernel: st 2:0:1:0: Attached scsi generic sg3 type 1
Dec 16 17:50:16 hal-backup kernel:  end_device-2:1: add: handle(0x0009), sas_addr(0x500e09e00e3aa010)
Dec 16 17:52:02 hal-backup kernel: st 2:0:1:0: device_block, handle(0x0009)
Dec 16 17:52:16 hal-backup kernel: st 2:0:1:0: device_unblock and setting to running, handle(0x0009)
 
I am passing through an LSI adapter to a PBS VM.
The only thing that looks relevant from the host dmesg is:
Code:
[88947.269220] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88949.519316] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88951.769212] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88954.019234] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88967.269235] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88969.519344] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88971.769238] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[88974.019222] mpt2sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
 
Oh, I forgot to mention: I am not as fortunate as @jaff, as my tape library only has one SAS port, so I cannot use the two-cable solution.

And I am seeing the issue on read, not write, but it also only appears for larger data transfer tasks; all the smaller tasks such as changing tapes work as expected.

Cataloging empty and near-empty tapes also works fine.
It's only when trying to inventory or catalogue full tapes that I see this issue.

I am definitely in a bind. Any help would be greatly appreciated.
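
In case it helps with reproducing this, I've been re-running the catalog of a single tape while watching the kernel log from a second shell (exact options from memory, please double-check with proxmox-tape catalog --help):

Code:
# shell 1: watch for SAS resets / device_block events while the job runs
journalctl -k -f | grep -iE 'mpt2sas|device_block|tape'
# shell 2: re-catalog the currently loaded tape (drive name from my setup)
proxmox-tape catalog --drive mytapedrive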
 
Code:
root@hal-backup:~# proxmox-backup-manager versions --verbose
proxmox-backup                2.3-1        running kernel: 5.15.74-1-pve
proxmox-backup-server         2.3.1-1      running version: 2.3.1       
pve-kernel-helper             7.3-1                                     
pve-kernel-5.15               7.2-14                                   
pve-kernel-5.15.74-1-pve      5.15.74-1                                 
ifupdown2                     3.1.0-1+pmx3                             
libjs-extjs                   7.0.0-1                                   
proxmox-backup-docs           2.3.1-1                                   
proxmox-backup-client         2.3.1-1                                   
proxmox-mini-journalreader    1.2-1                                     
proxmox-offline-mirror-helper unknown                                   
proxmox-widget-toolkit        3.5.3                                     
pve-xtermjs                   4.16.0-1                                 
smartmontools                 7.2-pve3                                 
zfsutils-linux                2.1.6-pve1
 
Okay, I am definitely at the end of my knowledge when it comes to debugging this. Can anyone shed any light on where I should be looking for more error messages or anything similar to try to figure this out?
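
In the meantime, one more place I've been looking is the drive's TapeAlert log page, read directly with sg3-utils (0x2e is the TapeAlert page; your tape device node may differ):

Code:
# dump the TapeAlert flags straight from the drive
sg_logs --page=0x2e /dev/nst0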
 
Okay, I put together a new library from parts of other dead tape libraries, and now have this working.

My apologies and best of luck to anyone else who comes across this issue!
 
