Help me debug ZFS stability issues (unstable connection to drives, I think)

pedja1

Nov 13, 2025
I have 6 Seagate IronWolf Pro drives connected to an LSI 9300-8i.
My ZFS pool constantly enters a degraded state due to some kind of connection issue with the drives.
When I run `zpool clear nas`, everything goes back to normal, but then it happens again, sometimes days later and sometimes only minutes later.

I tried a couple of different cables and re-seated all the connectors on both the LSI card and the drives; nothing helps.
From the log I don't think the drives are the issue, but rather the connection (cable), though I am not sure.
I also tried re-routing the cables in case it was some kind of interference, since they were all very close together at the back of the case.

A different, seemingly random drive reports errors each time, not just the one shown in the dmesg output.
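
To check whether the errors really are spread across drives, one option is to tally the kernel's `I/O error` lines per device. The following is a minimal sketch, not a tested tool: the function name `tally_io_errors` is hypothetical, and the regular expression only assumes the line format seen in the dmesg output further down in this post.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   I/O error, dev sdd, sector 2576 op 0x0:(READ) flags 0x0 ...
# (format assumed from the dmesg excerpt in this post)
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector \d+ op \S+?:\((\w+)\)")


def tally_io_errors(log_text):
    """Count I/O error lines per (device, operation) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Two lines copied from the dmesg output in this post.
sample = """\
[1041462.920494] I/O error, dev sdd, sector 3761326920 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[1041463.170589] I/O error, dev sdd, sector 2576 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
"""

for (device, op), count in sorted(tally_io_errors(sample).items()):
    print(f"{device} {op} {count}")
```

Feeding it a full saved kernel log instead of `sample` would show whether one device dominates or the errors move between drives. Note that `sdX` names can change between boots, so the `by-id` names in the `zio` lines are the more reliable identifier.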

I would appreciate any help. Let me know if any more logs are needed; I attached what I could think of.

root@proxmox:~# pveversion
pve-manager/9.0.11/3bf5476b8a4699e2 (running kernel: 6.14.11-4-pve)

Code:
root@proxmox:~# ./lsi/Installer_P16_for_Linux/sas3flash_linux_x64_rel/sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02)
Copyright 2008-2018 Avago Technologies. All rights reserved.

    Adapter Selected is a Avago SAS: SAS3008(C0)

    Controller Number              : 0
    Controller                     : SAS3008(C0)
    PCI Address                    : 00:67:00:00
    SAS Address                    : 5003048-0-1bfc-0202
    NVDATA Version (Default)       : 0e.01.00.07
    NVDATA Version (Persistent)    : 0e.01.00.07
    Firmware Product ID            : 0x2221 (IT)
    Firmware Version               : 16.00.10.00
    NVDATA Vendor                  : LSI
    NVDATA Product ID              : SAS9300-8i
    BIOS Version                   : 08.37.00.00
    UEFI BSD Version               : 15.00.00.00
    FCODE Version                  : N/A
    Board Name                     : LSI3008-IR
    Board Assembly                 : N/A
    Board Tracer Number            : N/A

    Finished Processing Commands Successfully.
    Exiting SAS3Flash.

Code:
[1041462.919922] sd 6:0:17:0: [sdd] tag#1328 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[1041462.920099] sd 6:0:17:0: [sdd] tag#1328 Sense Key : Not Ready [current]
[1041462.920233] sd 6:0:17:0: [sdd] tag#1328 Add. Sense: Logical unit not ready, cause not reportable
[1041462.920362] sd 6:0:17:0: [sdd] tag#1328 CDB: Read(16) 88 00 00 00 00 00 e0 31 4b 48 00 00 00 08 00 00
[1041462.920494] I/O error, dev sdd, sector 3761326920 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[1041462.920628] zio pool=nas vdev=/dev/disk/by-id/ata-ST16000NE000-2WX103_ZR506VAA-part1 error=5 type=1 offset=1925798334464 size=4096 flags=3145856
[1041463.169934] sd 6:0:17:0: [sdd] tag#1329 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[1041463.170141] sd 6:0:17:0: [sdd] tag#1329 Sense Key : Not Ready [current]
[1041463.170288] sd 6:0:17:0: [sdd] tag#1329 Add. Sense: Logical unit not ready, cause not reportable
[1041463.170439] sd 6:0:17:0: [sdd] tag#1329 CDB: Read(16) 88 00 00 00 00 00 00 00 0a 10 00 00 00 10 00 00
[1041463.170589] I/O error, dev sdd, sector 2576 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[1041463.170745] zio pool=nas vdev=/dev/disk/by-id/ata-ST16000NE000-2WX103_ZR506VAA-part1 error=5 type=1 offset=270336 size=8192 flags=1245377
[1041463.170915] sd 6:0:17:0: [sdd] tag#1330 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[1041463.171091] sd 6:0:17:0: [sdd] tag#1330 Sense Key : Not Ready [current]
[1041463.171258] sd 6:0:17:0: [sdd] tag#1330 Add. Sense: Logical unit not ready, cause not reportable
[1041463.171427] sd 6:0:17:0: [sdd] tag#1330 CDB: Read(16) 88 00 00 00 00 07 46 bf b4 10 00 00 00 10 00 00
[1041463.171606] I/O error, dev sdd, sector 31251739664 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[1041463.171781] zio pool=nas vdev=/dev/disk/by-id/ata-ST16000NE000-2WX103_ZR506VAA-part1 error=5 type=1 offset=16000889659392 size=8192 flags=1245377
[1041463.171971] sd 6:0:17:0: [sdd] tag#1331 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[1041463.172159] sd 6:0:17:0: [sdd] tag#1331 Sense Key : Not Ready [current]
[1041463.172342] sd 6:0:17:0: [sdd] tag#1331 Add. Sense: Logical unit not ready, cause not reportable
[1041463.172530] sd 6:0:17:0: [sdd] tag#1331 CDB: Read(16) 88 00 00 00 00 07 46 bf b6 10 00 00 00 10 00 00
[1041463.172716] I/O error, dev sdd, sector 31251740176 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[1041463.172906] zio pool=nas vdev=/dev/disk/by-id/ata-ST16000NE000-2WX103_ZR506VAA-part1 error=5 type=1 offset=16000889921536 size=8192 flags=1245377
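
As a sanity check on the log above, the numbers in each group of lines can be cross-checked against each other. The sketch below decodes the logical block address and transfer length from the `Read(16)` CDB bytes and compares them with the kernel's sector number and the `zio` offset. The helper name `decode_read16` is hypothetical, and the 512-byte sector size is an assumption based on how the kernel reports sectors.

```python
SECTOR_SIZE = 512  # bytes; assumed unit for the kernel's "sector" numbers


def decode_read16(cdb_hex):
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


# (CDB, kernel sector, zio offset, zio size) copied from the first two failures above.
entries = [
    ("88 00 00 00 00 00 e0 31 4b 48 00 00 00 08 00 00", 3761326920, 1925798334464, 4096),
    ("88 00 00 00 00 00 00 00 0a 10 00 00 00 10 00 00", 2576, 270336, 8192),
]

for cdb_hex, sector, zio_offset, zio_size in entries:
    lba, blocks = decode_read16(cdb_hex)
    gap = sector * SECTOR_SIZE - zio_offset
    print(lba == sector, blocks * SECTOR_SIZE == zio_size, gap)
```

For both entries the decoded LBA matches the reported sector, the length matches the `zio` size, and the gap between the disk byte offset and the `zio` offset is 1048576 bytes (1 MiB). That constant gap is consistent with `-part1` starting at sector 2048, which is an inference from the arithmetic rather than something the log states. The log is internally consistent, so the failed reads are ordinary requests that the drive answered with "Not Ready" rather than malformed commands.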

--------------------------------------
Ctrl_Prop Value
--------------------------------------
ROC temperature(Degree Celsius) 58
--------------------------------------

Code:
root@proxmox:~# ./zpool_status.sh
  pool: nas
 state: ONLINE
  scan: resilvered 226M in 00:00:03 with 0 errors on Thu Nov 13 09:57:18 2025
config:

    NAME                                  STATE     READ WRITE CKSUM
    nas                                   ONLINE       0     0     0
      raidz1-0                            ONLINE       0     0     0
        ata-ST16000NE000-2WX103_ZR503HPD  ONLINE       0     0     0  (1-1)
        ata-ST16000NE000-2WX103_ZR50C43M  ONLINE       0     0     0  (1-2)
        ata-ST16000NE000-2WX103_ZR506G8G  ONLINE       0     0     0  (1-3)
        ata-ST16000NE000-2WX103_ZR506VAA  ONLINE       0     0     0  (1-4)
        ata-ST16000NE000-2WX103_ZR503671  ONLINE       0     0     0  (2-1)
        ata-ST16000NE000-2WX103_ZR5039NG  ONLINE       0     0     0  (2-2)

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:21:13 with 0 errors on Sun Nov  9 00:45:27 2025
config:

    NAME                                 STATE     READ WRITE CKSUM
    rpool                                ONLINE       0     0     0
      mirror-0                           ONLINE       0     0     0
        nvme-eui.0025384941a17376-part3  ONLINE       0     0     0
        nvme-eui.0025384941a17369-part3  ONLINE       0     0     0

errors: No known data errors
Other people reported similar things with the same controller:
Maybe try a different drive controller or update the controller firmware?
The firmware is the latest available for that controller.
I don't have another controller to try, but I guess I'll try to find one.