Connecting HDD causes ZFS pool to suspend

TDW0kxvche9

Hi there,
I'm running a physical PBS 4 with:
- a ZFS mirror of two SSDs for the OS (rpool), connected to the onboard SATA controller
- a ZFS RAID10 of 6x 16 TB HDDs for backups (rpool_backup), connected to an HP H240 RAID controller (HBA mode)

I've installed a 5.25" hot-swap enclosure because I want to sync backups to a portable hard disk.

My problem:
When I insert a 3.5" disk into the 5.25" bay, the system recognizes the disk.
At the same moment, two of the six 16 TB HDDs appear to get disconnected and reconnected, or they get a new device name (e.g. sdc -> sdd).
At that point, the ZFS pool gets suspended.
It doesn't matter whether the external bay is connected to the onboard SATA port or to the RAID controller.
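(For reference, one way to map the shifting kernel names to the stable identifiers used in the pool is lsblk; this is just a suggested check, not output from this system:)
Code:
# list block devices with model, serial and WWN,
# so a renamed device (sdc -> sdd) can still be identified
lsblk -d -o NAME,MODEL,SERIAL,WWN,SIZE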

dmesg:
Code:
[  247.255791] INFO: task agents:1012 blocked for more than 122 seconds.
[  247.256200]       Tainted: P           O       6.14.11-1-pve #1
[  247.256627] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.257001] task:agents          state:D stack:0     pid:1012  tgid:1005  ppid:1      task_flags:0x480140 flags:0x00004002
[  247.257384] Call Trace:
[  247.257753]  <TASK>
[  247.258114]  __schedule+0x466/0x1400
[  247.258483]  ? ttwu_do_activate+0x8d/0x280
[  247.258853]  schedule+0x29/0x130
[  247.259223]  io_schedule+0x4c/0x80
[  247.259615]  cv_wait_common+0xb1/0x140 [spl]
[  247.259988]  ? __pfx_autoremove_wake_function+0x10/0x10
[  247.260393]  __cv_wait_io+0x18/0x30 [spl]
[  247.260791]  txg_wait_synced_flags+0xd8/0x130 [zfs]
[  247.261317]  txg_wait_synced+0x10/0x60 [zfs]
[  247.261825]  spa_vdev_state_exit+0x94/0x170 [zfs]
[  247.262336]  vdev_remove_wanted+0x95/0x120 [zfs]
[  247.262837]  zfs_ioc_vdev_set_state+0xb7/0x1d0 [zfs]
[  247.263363]  zfsdev_ioctl_common+0x7c4/0x980 [zfs]
[  247.263861]  zfsdev_ioctl+0x57/0xf0 [zfs]
[  247.264381]  __x64_sys_ioctl+0xa7/0xe0
[  247.264719]  x64_sys_call+0x1053/0x2310
[  247.265050]  do_syscall_64+0x7e/0x170
[  247.265381]  ? filemap_map_pages+0x56a/0x6e0
[  247.265705]  ? __mod_memcg_lruvec_state+0xc2/0x1d0
[  247.266026]  ? __lruvec_stat_mod_folio+0x8b/0xf0
[  247.266348]  ? set_ptes.isra.0+0x3b/0x90
[  247.266666]  ? do_anonymous_page+0x105/0x920
[  247.266985]  ? ___pte_offset_map+0x1c/0x1a0
[  247.267330]  ? __handle_mm_fault+0xbc0/0x1040
[  247.267639]  ? spl_kmem_free_impl+0x2c/0x40 [spl]
[  247.267966]  ? zfsdev_ioctl+0xa9/0xf0 [zfs]
[  247.268449]  ? __count_memcg_events+0xc0/0x160
[  247.268777]  ? count_memcg_events.constprop.0+0x2a/0x50
[  247.269076]  ? handle_mm_fault+0x22d/0x350
[  247.269380]  ? do_user_addr_fault+0x5e9/0x7e0
[  247.269705]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[  247.269992]  ? irqentry_exit_to_user_mode+0x2d/0x1d0
[  247.270276]  ? irqentry_exit+0x43/0x50
[  247.270585]  ? exc_page_fault+0x96/0x1e0
[  247.270856]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  247.271124] RIP: 0033:0x7e4ebf4328db
[  247.271454] RSP: 002b:00007e4ebe0c33b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  247.271764] RAX: ffffffffffffffda RBX: 00007e4ea8039f50 RCX: 00007e4ebf4328db
[  247.272040] RDX: 00007e4ebe0c3420 RSI: 0000000000005a0d RDI: 000000000000000a
[  247.272388] RBP: 00007e4ebe0c6e10 R08: 0000000000000040 R09: 0000000000000000
[  247.272672] R10: 00007e4ebf4a95c0 R11: 0000000000000246 R12: 00007e4ebe0c69d0
[  247.272955] R13: 00005b6b705794a0 R14: 00007e4ea80377e0 R15: 00007e4ebe0c3420
[  247.273241]  </TASK>
[  297.737077] WARNING: Pool 'rpool_backup' was suspended and is being resumed. Failed I/O will be retried.

I can clear the pool (zpool clear rpool_backup) and everything seems to be okay again:
Code:
root@sgb-pbs2:~# zpool status
  pool: rpool
 state: ONLINE
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_512GB_S42YNS0T302157Y-part3  ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_512GB_S42YNS0T302145H-part3  ONLINE       0     0     0

errors: No known data errors

  pool: rpool_backup
 state: ONLINE
  scan: resilvered 0B in 00:00:01 with 0 errors on Fri Aug 29 10:45:35 2025
config:

        NAME                        STATE     READ WRITE CKSUM
        rpool_backup                ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000c500f7dfd243  ONLINE       0     0     0
            wwn-0x5000c500f7dfab6c  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            wwn-0x5000c500f7dfb3c5  ONLINE       0     0     0
            wwn-0x5000c500f7dfae02  ONLINE       0     0     0
          mirror-2                  ONLINE       0     0     0
            wwn-0x5000c500f7df7c8e  ONLINE       0     0     0
            wwn-0x5000c500f7dfd141  ONLINE       0     0     0

errors: No known data errors
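
(A possible way to see what ZFS itself recorded around the moment of the suspension is the pool's event log, e.g.:)
Code:
# show verbose ZFS events for the pool (I/O errors, vdev state changes)
zpool events -v rpool_backup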

I'm new to ZFS and unsure how to get around this issue. I've read in another post (https://superuser.com/questions/125...adding-new-disk-to-server-mount-point-changed) that Linux may rename device names (sda, sdb, etc.) when a new device is connected or removed, and that this can be avoided by addressing the disks by their ID instead of their device name - but that already seems to be the case here (?).
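
(For completeness: if a pool had been imported with plain sdX names, it could be re-imported with stable IDs roughly like this; with the wwn-* paths shown in zpool status above, this should not be necessary here:)
Code:
# only while the datastore is not in use
zpool export rpool_backup
# re-import, resolving vdevs via persistent /dev/disk/by-id links
zpool import -d /dev/disk/by-id rpool_backup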

Does anyone have an idea how to solve this problem? The external hard disk is not formatted with any filesystem (I've wiped it through the web GUI).


Here's a log from the moment the new hard disk is connected:
Code:
[   20.160746] hpsa 0000:04:00.0: Acknowledging event: 0x80000000 (HP SSD Smart Path configuration change)
[   97.240106] hpsa 0000:04:00.0: SCSI status: LUN:0000000000800101 CDB:12010000040000000000000000000000
[   97.240296] hpsa 0000:04:00.0: SCSI Status = 02, Sense key = 0x05, ASC = 0x25, ASCQ = 0x00
[   97.245898] hpsa 0000:04:00.0: Acknowledging event: 0x80000012 (HP SSD Smart Path configuration change)
[   97.281271] hpsa 0000:04:00.0: scsi 0:0:1:0: removed Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[   97.281474] hpsa 0000:04:00.0: scsi 0:0:2:0: removed Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[   97.324607] sd 0:0:1:0: [sdc] Synchronizing SCSI cache
[   97.324870] sd 0:0:1:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[   97.334047] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfd243-part1 error=5 type=1 offset=270336 size=8192 flags=1245889
[   97.335584] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=2 offset=962077265920 size=4096 flags=3145856
[   97.336582] sd 0:0:2:0: [sdd] Synchronizing SCSI cache
[   97.337064] sd 0:0:2:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[   97.372520] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[   97.392076] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[   97.392371] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[   97.406119] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[   97.406405] WARNING: Pool 'rpool_backup' has encountered an uncorrectable I/O failure and has been suspended.
[  112.598600] hpsa 0000:04:00.0: Acknowledging event: 0x80000002 (HP SSD Smart Path configuration change)
[  112.645033] hpsa 0000:04:00.0: scsi 0:0:1:0: added Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[  112.645337] hpsa 0000:04:00.0: scsi 0:0:2:0: added Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[  112.645698] hpsa can't handle SMP requests
[  112.647531] scsi 0:0:7:0: Direct-Access     ATA      ST16000NM000J-2T SC02 PQ: 0 ANSI: 6
[  112.656726] sd 0:0:7:0: Attached scsi generic sg3 type 0
[  112.657667] sd 0:0:7:0: [sdi] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
[  112.658023] sd 0:0:7:0: [sdi] 4096-byte physical blocks
[  112.658905] sd 0:0:7:0: [sdi] Write Protect is off
[  112.659226] sd 0:0:7:0: [sdi] Mode Sense: 46 00 10 08
[  112.660205] hpsa can't handle SMP requests
[  112.660552] sd 0:0:7:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  112.662222] scsi 0:0:8:0: Direct-Access     ATA      ST16000NM000J-2T SC02 PQ: 0 ANSI: 6
[  112.671253] scsi 0:0:8:0: Attached scsi generic sg4 type 0
[  112.672394] sd 0:0:8:0: [sdj] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
[  112.672734] sd 0:0:8:0: [sdj] 4096-byte physical blocks
[  112.673808] sd 0:0:8:0: [sdj] Write Protect is off
[  112.674154] sd 0:0:8:0: [sdj] Mode Sense: 46 00 10 08
[  112.675699] sd 0:0:8:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  112.725811]  sdi: sdi1 sdi9
[  112.727199] sd 0:0:7:0: [sdi] Attached SCSI disk
[  112.731659]  sdj: sdj1 sdj9
[  112.732354] sd 0:0:8:0: [sdj] Attached SCSI disk

Edit: I've tried it again. This time I connected the external bay to the onboard SATA controller, to which the two OS SSDs (sda, sdb) are connected.
When I connect the external drive, two of the 16 TB disks (on the H240 RAID controller) get disconnected and reconnected and the pool gets suspended:
Code:
[  153.242348] ata6: link is slow to respond, please be patient (ready=0)
[  159.221095] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  159.229325] ata6.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
[  159.230096] ata6.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[  159.230464] ata6.00: Features: NCQ-sndrcv
[  159.232316] ata6.00: configured for UDMA/133
[  159.232642] scsi 6:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
[  159.233202] sd 6:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[  159.233432] sd 6:0:0:0: [sdc] 4096-byte physical blocks
[  159.233676] sd 6:0:0:0: Attached scsi generic sg2 type 0
[  159.233919] sd 6:0:0:0: [sdc] Write Protect is off
[  159.234188] sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[  159.234212] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  159.234518] sd 6:0:0:0: [sdc] Preferred minimum I/O size 4096 bytes
[  159.260764] sd 6:0:0:0: [sdc] Attached SCSI disk
[  169.088885] hpsa 0000:04:00.0: SCSI status: LUN:0000000000800001 CDB:12010000040000000000000000000000
[  169.089156] hpsa 0000:04:00.0: SCSI Status = 02, Sense key = 0x05, ASC = 0x25, ASCQ = 0x00
[  169.090306] hpsa 0000:04:00.0: SCSI status: LUN:0000000000800101 CDB:12010000040000000000000000000000
[  169.090559] hpsa 0000:04:00.0: SCSI Status = 02, Sense key = 0x05, ASC = 0x25, ASCQ = 0x00
[  169.101826] hpsa 0000:04:00.0: Acknowledging event: 0x80000012 (HP SSD Smart Path configuration change)
[  169.137043] hpsa 0000:04:00.0: scsi 0:0:1:0: removed Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[  169.137315] hpsa 0000:04:00.0: scsi 0:0:2:0: removed Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[  169.174679] sd 0:0:1:0: [sdd] Synchronizing SCSI cache
[  169.175667] sd 0:0:1:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  169.178606] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfd243-part1 error=5 type=1 offset=270336 size=8192 flags=1245889
[  169.180500] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=2 offset=979257036800 size=4096 flags=3145856
[  169.180643] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=2 offset=979257040896 size=4096 flags=3145856
[  169.181002] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=2 offset=979257044992 size=16384 flags=3145856
[  169.202738] sd 0:0:2:0: [sde] Synchronizing SCSI cache
[  169.203060] sd 0:0:2:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  169.486502] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[  169.511310] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[  169.511762] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[  169.518723] zio pool=rpool_backup vdev=/dev/disk/by-id/wwn-0x5000c500f7dfab6c-part1 error=5 type=5 offset=0 size=0 flags=2098304
[  169.519099] WARNING: Pool 'rpool_backup' has encountered an uncorrectable I/O failure and has been suspended.
[  184.453677] hpsa 0000:04:00.0: Acknowledging event: 0x80000002 (HP SSD Smart Path configuration change)
[  184.500199] hpsa 0000:04:00.0: scsi 0:0:1:0: added Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[  184.500555] hpsa 0000:04:00.0: scsi 0:0:2:0: added Direct-Access     ATA      ST16000NM000J-2T PHYS DRV SSDSmartPathCap- En- Exp=1
[  184.500951] hpsa can't handle SMP requests
[  184.502608] scsi 0:0:7:0: Direct-Access     ATA      ST16000NM000J-2T SC02 PQ: 0 ANSI: 6
[  184.512175] sd 0:0:7:0: Attached scsi generic sg4 type 0
[  184.513040] sd 0:0:7:0: [sdj] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
[  184.513411] sd 0:0:7:0: [sdj] 4096-byte physical blocks
[  184.514302] sd 0:0:7:0: [sdj] Write Protect is off
[  184.514642] hpsa can't handle SMP requests
[  184.514661] sd 0:0:7:0: [sdj] Mode Sense: 46 00 10 08
[  184.516095] sd 0:0:7:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  184.516249] scsi 0:0:8:0: Direct-Access     ATA      ST16000NM000J-2T SC02 PQ: 0 ANSI: 6
[  184.526392] sd 0:0:8:0: Attached scsi generic sg5 type 0
[  184.528485] sd 0:0:8:0: [sdk] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
[  184.528855] sd 0:0:8:0: [sdk] 4096-byte physical blocks
[  184.529779] sd 0:0:8:0: [sdk] Write Protect is off
[  184.530180] sd 0:0:8:0: [sdk] Mode Sense: 46 00 10 08
[  184.531242] sd 0:0:8:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  184.578297]  sdj: sdj1 sdj9
[  184.580144] sd 0:0:7:0: [sdj] Attached SCSI disk
[  184.590418]  sdk: sdk1 sdk9
[  184.590983] sd 0:0:8:0: [sdk] Attached SCSI disk
[  235.468100] WARNING: Pool 'rpool_backup' was suspended and is being resumed. Failed I/O will be retried.
 
Hi,

It looks like the system recognizes the new disk and triggers a rescan of the SATA/SCSI bus, which re-enumerates the existing drives and may cause device names to shift (e.g., /dev/sdc → /dev/sdd). ZFS sees this as a device loss and suspends the pool to prevent data corruption.
This behavior is not specific to ZFS; it can be caused by a power or signal disturbance affecting multiple drives at once.
It can also be a controller or backplane quirk that resets or renegotiates its connections when a new device is added.
Check the power supply and cables, and make sure the PSU can handle all drives spinning up simultaneously.
Try adding the disk without hot-plugging.
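
One way to narrow this down might be to check which controller port each disk sits on, to see whether the two dropping drives share a cable or backplane lane, e.g.:
Code:
# shows the physical path (controller/port) behind each block device
ls -l /dev/disk/by-path/
# cross-check against the full device paths of the pool's vdevs
zpool status -P rpool_backup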