Hi everyone,
I'm seeing some weird behaviour with a PBS installation on an HP DL380 Gen10.
Since the upgrade to PBS 4, the system sporadically "spits out" one of its 10 SAS disks.
Interestingly enough, it's a different disk every time, and usually only one.
The affected disk gets marked as failed in its ZFS mirror vdev and is replaced with the hot spare.
After a cold reboot of the server, the drive comes back, gets resilvered, and continues to work fine.
I've had multiple of these events over the past few weeks, always with a different disk, so I can rule out a hard drive failure.
journalctl output from one of the reset events:
Code:
Jan 31 21:05:05 kg-pbs1 proxmox-backup-proxy[1963]: Upload backup log to datastore 'kg-pbs1-hdd', namespace 'KG' vm/115/2026-01-31T20:03:26Z/client.log.blob
Jan 31 21:05:33 kg-pbs1 kernel: sd 0:0:0:0: Power-on or device reset occurred
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 00000000458a83ba
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: resetting scsi 0:0:0:0 SCSI cmd at 00000000458a83ba due to cmd opcode 0x8a
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 00000000458a83ba: SUCCESS
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 000000002920b10f
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 000000002920b10f: SUCCESS
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 0000000054c3f49a
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 0000000054c3f49a already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 00000000c18ee9f1
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 00000000c18ee9f1 already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 0000000009dbe4e6
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 0000000009dbe4e6 already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 00000000f64791f0
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 00000000f64791f0 already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 00000000390261f2
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 00000000390261f2 already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 00000000737cb003
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 00000000737cb003 already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 000000009eb54a10
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 000000009eb54a10 already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: attempting TASK ABORT on scsi 0:0:0:0 for SCSI cmd at 000000001b6475ef
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: scsi 0:0:0:0 for SCSI cmd at 000000001b6475ef already completed
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: reset of scsi 0:0:0:0: SUCCESS
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: resetting scsi 0:0:0:0 SCSI cmd at 000000002920b10f due to cmd opcode 0x8a
Jan 31 21:05:52 kg-pbs1 kernel: smartpqi 0000:5c:00.0: reset of scsi 0:0:0:0: SUCCESS
Jan 31 21:05:52 kg-pbs1 kernel: sd 0:0:0:0: Power-on or device reset occurred
Jan 31 21:05:52 kg-pbs1 zed[881611]: eid=1093 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=8192 offset=9528072564736 priority=3 err=0 flags=0x80100480 delay=30133ms
Jan 31 21:05:52 kg-pbs1 zed[881613]: eid=1091 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=655360 offset=11682020642816 priority=3 err=0 flags=0x80100480 delay=30200ms
Jan 31 21:05:52 kg-pbs1 zed[881610]: eid=1092 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=4096 offset=9528067448832 priority=3 err=0 flags=0x300080 delay=30133ms bookmark=54:391667:1:0
Jan 31 21:05:52 kg-pbs1 zed[881609]: eid=1089 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=4096 offset=9528113053696 priority=3 err=0 flags=0x300080 delay=30111ms bookmark=54:391804:1:0
Jan 31 21:05:52 kg-pbs1 zed[881612]: eid=1090 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=917504 offset=11682051182592 priority=3 err=0 flags=0x80100480 delay=30182ms
Jan 31 21:05:52 kg-pbs1 zed[881618]: eid=1094 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=786432 offset=11682050396160 priority=3 err=0 flags=0x80100480 delay=30201ms
Jan 31 21:05:52 kg-pbs1 zed[881620]: eid=1095 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=4096 offset=9528067436544 priority=3 err=0 flags=0x300080 delay=30135ms bookmark=54:391506:1:0
Jan 31 21:05:52 kg-pbs1 zed[881622]: eid=1096 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=196608 offset=9519235899392 priority=3 err=0 flags=0x80100480 delay=30154ms
Jan 31 21:05:52 kg-pbs1 zed[881624]: eid=1097 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=98304 offset=9528109023232 priority=3 err=0 flags=0x80100480 delay=30129ms
Jan 31 21:05:52 kg-pbs1 zed[881626]: eid=1098 class=delay pool='kg-pbs1-hdd' vdev=wwn-0x5000c500f2977fe3-part1 size=1048576 offset=11682019594240 priority=3 err=0 flags=0x80100480 delay=30204ms
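Worth noting: every zed delay event above reports roughly 30 s (30111–30204 ms), which matches both the kernel's default SCSI command timer (30 s) and ZFS's default slow-I/O reporting threshold (zio_slow_io_ms = 30000 ms), and the aborted opcode 0x8a is a WRITE(16). A rough sketch for checking, and temporarily raising, the per-device command timer to see whether the resets are plain timeouts; sdX is a placeholder and the change does not survive a reboot:
Code:
# current SCSI command timer (seconds, default 30) for every disk
grep -H . /sys/block/sd*/device/timeout

# temporarily raise the timer on one disk (sdX = placeholder, not persistent)
echo 60 > /sys/block/sdX/device/timeout

# ZFS slow-I/O reporting threshold in milliseconds (default 30000)
cat /sys/module/zfs/parameters/zio_slow_io_ms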
The controller is configured in HBA mode with caches disabled, so it shouldn't interfere with the communication (a quick way to double-check the passthrough is sketched below).
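For completeness, that the disks really are presented as plain SAS devices can be verified from the OS — a small sketch (sdX is a placeholder):
Code:
# list disks with model, serial and transport; SAS disks should show tran=sas
lsblk -o NAME,MODEL,SERIAL,TRAN

# identity info straight from the drive (sdX = placeholder)
smartctl -i /dev/sdX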
I tried replacing the smartpqi driver with the current one from the repository, but there was no change.
Code:
root@kg-pbs1:~# modinfo smartpqi | grep version
version: 2.1.38-022
description: Driver for Microchip Smart Family Controller version 2.1.38-022 (d-6f8997e/s-e7f7d7c)
srcversion: E6792B179A3DF1290C1B99B
vermagic: 6.17.4-2-pve SMP preempt mod_unload modversions
The firmware of the P816 is current: 7.81. Package versions for reference:
Code:
root@kg-pbs1:~# proxmox-backup-manager versions --verbose
proxmox-backup 4.0.0 running kernel: 6.17.4-2-pve
proxmox-backup-server 4.1.1-1 running version: 4.1.1
proxmox-kernel-helper 9.0.4
proxmox-kernel-6.17.4-2-pve-signed 6.17.4-2
proxmox-kernel-6.17 6.17.4-2
proxmox-kernel-6.17.4-1-pve-signed 6.17.4-1
proxmox-kernel-6.14.11-5-pve-signed 6.14.11-5
proxmox-kernel-6.14 6.14.11-5
proxmox-kernel-6.14.11-4-pve-signed 6.14.11-4
proxmox-kernel-6.8 6.8.12-17
proxmox-kernel-6.8.12-17-pve-signed 6.8.12-17
proxmox-kernel-6.8.4-2-pve-signed 6.8.4-2
ifupdown2 3.3.0-1+pmx11
libjs-extjs 7.0.0-5
proxmox-backup-docs 4.1.1-1
proxmox-backup-client 4.1.1-1
proxmox-mail-forward 1.0.2
proxmox-mini-journalreader 1.6
proxmox-offline-mirror-helper 0.7.3
proxmox-widget-toolkit 5.1.5
pve-xtermjs 5.5.0-3
smartmontools 7.4-pve1
zfsutils-linux 2.3.4-pve1
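To cross-check the controller state and firmware from the OS side, HPE's ssacli utility can help — a sketch, assuming ssacli is installed; slot=0 is a guess and needs to be adjusted to the real slot of the P816:
Code:
# overall controller status
ssacli ctrl all show status

# firmware version and HBA/cache mode details (slot=0 is an assumption)
ssacli ctrl slot=0 show config detail | grep -i -e firmware -e mode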
Storage config for reference:
Code:
root@kg-pbs1:~# zpool status
  pool: kg-pbs1-hdd
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Feb 1 14:36:13 2026
        3.27T / 42.7T scanned, 35.9G / 40.5T issued at 57.4M/s
        36.2G resilvered, 0.09% done, 8 days 13:21:48 to go
config:

        NAME                          STATE     READ WRITE CKSUM
        kg-pbs1-hdd                   ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            wwn-0x5000c500f2975c23    ONLINE       0     0     0
            spare-1                   ONLINE       0     0     0
              wwn-0x5000c500f2977fe3  ONLINE       0     0     0
              wwn-0x5000c500f29485fb  ONLINE       0     0     0
          mirror-1                    ONLINE       0     0     0
            wwn-0x5000c500ec5aa9ef    ONLINE       0     0     1  (resilvering)
            wwn-0x5000c500f2961acb    ONLINE       0     0     0
          mirror-2                    ONLINE       0     0     0
            wwn-0x5000039b4851bc59    ONLINE       0     0     0
            wwn-0x5000c500f2962167    ONLINE       0     0     2  (resilvering)
          mirror-3                    ONLINE       0     0     0
            wwn-0x5000c500f2975bcb    ONLINE       0     0     0
            wwn-0x5000c500f296381b    ONLINE       0     0     0
          mirror-4                    ONLINE       0     0     0
            wwn-0x5000c500f294cb23    ONLINE       0     0     0
            wwn-0x5000c500f29650c3    ONLINE       0     0     1  (resilvering)
        spares
          wwn-0x5000c500f29485fb      INUSE     currently in use

errors: No known data errors
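As a side note: once the resilver finishes and the original disk in spare-1 is healthy again, the hot spare can be returned to the spares list by detaching it — a sketch using the device names from the output above:
Code:
# keep the original disk, return the hot spare to AVAIL
zpool detach kg-pbs1-hdd wwn-0x5000c500f29485fb

# verify the spare shows as available again
zpool status kg-pbs1-hdd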
The problem is also independent of the kernel version; it came up with both 6.14 and 6.17.
Does anyone have similar experiences, or know of something I could try to troubleshoot this further?
Thanks in advance!
EDIT: Forgot to state: the pool consists of 3.5" Seagate Exos X18 SAS 12 TB drives.
Backups were running at the time of the resets, so it might have something to do with a timeout?
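To narrow that down, the next backup window could be watched live so a reset can be correlated with the running jobs — a minimal sketch:
Code:
# follow ZFS events (delay/io/statechange) as they happen
zpool events -f

# in a second shell: follow kernel messages from the controller driver
journalctl -kf | grep -i -e smartpqi -e 'device reset'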