Bash Deadlock after SCSI rescan with 9.1

Oct 30, 2025
Hi,

we are currently migrating to Proxmox, slowly adding more nodes (HPE DL380 Gen10) to our cluster. The existing nodes run 9.0.10, and the nodes we are adding now run 9.1.1 (we are aware that all nodes in the cluster eventually have to be updated to the same version).
Now, on those new 9.1.1 nodes, we run into trouble when adding a new LUN over Fibre Channel. Our process requires triggering a SCSI bus rescan, since the newly presented disks don't show up without one. We trigger the rescan by running:
Bash:
for host in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$host"; done
This works on the 9.0.10 nodes without any problem, but on 9.1.1 the command never finishes and locks up the shell. Only rebooting the node resolves this.
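For context, the "- - -" written to the scan attribute is three fields, "channel target lun", with "-" acting as a wildcard for each. A narrower probe can therefore target a single LUN instead of rescanning every adapter. A small sketch (the host number and LUN below are hypothetical, and the helper function is only for illustration):

```shell
#!/bin/sh
# The scan attribute takes "channel target lun"; "-" is a wildcard.
# Tiny helper to make the write explicit (sketch; names are hypothetical).
scan_scsi_host() {
    scan_file=$1   # e.g. /sys/class/scsi_host/host3/scan
    triple=$2      # e.g. "0 0 5" to probe only channel 0, target 0, LUN 5
    echo "$triple" > "$scan_file"
}
# On a real node you might run:
# scan_scsi_host /sys/class/scsi_host/host3/scan "0 0 5"
```

A targeted probe like this would at least avoid touching the adapter that hangs, at the cost of having to know where the new LUN is presented.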
Now if we run "dmesg | grep lun" we get the following information:
Bash:
[ 5677.064642] sd 2:1:0:0: lun4194304 has a LUN larger than allowed by the host adapter
Running lsscsi, we find that this ID belongs to the HPE P408i-a adapter, which holds our local RAID1; it does not belong to the HPE SN1100Q adapter we use to access the Fibre Channel LUN.
Bash:
[2:0:0:0]    enclosu HPE      Smart Adapter    6.22  -       
[2:1:0:0]    disk    HPE      LOGICAL VOLUME   6.22  /dev/sda
[2:2:0:0]    storage HPE      P408i-a SR Gen10 6.22  -
This means we can work around this issue by running a rescan which excludes this adapter (host2).
Bash:
for host in /sys/class/scsi_host/host*/scan; do
    case "$host" in */host2/*) continue ;; esac
    echo "- - -" > "$host"
done
Running the rescan without this adapter finishes without any problems.
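A less brittle variant of that workaround (a sketch, assuming the per-host proc_name sysfs attribute, which reports the driver name) would skip hosts by driver rather than by a hardcoded host number, since host numbering can change across reboots. The base directory is a parameter only so the logic can be exercised offline:

```shell
#!/bin/sh
# Rescan all SCSI hosts except those driven by smartpqi (the P408i-a),
# identified via each host's proc_name attribute instead of a fixed "host2".
rescan_non_smartpqi() {
    base=${1:-/sys/class/scsi_host}
    for host in "$base"/host*; do
        [ -r "$host/proc_name" ] || continue
        [ "$(cat "$host/proc_name")" = "smartpqi" ] && continue
        echo "- - -" > "$host/scan" || echo "rescan of $host failed" >&2
    done
}
# On a real node: rescan_non_smartpqi
```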

We have found a similar problem here: https://forum.proxmox.com/threads/i...roblem-with-qlogic-fiber-channel-cards.78797/
But the P408i-a is driven by the smartpqi module, which cannot use the ql2xmaxlun option (that is a qla2xxx module parameter); according to its manpage, smartpqi has no comparable option.
Now we are wondering how we can solve this so that we don't accidentally run into the shell lockup again.
thanks!

PS: All nodes have the same hardware and as far as we can tell the same configuration.
PPS: On the nodes running 9.0.10 this LUN also has an unusually high ID, without causing any issues:
Bash:
root@node:~# cat /sys/class/scsi_disk/2\:1\:0\:0/device/lunid
0x0000004000000000
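For what it's worth, those two odd numbers appear self-consistent: if I read the kernel's LUN conversion correctly (scsilun_to_int() treats the 8-byte LUN as four 16-bit big-endian words, with word i shifted left by 16*i bits), then the 0x0040 in the second word of 0x0000004000000000 yields exactly the lun4194304 from the dmesg line:

```shell
#!/bin/sh
# Sketch of the conversion as I understand it: word 1 of the lunid is 0x0040.
lun=$(( (0x0000 << 0) | (0x0040 << 16) | (0x0000 << 32) | (0x0000 << 48) ))
echo "$lun"   # prints 4194304, matching the dmesg message
```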
 
which kernel are you running on the 9.0 nodes, and which on the 9.1 nodes?
 
could you try (installing and) booting the 6.14 kernel on the 9.1 machine?
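In case it helps anyone following along, a sketch of how that test could look on a 9.1 node. The package and version names below are assumptions on my part; check what apt and the kernel list actually offer on your system, and note that `<version>` is a placeholder:

```shell
apt install proxmox-kernel-6.14
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin <version>   # pick the 6.14 entry from the list, then reboot
```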
 
I’ve asked one of the developers to weigh in, and our assessment aligns with @fabian’s recommendation to try a different kernel.

The symptoms point to a potential regression in the newer kernel, most likely within the vendor-provided driver for this specific FC card.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 