System lost disk storage

Jan 11, 2018
Updated Proxmox Virtual Environment 5.4-13
From:
Linux 4.15.18-21-pve #1 SMP PVE 4.15.18-48 (Fri, 20 Sep 2019 11:28:30 +0200)

[ 2.824301] scsi host0: qla2xxx
[ 3.496274] scsi host2: qla2xxx
[ 3.840218] scsi: waiting for bus probes to complete ...
[ 4.624673] scsi 0:0:0:0: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.625230] sd 0:0:0:0: Attached scsi generic sg2 type 0
[ 4.625750] scsi 0:0:0:1: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.626250] sd 0:0:0:1: Attached scsi generic sg3 type 0
[ 4.626558] scsi 0:0:0:2: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.627108] sd 0:0:0:2: Attached scsi generic sg4 type 0
[ 4.627368] scsi 0:0:0:3: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.627952] sd 0:0:0:3: Attached scsi generic sg5 type 0
[ 4.628336] scsi 0:0:0:4: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.629215] sd 0:0:0:4: Attached scsi generic sg6 type 0
[ 4.629473] scsi 0:0:0:5: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.630029] sd 0:0:0:5: Attached scsi generic sg7 type 0
[ 4.630385] scsi 0:0:0:6: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.630948] sd 0:0:0:6: Attached scsi generic sg8 type 0
[ 4.631310] scsi 0:0:0:7: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.631846] sd 0:0:0:7: Attached scsi generic sg9 type 0
[ 4.632139] scsi 0:0:0:8: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.632823] sd 0:0:0:8: Attached scsi generic sg10 type 0
[ 4.633171] scsi 0:0:0:9: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.633720] sd 0:0:0:9: Attached scsi generic sg11 type 0
[ 4.634337] scsi 0:0:1:0: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.634848] sd 0:0:1:0: Attached scsi generic sg12 type 0
[ 4.718498] iscsi: registered transport (tcp)
[ 4.735724] iscsi: registered transport (iser)


To:
Linux 4.15.18-26-pve #1 SMP PVE 4.15.18-54 (Sat, 15 Feb 2020 15:34:24 +0100)

[ 2.848291] scsi host0: qla2xxx
[ 3.520276] scsi host2: qla2xxx
[ 4.719000] iscsi: registered transport (tcp)
[ 4.738193] iscsi: registered transport (iser)



Result:
The system lost its disk storage (no FC disks were detected).


Booting the server back into Linux 4.15.18-21-pve fixed the problem.
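To compare which LUNs each kernel actually detected, the SCSI addresses of the Direct-Access devices can be pulled out of the kernel log. This is a minimal sketch, assuming the log line format shown above; `lun_addresses` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper (not an existing tool): read kernel log text on stdin and
# print the host:channel:target:lun address of every "Direct-Access" SCSI device,
# e.g. "[    4.624673] scsi 0:0:0:0: Direct-Access HPE MSA 2050 SAN ..." -> 0:0:0:0
lun_addresses() {
    # -n suppresses default output; the trailing p prints only lines where the
    # substitution matched, i.e. only the Direct-Access announcements.
    sed -n 's/.*scsi \([0-9]*:[0-9]*:[0-9]*:[0-9]*\): Direct-Access.*/\1/p'
}
```

Piping `dmesg` (current boot) or `journalctl -k -b -1` (previous boot, only available with a persistent journal) into it and diffing the two lists shows which LUNs went missing; for the -26 log above the list would be empty.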
 
Sometimes a kernel just breaks things.
I have also had issues with my old Opteron rig not booting correctly after an update.
Once the next kernel version was published, the issue went away, so I would wait for the next patch and see if that solves things.
 
Same here, starting from kernel 4.15.18-24-pve up to 4.15.18-26-pve.
4.15.18-23-pve is the last version that works fine.
 
Is it possible that the system tries to access the wrong controller of the MSA?
AFAIK these storage arrays are semi active-active or even active-passive.
Check the storage for messages about "path thrashing" or similar.
 
There are no warning or error messages on any of the storage arrays.

These are the paths with kernel 4.15.18-23-pve; newer kernels don't detect any FC disks at all:

Code:
# multipath -ll
mpathd (3600c0ff0001e3913d50a6b5a01000000) dm-3 HP,MSA 2040 SAN
size=8.2T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=30 status=active
  |- 0:0:0:17 sdc 8:32 active ready running
  `- 2:0:0:17 sdf 8:80 active ready running
mpathc (3600c0ff0001e3bb73fd1335a01000000) dm-2 HP,MSA 2040 SAN
size=466G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=30 status=active
  |- 0:0:0:16 sdb 8:16 active ready running
  `- 2:0:0:16 sde 8:64 active ready running
mpathb (3600c0ff000149892f1600e5601000000) dm-4 HP,P2000G3 FC/iSCSI
size=1.8T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=30 status=active
  |- 0:0:1:18 sdd 8:48 active ready running
  `- 2:0:1:18 sdg 8:96 active ready running
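To check quickly whether a given kernel sees all paths, output like the above can be reduced to a path count per map. This is a rough sketch, assuming the `multipath -ll` layout shown here (a header line carrying the `dm-N` device, followed by path lines ending in `active ready running`); `path_counts` is a hypothetical helper name:

```shell
# Hypothetical helper: read `multipath -ll` output on stdin and print
# "<map> <n>", where n is the number of paths in state "active ready running".
path_counts() {
    awk '
        # A map header carries the device-mapper node (dm-N) as its 2nd or 3rd
        # field; path lines and the size/policy lines do not.
        $2 ~ /^dm-[0-9]+$/ || $3 ~ /^dm-[0-9]+$/ {
            if (map != "") print map, n   # flush the previous map
            map = $1
            n = 0
            next
        }
        /active ready running/ { n++ }
        END { if (map != "") print map, n }
    '
}
```

On the -23 kernel, `multipath -ll | path_counts` should report 2 paths each for mpathd, mpathc and mpathb; a missing map or a lower count points to the lost FC disks.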
 
Unfortunately, this is a known bug. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861147

PVE 5.x will be EOL in July, so you might want to consider upgrading to PVE 6 anyway.

In the meantime, if you want to pin the GRUB boot option to the -23 kernel, you can do the following:

To get an overview of the boot options, run the following one-liner, which I admittedly found on Stack Overflow ;)
Code:
awk -F\' '$1=="menuentry " || $1=="submenu " {print i++ " : " $2}; /\tmenuentry / {print "\t" i-1">"j++ " : " $2};' /boot/grub/grub.cfg

In the file /etc/default/grub, look for the GRUB_DEFAULT option and set it to the ID that corresponds to the -23 kernel in the previous output.

For example
Code:
GRUB_DEFAULT="1>1"

Then run
Code:
update-grub
and the working -23 kernel should be booted on the next restart.
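The lookup and the edit can also be scripted. The sketch below is a variant of the awk one-liner that prints the ID of the first non-recovery entry whose title contains a given kernel version, plus a sed-based setter for /etc/default/grub. It assumes the grub.cfg layout the one-liner relies on (top-level `menuentry`/`submenu` lines, indented entries inside the submenu); `grub_id_for` and `set_grub_default` are hypothetical helper names:

```shell
# Hypothetical helper: read grub.cfg on stdin and print the GRUB_DEFAULT ID
# ("N" or "N>M") of the first non-recovery entry whose title contains $1.
# Returns non-zero if no such entry exists.
grub_id_for() {
    awk -F"'" -v want="$1" '
        # Top-level entries and submenus share one counter, as in the one-liner.
        $1 == "menuentry " || $1 == "submenu " {
            top = i++
            child = 0   # restart the child counter for every submenu
            if ($1 == "menuentry " && index($2, want) && $2 !~ /recovery/) {
                print top; found = 1; exit
            }
            next
        }
        # Indented entries inside a submenu get a "parent>child" ID.
        /^[ \t]+menuentry / {
            id = top ">" child++
            if (index($2, want) && $2 !~ /recovery/) { print id; found = 1; exit }
        }
        END { exit !found }
    '
}

# Hypothetical helper: rewrite the GRUB_DEFAULT line in file $1 to ID $2
# (uses GNU sed -i, as found on Debian-based PVE).
set_grub_default() {
    sed -i "s/^GRUB_DEFAULT=.*/GRUB_DEFAULT=\"$2\"/" "$1"
}
```

Usage, under the same assumptions: `id=$(grub_id_for 4.15.18-23-pve < /boot/grub/grub.cfg) && set_grub_default /etc/default/grub "$id" && update-grub`. Unlike the one-liner, this restarts the child counter for each submenu. Since the numeric IDs shift whenever a kernel is installed or removed, it is worth re-checking the ID after kernel updates.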