System lost disks storage

Jan 11, 2018
Updated Proxmox Virtual Environment 5.4-13.
From:
Linux 4.15.18-21-pve #1 SMP PVE 4.15.18-48 (Fri, 20 Sep 2019 11:28:30 +0200)

[ 2.824301] scsi host0: qla2xxx
[ 3.496274] scsi host2: qla2xxx
[ 3.840218] scsi: waiting for bus probes to complete ...
[ 4.624673] scsi 0:0:0:0: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.625230] sd 0:0:0:0: Attached scsi generic sg2 type 0
[ 4.625750] scsi 0:0:0:1: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.626250] sd 0:0:0:1: Attached scsi generic sg3 type 0
[ 4.626558] scsi 0:0:0:2: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.627108] sd 0:0:0:2: Attached scsi generic sg4 type 0
[ 4.627368] scsi 0:0:0:3: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.627952] sd 0:0:0:3: Attached scsi generic sg5 type 0
[ 4.628336] scsi 0:0:0:4: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.629215] sd 0:0:0:4: Attached scsi generic sg6 type 0
[ 4.629473] scsi 0:0:0:5: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.630029] sd 0:0:0:5: Attached scsi generic sg7 type 0
[ 4.630385] scsi 0:0:0:6: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.630948] sd 0:0:0:6: Attached scsi generic sg8 type 0
[ 4.631310] scsi 0:0:0:7: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.631846] sd 0:0:0:7: Attached scsi generic sg9 type 0
[ 4.632139] scsi 0:0:0:8: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.632823] sd 0:0:0:8: Attached scsi generic sg10 type 0
[ 4.633171] scsi 0:0:0:9: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.633720] sd 0:0:0:9: Attached scsi generic sg11 type 0
[ 4.634337] scsi 0:0:1:0: Direct-Access HPE MSA 2050 SAN V270 PQ: 0 ANSI: 6
[ 4.634848] sd 0:0:1:0: Attached scsi generic sg12 type 0
[ 4.718498] iscsi: registered transport (tcp)
[ 4.735724] iscsi: registered transport (iser)


To:
Linux 4.15.18-26-pve #1 SMP PVE 4.15.18-54 (Sat, 15 Feb 2020 15:34:24 +0100)

[ 2.848291] scsi host0: qla2xxx
[ 3.520276] scsi host2: qla2xxx
[ 4.719000] iscsi: registered transport (tcp)
[ 4.738193] iscsi: registered transport (iser)



Result: the system lost its storage disks. The new kernel no longer detects any of the MSA LUNs.


Booting the server back into Linux 4.15.18-21-pve fixed the problem.
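
A quick way to compare two boots is to count the SCSI "Direct-Access" lines in the kernel log. This is only a minimal sketch: the function name count_scsi_disks is made up for illustration, and it assumes the log lines look like the dmesg excerpts above.

```shell
# Hypothetical helper: count "Direct-Access" SCSI devices in a kernel log
# read from stdin. Prints 0 when no device was detected.
count_scsi_disks() {
    # grep -c exits non-zero on zero matches, so ignore its exit status.
    grep -c 'scsi [0-9:]*: Direct-Access' || true
}

# Usage on a live system (run once under each kernel and compare):
#   dmesg | count_scsi_disks
```

With the working kernel the count should match the number of LUN paths the host sees; with the broken kernel it would be 0.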
 
Sometimes a kernel just breaks things.
I have also had issues with my old Opteron rig not booting correctly after an update.
Once the next kernel version was published, the issue went away. So I would wait for the next patch and see whether that solves things.
 
Same thing here with kernels 4.15.18-24-pve through 4.15.18-26-pve.
4.15.18-23-pve is the last version that works fine.
 
Is it possible that the system is trying to access the wrong controller of the MSA?
These storage arrays are semi-active-active or even active-passive, AFAIK.
Check the storage logs for messages about "path thrashing" or similar.
 
There are no warning or error messages on any of the storage arrays.

Paths with kernel 4.15.18-23-pve (newer kernels don't detect any FC disks at all):

# multipath -ll
mpathd (3600c0ff0001e3913d50a6b5a01000000) dm-3 HP,MSA 2040 SAN
size=8.2T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=30 status=active
|- 0:0:0:17 sdc 8:32 active ready running
`- 2:0:0:17 sdf 8:80 active ready running
mpathc (3600c0ff0001e3bb73fd1335a01000000) dm-2 HP,MSA 2040 SAN
size=466G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=30 status=active
|- 0:0:0:16 sdb 8:16 active ready running
`- 2:0:0:16 sde 8:64 active ready running
mpathb (3600c0ff000149892f1600e5601000000) dm-4 HP,P2000G3 FC/iSCSI
size=1.8T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=30 status=active
|- 0:0:1:18 sdd 8:48 active ready running
`- 2:0:1:18 sdg 8:96 active ready running
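
To spot a map that has lost one of its paths, the output can be summarised per multipath device. This is a rough sketch rather than a supported tool: count_paths is a made-up name, and it assumes the `multipath -ll` layout shown above (a header line containing "dm-N", followed by path lines ending in "active ready running").

```shell
# Hypothetical helper: read `multipath -ll` output on stdin and print
# "<map name> <number of healthy paths>" for each multipath device.
count_paths() {
    awk '
        # Header line of a map, e.g. "mpathd (3600c0...) dm-3 HP,MSA 2040 SAN"
        / dm-[0-9]+ / { map = $1; order[++n] = map; paths[map] = 0; next }
        # Path line in a healthy state
        /active ready running/ && map != "" { paths[map]++ }
        END { for (i = 1; i <= n; i++) print order[i], paths[order[i]] }
    '
}

# Usage on a live system:
#   multipath -ll | count_paths
```

For the output above, each of the three maps should report 2 paths (one per HBA).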
 
Unfortunately, this is a known bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861147

PVE 5.x will be EOL in July, so you might want to consider upgrading to PVE 6 anyway.

In the meantime, if you want to pin the GRUB boot option to the -23 kernel, you can do the following:

To get an overview of the boot options, run the following one-liner, which I admittedly found on Stack Overflow ;)
Code:
awk -F\' '$1=="menuentry " || $1=="submenu " {print i++ " : " $2}; /\tmenuentry / {print "\t" i-1">"j++ " : " $2};' /boot/grub/grub.cfg

In the file /etc/default/grub, look for the GRUB_DEFAULT option. Set it to the ID that corresponds to the -23 kernel in the previous output.

For example, if the -23 kernel is listed as "1>1" (entry 1 inside submenu 1, both counted from 0):
Code:
GRUB_DEFAULT="1>1"

Then run
Code:
update-grub
and the working -23 kernel should be booted on the next restart.
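
If you would rather script the edit than open the file by hand, something along these lines should work. It is a sketch under assumptions: set_grub_default is a hypothetical helper, not part of GRUB or Proxmox, and the entry ID you pass in must come from the menu listing on your own system.

```shell
# Hypothetical helper: set GRUB_DEFAULT in FILE to ENTRY, keeping a backup
# copy as FILE.bak. Assumes ENTRY contains no "|", "&" or backslash, since
# those are special in the sed expression below.
set_grub_default() {
    sgd_file=$1
    sgd_entry=$2
    cp "$sgd_file" "$sgd_file.bak" || return 1
    if grep -q '^GRUB_DEFAULT=' "$sgd_file"; then
        # Rewrite the existing line via a temp file (portable, no sed -i).
        sgd_tmp=$(mktemp) || return 1
        sed "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$sgd_entry\"|" "$sgd_file" > "$sgd_tmp" &&
            cat "$sgd_tmp" > "$sgd_file"
        sgd_rc=$?
        rm -f "$sgd_tmp"
        return "$sgd_rc"
    else
        # No GRUB_DEFAULT line yet: append one.
        printf 'GRUB_DEFAULT="%s"\n' "$sgd_entry" >> "$sgd_file"
    fi
}

# Usage (as root), followed by the update-grub step from above:
#   set_grub_default /etc/default/grub '1>1' && update-grub
```

The backup makes it easy to revert once a fixed kernel is released and you want GRUB to boot the newest entry again.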
 
