5.3.18-3-pve multipath issue

speedlnx

Active Member
Feb 7, 2016
36
1
28
41
Hello,
Since the latest upgrade of my servers to kernel 5.3.18-3-pve, multipath has stopped working. Everything works fine on 5.3.10-1-pve, where I can manage my cluster storage with LVM on my SAN. Did something change in the latest kernel? Is there an option I have to add to make it work?

When booting 5.3.18-3-pve, multipath -ll gives blank output and I get this error during boot:

device-mapper: table: 253:9: multipath: error getting device

When booting the working kernel I get:

[ 17.617230] scsi 17:0:0:7: Attached scsi generic sg10 type 0
[ 17.617233] scsi 17:0:0:7: Embedded Enclosure Device
[ 17.617604] scsi 17:0:0:7: Power-on or device reset occurred
[ 17.617617] scsi 17:0:0:7: Failed to get diagnostic page 0x1
[ 17.617671] scsi 17:0:0:7: Failed to bind enclosure -19

but everything works anyway.

My configuration is:

2x Lenovo server with QLogic Corp. ISP2722-based 16/32Gb
1x Lenovo SAN DE2000H

pve-manager/6.1-8/806edfe1 (running kernel: 5.3.10-1-pve) Working kernel!!!

multipath.conf

defaults {
    find_multipaths no
    polling_interval 2
    path_selector "round-robin 0"
    path_grouping_policy multibus
    uid_attribute ID_SERIAL
    rr_min_io 100
    failback immediate
    no_path_retry queue
    user_friendly_names yes
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^sda[[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*"
}

multipaths {
    multipath {
        wwid 36d039ea00006a630000001855ea5d2ff
        alias mpath1
    }
    multipath {
        wwid 36d039ea00006a2480000014d5ea1c997
        alias mpath2
    }
    multipath {
        wwid 36d039ea00006a2480000018e5ea610f2
        alias mpath3
    }
}
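
A note on the blacklist in this configuration: the brackets in `^hd[a-z][[0-9]*]` and `^sda[[0-9]*]` look unbalanced. Read as a POSIX extended regular expression, `[[0-9]*` matches any run of `[` characters or digits, and the trailing `]` is a literal character. The pattern would then only match names that contain a `]`, which kernel device names do not. The sketch below checks this with `grep -E`. It assumes that multipath evaluates devnode patterns as POSIX extended regular expressions; `devnode_matches` is a hypothetical helper, not part of multipath-tools.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME ($2) matches the devnode PATTERN ($1).
# Assumption: multipath treats devnode patterns as POSIX extended regexes,
# which grep -E approximates.
devnode_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Pattern as posted: it needs a literal "]" after the name.
if devnode_matches '^sda[[0-9]*]' sda1; then
    echo "posted pattern: sda1 is blacklisted"
else
    echo "posted pattern: sda1 is NOT blacklisted"
fi

# Balanced variant: matches sda and its partitions, but not sdaa or sdb.
if devnode_matches '^sda[0-9]*$' sda1; then
    echo "balanced pattern: sda1 is blacklisted"
else
    echo "balanced pattern: sda1 is NOT blacklisted"
fi
```

If the assumption holds, the local disk was never excluded by the posted pattern, so a balanced pattern (or a WWID-based blacklist) may be worth testing.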
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
6,176
862
148
You could try installing the pve-kernel-5.4 meta package and see if it works with this kernel.
 

speedlnx

It's a production server... do you think it is safe to experiment with 5.4?
 

Stoiko Ivanov

Depends on your workload and your hardware, I guess. It was released almost 2 months ago and we haven't heard of any serious regressions compared to 5.3 (and it will become the new default kernel within the next few weeks).
Maybe you can try it during one of your next upgrade windows.
 

speedlnx

It doesn't work:

[Fri May 8 00:05:32 2020] device-mapper: multipath round-robin: version 1.2.0 loaded
[Fri May 8 00:05:32 2020] device-mapper: table: 253:5: multipath: error getting device
[Fri May 8 00:05:32 2020] device-mapper: ioctl: error adding target to table
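
For comparing kernels, it can help to pull just the failing table numbers out of the kernel log. The function below is a minimal sketch that reads log lines on stdin and prints the major:minor of every table that failed with this message. `dm_failed_tables` is a hypothetical name, and the match string is taken from the log lines in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the unique
# major:minor numbers of device-mapper tables that failed with
# "multipath: error getting device".
dm_failed_tables() {
    sed -n 's/.*device-mapper: table: \([0-9][0-9]*:[0-9][0-9]*\): multipath: error getting device.*/\1/p' |
        sort -u
}
```

It could be used as `dmesg | dm_failed_tables` on each kernel, so the two boots can be compared side by side.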
 

speedlnx

Still the same error on the latest version (5.4.78-2-pve). Any news for my FC controller?
 

kofik

Active Member
Aug 5, 2011
58
8
28
No news on your specific issue. However, from what I can tell this FC array is an OEM-ed NetApp appliance sold under the Lenovo brand, so it could help to search for similar error messages reported for NetApp appliances. Also, most FC arrays I've worked with in the past needed some tuning based on vendor recommendations. I had this experience with IBM V7000 storage systems, where some multipathing settings had to be adapted to get them working reliably.

For example, the documentation for some ONTAP-based Lenovo appliances recommends configuring blacklisting correctly, e.g.:
https://thinksystem.lenovofiles.com...stem_storage_de_himg_11.60.2/IC_pdf_file.html
Searching for the error message also turns up Red Hat KB articles with similar recommendations (https://access.redhat.com/solutions/38538).

These kinds of storage systems are usually only sold with a support contract, so it might be worth contacting the storage vendor too.
 

speedlnx

Thank you for your response. As you suspected, it's a Lenovo card connected to a Lenovo DS storage array. I don't think I can get support from the vendor, because Debian/Proxmox is not a certified OS for this kind of hardware. I will try to open a ticket anyway, and I will look at your links in the meantime...
 

kofik

I should add that the Lenovo documentation (previous post) contains an article called "Configuring DM-Multipath" that I can't link to directly. Your boot drive is quite likely internal and thus not multipathed, so try to identify the drives the host sees and then blacklist all devices not located on the Lenovo SAN. It might also be worth testing "find_multipaths yes" instead of "no". But please do read the man pages, and test this first on a node that is not running any VMs.
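
One way to express "blacklist everything that is not on the SAN" is to blacklist all WWIDs and then list the SAN LUNs as exceptions. The sketch below only prints such stanzas for review and does not touch /etc/multipath.conf. `emit_wwid_blacklist` is a hypothetical helper, the WWIDs are the three from the multipath.conf in the first post, and whether this layout suits this array is an assumption to verify against the man page.

```shell
#!/bin/sh
# Hypothetical helper: print multipath.conf stanzas that blacklist every
# device by WWID and then re-allow only the WWIDs given as arguments.
emit_wwid_blacklist() {
    printf 'blacklist {\n    wwid ".*"\n}\n\n'
    printf 'blacklist_exceptions {\n'
    for wwid in "$@"; do
        printf '    wwid "%s"\n' "$wwid"
    done
    printf '}\n'
}

# WWIDs of the three SAN LUNs from the multipaths section in the first post.
emit_wwid_blacklist \
    36d039ea00006a630000001855ea5d2ff \
    36d039ea00006a2480000014d5ea1c997 \
    36d039ea00006a2480000018e5ea610f2
```

To see which disks are local and which arrive over FC, `lsblk -d -o NAME,TRAN,WWN,VENDOR,MODEL` is a reasonable starting point.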

All in all: I unfortunately know this situation very well, where storage vendors officially support only RHEL, SUSE, VMware and Windows. SUSE might be worth a shot, since SLES 15 SP2 ships with kernel 5.3. If you can show that the same issue appears on a supported distribution, they should usually help you. But I think the multipath blacklisting is worth trying first.
 

speedlnx

I resolved it myself.

I downloaded http://filedownloads.cavium.com/Files/TempDownlods/98262/qla2xxx-src-v10.02.04.00-k.tar.gz, installed the kernel headers and build-essential, then built and installed the modules. After that I removed multipath:

Code:
apt purge multipath-tools

rm -fr /etc/multipath*

apt purge multipath-tools-boot

I blacklisted everything except my local drive (sda*) in lvm.conf and rebooted.

After the reboot I installed multipath again and reconfigured it:

Code:
apt install multipath-tools

multipath -v2 -ll

update-initramfs -u -k all

update-grub2

systemctl mask systemd-udev-settle.service

reboot
 

kofik

Could you elaborate on why this led to the solution? At first sight it is interesting that downloading the driver source from QLogic fixed your issue. If there is a source for your findings, it could be helpful for someone else in the future.

Side note: QLogic was bought by Cavium in 2016, which was itself bought by Marvell in 2018; that is why the download points to Cavium...
 

speedlnx

I didn't find much on the net, I just had to use my head ;)
The modules built from source did the trick. I think the driver source used in kernels newer than 5.3.10 differs from the previous one. I can't say whether that is true, but with the modules I compiled (from the Cavium download) my controller started working again. With the original module (the one shipped with the pve kernel) I can't see any disks from the storage array. I don't have time to investigate further by looking at the kernel source; I just have to remember to recompile the modules at every kernel upgrade if it doesn't work out of the box!
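
Since the rebuild has to be remembered after every kernel upgrade, a small check after reboot might help. The sketch below reads the version of the loaded module from sysfs and compares it with the version in the tarball name. Both function names are hypothetical, and it is an assumption that the self-built module reports the tarball's version string (10.02.04.00-k).

```shell
#!/bin/sh
# Hypothetical helper: print the version of a loaded kernel module as exposed
# in sysfs. The optional second argument overrides the sysfs root.
loaded_module_version() {
    version_file="${2:-/sys}/module/$1/version"
    [ -r "$version_file" ] || return 1
    cat "$version_file"
}

# Hypothetical check: compare the loaded qla2xxx version with the expected one
# ($1). Assumption: the vendor build reports the tarball's version string.
check_qla2xxx() {
    expected=$1
    actual=$(loaded_module_version qla2xxx "${2:-/sys}") || {
        echo "qla2xxx is not loaded or exposes no version"
        return 1
    }
    case $actual in
        "$expected"*) echo "qla2xxx $actual: expected build is loaded" ;;
        *) echo "qla2xxx $actual: expected $expected, rebuild may be needed"
           return 1 ;;
    esac
}
```

Running `check_qla2xxx 10.02.04.00-k` after each kernel upgrade would flag a boot that fell back to the module shipped with the kernel. Registering the driver source with DKMS might be another way to automate the rebuild, though that was not tried here.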
 
