[SOLVED] Multipath with IBM Storage bug Kernel 5.x - blk_cloned_rq_check_limits: over max size limit

albindy
New Member
Sep 20, 2019
Hi all!

I've been working on this for two months and was able to narrow it down a bit today.
The setup is:
an HP ML350 G10 and an IBM 1746 storage system connected via 8 Gbit FC.
The HP contains 2x QLogic QLE2562 HBAs.
Now to the problem:
When running Proxmox 6, either at the software level of the downloaded ISO or fully upgraded, both the active and the ghost multipath paths fail at the same time when moving an LVM volume from one LUN to the other.
When running Proxmox 5 at the software level of the downloaded ISO, multipath works without problems and with the expected performance.
When running Proxmox 5 fully upgraded to the current patch level, the same problems as on the Proxmox 6 installation show up.
I upgraded the packages from the ISO level to the current patch level manually and identified the following 3 packages as the cause:
pve-qemu-kvm, libpve-common-perl and qemu-server.
After step-by-step updates, each followed by a reboot, I narrowed it down further: I can update pve-qemu-kvm and libpve-common-perl up to the current patch level, libpve-common-perl 5.0-54 and pve-qemu-kvm 3.0.1-4.
The problem occurs as soon as I update qemu-server to 5.0-54 and disappears when I downgrade to 5.0-53.
multipath.conf was configured from minimal to advanced, followed by detailed testing. Currently I'm using a minimal multipath.conf.
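For reference, the minimal config is roughly along these lines (just a sketch, not necessarily character for character what I'm running; the device section reflects the rdac handler this array family uses):

# /etc/multipath.conf (sketch)
devices {
    device {
        vendor                 "IBM"
        product                "1746"
        hardware_handler       "1 rdac"
        path_grouping_policy   group_by_prio
        prio                   rdac
        path_checker           rdac
        features               "2 pg_init_retries 50"
        no_path_retry          queue
        failback               immediate
    }
}
# the mpath3500_* names come from per-WWID alias entries in a multipaths { } section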
In addition I tested an ML350 G8 with a second, identical storage system, with 4 Gbit HBAs and an FC switch.
The behaviour stays the same in this configuration and in cross-combinations of the hardware.

Because I want to run the server on Buster and keep doing updates and upgrades, I'm thankful for any ideas and advice.
Thanks!
Alex

 
hi, what exactly do you mean with
when moving one lvm from one LUN to the other.
a 'move disk' operation? a clone?

as for
The Problem occurs as soon as i update qemu-server to 5.0-54 and disappears when i downgrade to 5.0.53.

this does not really make sense, since the only change between those versions was a different behaviour for systemd scopes at the vm start (so nothing else changed)
 
Hi!
Thanks for the reply.
I tested both operations and both fail, but in the further testing I only used the 'move disk' operation.

It really doesn't make sense, that's why I wrote the post.

The error messages on PVE 5, which is now on the latest updates, are as follows; the paths don't fail every time.

[ 415.865636] print_req_error: critical target error, dev dm-3, sector 142608384
[ 415.865750] print_req_error: critical target error, dev dm-3, sector 142673919
[ 415.865822] print_req_error: critical target error, dev dm-3, sector 142739454
[ 415.865906] print_req_error: critical target error, dev dm-3, sector 142804989
[ 415.866014] print_req_error: critical target error, dev dm-3, sector 142870524
[ 415.866148] print_req_error: critical target error, dev dm-3, sector 142936059
[ 415.866226] print_req_error: critical target error, dev dm-3, sector 143001594
[ 415.866304] print_req_error: critical target error, dev dm-3, sector 143067129
[ 415.866420] print_req_error: critical target error, dev dm-3, sector 143132664
[ 415.866575] print_req_error: critical target error, dev dm-3, sector 143198199

But when upgrading to PVE 6, also full-upgraded to the latest packages, it fails on every try. The errors are as follows when moving from mpath3500_1.2TB to mpath3500_800GB, so in this case the destination target fails:

mpath3500_1.2TB (360080e50002d7a220000187a5d7f26b5) dm-6 IBM,1746 FAStT
size=1.2T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 5:0:0:1 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 3:0:0:1 sdd 8:48 active ghost running
mpath3500_800GB (360080e50001b9478000014225d7f1e87) dm-5 IBM,1746 FAStT
size=817G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=enabled
| `- 3:0:0:0 sdc 8:32 failed ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 5:0:0:0 sde 8:64 failed ghost running

[ 1146.219994] blk_cloned_rq_check_limits: over max size limit.
[ 1146.220111] device-mapper: multipath: Failing path 8:32.
[ 1146.220169] device-mapper: multipath: Reinstating path 8:64.
[ 1146.220511] sd 5:0:0:0: rdac: array , ctlr 1, queueing MODE_SELECT command
[ 1146.220703] sd 5:0:0:0: rdac: array , ctlr 1, MODE_SELECT returned with sense 05/24/00
[ 1146.220705] device-mapper: multipath: Failing path 8:64.

INFO: task qemu-img:4052 blocked for more than 120 seconds.
[ 1330.619290] Tainted: P O 5.0.21-2-pve #1
[ 1330.619410] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I was also thinking about the module and the firmware: booting the old kernel (4.15.18-20-pve) on a not fully upgraded PVE 6 seems to "just" throw the critical target error messages, but it doesn't cause the paths to fail and the move disk operation finishes.
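If someone wants to keep booting the 4.15 kernel by default, something along these lines should work (just a sketch; the exact menu entry title depends on the installed kernels and the GRUB config):

# /etc/default/grub
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 4.15.18-20-pve"
# then regenerate the GRUB config
update-grub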

thank you!
KR Alex
 
Hi Dominik!

I tested some things based on your suggestion that it can't be qemu-server.
And you are totally right, it seems to be the kernel. But I have to add some things.

When starting from PVE 5, configuring multipath on PVE 5 and then upgrading the fully working setup to PVE 6 with the latest updates, multipath works with the 4.15.18-20-pve kernel but fails with 5.0.21-2-pve. Just to test, I configured a single path and had similar problems, which seemed to disappear when setting one LUN preferred to the first HBA and the second LUN to the second HBA. But that's not my goal.

When starting right away from PVE 6, I don't even get IDs in /etc/multipath/wwids and multipath -ll reports nothing. After adding the files (wwids, multipath.conf) from PVE 5 and installing/booting the 4.15.18-20-pve kernel, it works too.

No success with 5.0.21-2-pve, even when copying the QLogic-related firmware files from /lib/firmware on PVE 5.

So it seems to be the kernel module for the QLogic ISP2532-based 8Gb Fibre Channel HBAs (QLE2562).
But I didn't check the changelog to see whether this module changed between 4.15 and 5.0.

Do you have further suggestions?

KR
Alex
 
Hi!

Thanks for the suggestion, it works as described: with the find_multipaths option the WWIDs are found.
So starting directly from PVE 6 is no longer a problem.
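For anyone else hitting this, roughly what I added and ran (just a sketch, the exact syntax may vary with the multipath-tools version):

# add to the defaults section of /etc/multipath.conf
defaults {
    find_multipaths yes
}
# then rebuild the wwids file from the currently present multipath devices and restart the daemon
multipath -W
systemctl restart multipathd
multipath -ll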

But no change on the kernel topic.
PVE 6 with 4.15.18-20-pve works, and both active and ghost paths behave as expected.
PVE 6 with 5.0.21-2-pve causes the target paths to fail immediately, active and ghost:

mpath3500_1.2TB (360080e50002d7a220000187a5d7f26b5) dm-6 IBM,1746 FAStT
size=1.2T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 5:0:0:1 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 3:0:0:1 sdd 8:48 active ghost running
mpath3500_800GB (360080e50001b9478000014225d7f1e87) dm-5 IBM,1746 FAStT
size=817G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=enabled
| `- 3:0:0:0 sdc 8:32 failed ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 5:0:0:0 sde 8:64 failed ghost running

Sorry, but I'm still stuck at this point, because staying on kernel 4 instead of 5 is only a temporary solution.
Maybe the
blk_cloned_rq_check_limits: over max size limit
message, which is logged while the paths are failing, leads somewhere.
I found a tech note:
Technote: IBM Spectrum Scale support: 'blk_cloned_rq_check_limits: over max size limit' errors
Problem(Abstract)
If you are using a GPFS block size larger than 512 Kbytes, you may encounter 'blk_cloned_rq_check_limits: over max size limit' errors
which lead to dm-multipath path failure. GPFS then will not be able to access the underlying block device.

Solution:
If the GPFS file system block size is 4MB, and the storage is from IBM, you need to create a new rule in /etc/udev/rules.d/54-custom.rules with the following content:
ACTION=="add|change", SUBSYSTEM=="block", ATTRS{vendor}=="IBM*", RUN+="/bin/sh -c '/bin/echo 4096 > /sys/block/%k/queue/max_sectors_kb'"

But I have no idea which values would be correct for the PVE (LVM) configuration.
I tried 512, 1024, 2048 and 4096.
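For completeness, roughly how I tested the values (just a sketch, device names as on my box):

# current limits of the single paths and the multipath maps
grep . /sys/block/sd[c-f]/queue/max_sectors_kb /sys/block/dm-*/queue/max_sectors_kb
# set a value on the fly for a quick test, e.g. 1024 KB on one path
echo 1024 > /sys/block/sdc/queue/max_sectors_kb
# or reload and re-trigger the udev rule from the technote
udevadm control --reload-rules
udevadm trigger --subsystem-match=block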

A strange thing I just noticed:
Base situation: PVE 6 booted with kernel 5.0.21-2-pve.
When I try to move a hard disk from LUN1 to LUN2, all paths of LUN2 fail.
Then I restart multipathd, the errors in the syslog stop, and the process in the PVE GUI, which was stuck at 0%, finishes to 100% without errors.
When I try to move the hard disk back from LUN2 to LUN1, all paths of LUN1 fail.
When I restart multipathd again, this process finishes without errors too.
After that I can move the hard disk of every VM from anywhere to anywhere without problems.
When I reboot the PVE host, I'm back in the base situation and have to let every LUN fail and restart multipathd to get it working again.

I dumped multipath -v3 before and after the multipathd restarts, and there is a change in the status line which I can't interpret. Maybe you have an idea:

diff /tmp/multi1 /tmp/multi2
157c157
< status = 2 0 1 0 2 1 A 0 1 2 8:80 A 0 0 1 E 0 1 2 8:48 A 0 0 1
---
> status = 2 0 1 0 2 1 A 0 1 2 8:80 A 0 18350080 1 E 0 1 2 8:48 A 0 0 1
159c159
< mpath3500_1.2TB: disassemble status [2 0 1 0 2 1 A 0 1 2 8:80 A 0 0 1 E 0 1 2 8:48 A 0 0 1 ]
---
> mpath3500_1.2TB: disassemble status [2 0 1 0 2 1 A 0 1 2 8:80 A 0 18350080 1 E 0 1 2 8:48 A 0 0 1 ]
161c161
< status = 2 0 1 0 2 1 A 0 1 2 8:32 A 0 30408704 1 E 0 1 2 8:64 A 0 0 1
---
> status = 2 0 1 0 2 1 A 0 1 2 8:32 A 0 0 1 E 0 1 2 8:64 A 0 0 1
163c163
< mpath3500_800GB: disassemble status [2 0 1 0 2 1 A 0 1 2 8:32 A 0 30408704 1 E 0 1 2 8:64 A 0 0 1 ]
---
> mpath3500_800GB: disassemble status [2 0 1 0 2 1 A 0 1 2 8:32 A 0 0 1 E 0 1 2 8:64 A 0 0 1 ]

I also dumped the sector sizes, but they stay the same:
Sys Block Node : Device max_sectors_kb max_hw_sectors_kb
/sys/block/sdc : IBM 1746 FAStT 512 32767
/sys/block/sdd : IBM 1746 FAStT 512 32767
/sys/block/sde : IBM 1746 FAStT 512 32767
/sys/block/sdf : IBM 1746 FAStT 512 32767

dmesg, messages and syslog do not give any hints about what changes before and after letting the paths fail and restarting multipathd, but there has to be some change, because it works afterwards.
/var/log/messages:
Moving started...
Sep 22 02:04:58 pve-citycom kernel: [ 443.523320] device-mapper: multipath: Reinstating path 8:48.
Sep 22 02:04:58 pve-citycom kernel: [ 443.523707] sd 3:0:0:1: rdac: array , ctlr 0, queueing MODE_SELECT command
Sep 22 02:04:58 pve-citycom kernel: [ 443.523800] device-mapper: multipath: Reinstating path 8:80.
Sep 22 02:04:58 pve-citycom kernel: [ 443.523904] sd 3:0:0:1: rdac: array , ctlr 0, MODE_SELECT returned with sense 05/24/00
Sep 22 02:04:58 pve-citycom kernel: [ 443.523905] device-mapper: multipath: Failing path 8:48.
Sep 22 02:04:58 pve-citycom kernel: [ 443.524842] device-mapper: multipath: Failing path 8:80.
Sep 22 02:05:01 pve-citycom kernel: [ 446.081697] print_req_error: 54 callbacks suppressed
Moving stopped and multipathd restarted...
Sep 22 02:05:01 pve-citycom kernel: [ 446.241803] device-mapper: ioctl: error adding target to table
Now there are no errors any more and the new move process finishes.

Thanks, by the way, for even replying on the weekend. I didn't expect an answer.

KR
Alex
 
Hi!

Yes, I found that info too, that's why I added the udev rule mentioned in the previous post. But as posted before, I can't figure out a size that works and it has no effect. The storage, by the way, is an IBM DS3500 series and doesn't use Spectrum Scale.

The strange thing is that after a path fails and multipathd is restarted, the previously failed path works. If it were related to the storage, nothing should change just because the path fails and the daemon is restarted.
The only thing I noticed is that the block device is re-instantiated as a new dm-* node.
But the parameters in /sys/block of the new device are the same as those of the dm-* device before failing and restarting.
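Roughly how I compared them (a sketch; dm-5 and dm-6 are the map names on my system and may differ elsewhere):

for f in max_sectors_kb max_hw_sectors_kb max_segments max_segment_size; do
    echo "$f: $(cat /sys/block/dm-5/queue/$f) vs $(cat /sys/block/dm-6/queue/$f)"
done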

It's only the failed path that works after the fail and restart, so I have to let every path fail before I can use it.

##############################################################################################

I contacted IBM, because the system in the technote Dominik posted doesn't match my storage and the mentioned udev rule didn't work.
The answer was ambiguous, but I gave it a shot.
After upgrading to the most recent firmware and NVRAM image, the problem seems to be gone.

So... yes, it seems to be a kernel bug, and yes, it is an IBM firmware bug.

Kernel:
https://lkml.org/lkml/2019/8/22/1260

IBM:
Problems fixed in IBM Spectrum Scale 4.2.2.3 [January 27, 2017]
Fix a multipath device failure that reads "blk_cloned_rq_check_limits: over max size limit" which can occur when kernel function bio_get_nr_vecs() returns a value which is larger than the value of max sectors of the block device.


It doesn't occur on
PVE 5 on Kernel 4.15.18-20-pve
or
PVE 6 on Kernel 4.15.18-20-pve

but occurs on
PVE 6 on Kernel 5.0.21-2-pve

Temporary solution:
As described, let every path fail on a move or clone operation (qemu-img convert) and restart multipathd after every failure as quickly as possible.
After that the paths seem stable until the PVE host is rebooted.
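A crude way to semi-automate that until a real fix is in place (untested sketch, not a recommendation):

# restart multipathd whenever a path shows up as failed
while true; do
    if multipath -ll | grep -q failed; then
        systemctl restart multipathd
    fi
    sleep 5
done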

Fixing the problem:
IBM seems to have integrated a firmware fix in some or all firmwares of Q4 2017, in my case the December 2017 version 8.20.27.00.

The test setup was:
HP DL360 G10 and HP ML350 G8
with QLogic 8 and 4 Gbit HBAs
2x IBM DS3524

Thanks for your help!
Solved for me!
I changed the subject to a better matching one so that others can find it; the original "qemu-server" subject was misleading.

KR
Alex
 
