Proxmox VE 7.2 megaraid issues

kimtatt

New Member
Jun 7, 2022
1
0
1
Is anyone else seeing this issue? It appeared with 7.2: the UBSAN: array-index-out-of-bounds warning in drivers/scsi/megaraid/megaraid_sas_fp.c shows up a few times during boot. The following is just one example.

[ 2.708175] ================================================================================
[ 2.708184] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
[ 2.708190] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
[ 2.708194] CPU: 0 PID: 257 Comm: kworker/0:2 Not tainted 5.15.35-1-pve #1
[ 2.708197] Hardware name: Huawei Technologies Co., Ltd. RH2288H V2-12L/BC11SRSG1, BIOS RMIBV520 07/10/2019
[ 2.708198] Workqueue: events work_for_cpu_fn
[ 2.708212] Call Trace:
[ 2.708216] <TASK>
[ 2.708217] dump_stack_lvl+0x4a/0x5f
[ 2.708225] dump_stack+0x10/0x12
[ 2.708228] ubsan_epilogue+0x9/0x45
[ 2.708230] __ubsan_handle_out_of_bounds.cold+0x44/0x49
[ 2.708233] ? del_timer_sync+0x6c/0xb0
[ 2.708242] mr_update_load_balance_params+0xba/0xd0 [megaraid_sas]
[ 2.708260] MR_ValidateMapInfo+0x1f0/0xe40 [megaraid_sas]
[ 2.708269] ? __bpf_trace_tick_stop+0x10/0x10
[ 2.708273] ? wait_and_poll+0x5c/0xb0 [megaraid_sas]
[ 2.708282] ? megasas_issue_polled+0x5d/0x70 [megaraid_sas]
[ 2.708291] megasas_init_adapter_fusion+0xb0d/0xc90 [megaraid_sas]
[ 2.708300] megasas_probe_one.cold+0xbfd/0x195d [megaraid_sas]
[ 2.708311] ? finish_task_switch.isra.0+0xa6/0x2a0
[ 2.708318] local_pci_probe+0x4b/0x90
[ 2.708322] work_for_cpu_fn+0x1a/0x30
[ 2.708325] process_one_work+0x22b/0x3d0
[ 2.708329] worker_thread+0x223/0x410
[ 2.708331] ? process_one_work+0x3d0/0x3d0
[ 2.708333] kthread+0x12a/0x150
[ 2.708340] ? set_kthread_struct+0x50/0x50
[ 2.708343] ret_from_fork+0x22/0x30
[ 2.708360] </TASK>
[ 2.708361] ================================================================================
 
There appear to be multiple UBSAN issues with the new 5.15 kernel, and there have been more reports about RAID-card issues recently. The only quick workaround I know of is to use kernel 5.13 on PVE 7.2 for now (apt install pve-kernel-5.13 and select it during boot).
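After installing the 5.13 kernel and picking it in the boot menu, a quick plain-shell check (nothing PVE-specific, the 5.13.* pattern just matches the workaround series mentioned above) confirms which kernel is actually running:

```shell
# Check whether the 5.13 workaround kernel is the one currently booted.
kver=$(uname -r)
case "$kver" in
  5.13.*) echo "running the 5.13 workaround kernel ($kver)" ;;
  *)      echo "running $kver" ;;
esac
```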
 
Hello,
I added the parameters but I still get the error.
I'm running 5.15.35-2-pve.
Do you have any idea?
Thank you in advance,
Best regards,
Gio

================================================================================
Jun 10 09:52:29 xxx kernel: [ 2.258149] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
Jun 10 09:52:29 xxx kernel: [ 2.258212] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
Jun 10 09:52:29 xxx kernel: [ 2.258270] CPU: 0 PID: 203 Comm: kworker/0:3 Not tainted 5.15.35-2-pve #1
Jun 10 09:52:29 xxx kernel: [ 2.258273] Hardware name: transtec CALLEO/X10DRi, BIOS 2.0 12/28/2015
Jun 10 09:52:29 xxx kernel: [ 2.258274] Workqueue: events work_for_cpu_fn
Jun 10 09:52:29 xxx kernel: [ 2.258281] Call Trace:
Jun 10 09:52:29 xxx kernel: [ 2.258283] <TASK>
Jun 10 09:52:29 xxx kernel: [ 2.258285] dump_stack_lvl+0x4a/0x5f
Jun 10 09:52:29 xxx kernel: [ 2.258292] dump_stack+0x10/0x12
Jun 10 09:52:29 xxx kernel: [ 2.258293] ubsan_epilogue+0x9/0x45
Jun 10 09:52:29 xxx kernel: [ 2.258295] __ubsan_handle_out_of_bounds.cold+0x44/0x49
Jun 10 09:52:29 xxx kernel: [ 2.258297] ? del_timer_sync+0x6c/0xb0
Jun 10 09:52:29 xxx kernel: [ 2.258302] mr_update_load_balance_params+0xba/0xd0 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258315] MR_ValidateMapInfo+0x1f0/0xe40 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258321] ? __bpf_trace_tick_stop+0x10/0x10
Jun 10 09:52:29 xxx kernel: [ 2.258323] ? wait_and_poll+0x5c/0xb0 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258329] ? megasas_issue_polled+0x5d/0x70 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258334] megasas_init_adapter_fusion+0xb0d/0xc90 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258339] megasas_probe_one.cold+0xbfd/0x195d [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258346] ? finish_task_switch.isra.0+0xa6/0x2a0
Jun 10 09:52:29 xxx kernel: [ 2.258351] local_pci_probe+0x4b/0x90
Jun 10 09:52:29 xxx kernel: [ 2.258354] work_for_cpu_fn+0x1a/0x30
Jun 10 09:52:29 xxx kernel: [ 2.258356] process_one_work+0x22b/0x3d0
Jun 10 09:52:29 xxx kernel: [ 2.258358] worker_thread+0x223/0x410
Jun 10 09:52:29 xxx kernel: [ 2.258359] ? process_one_work+0x3d0/0x3d0
Jun 10 09:52:29 xxx kernel: [ 2.258361] kthread+0x12a/0x150
Jun 10 09:52:29 xxx kernel: [ 2.258364] ? set_kthread_struct+0x50/0x50
Jun 10 09:52:29 xxx kernel: [ 2.258366] ret_from_fork+0x22/0x30
Jun 10 09:52:29 xxx kernel: [ 2.258370] </TASK>
Jun 10 09:52:29 xxx kernel: [ 2.258371] ================================================================================
Jun 10 09:52:29 xxx kernel: [ 2.258432] ================================================================================
Jun 10 09:52:29 xxx kernel: [ 2.258490] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
Jun 10 09:52:29 xxx kernel: [ 2.258549] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
 
Hello Stoiko,
thank you for your quick answer.

Unfortunately, I can't see my two SSDs in Proxmox. I thought it was related to these errors, but I'm not 100% sure.
Regards,
Gio
 
I thought it was related to these errors. But I'm not 100% sure.
Since this is the third similar report - just to make sure that it's really not related - we prepared a test-kernel where UBSAN is disabled:
http://download.proxmox.com/temp/kernel-5.15.35-noubsan/

it would be great if you could:
* fetch the packages
* compare the sha512sums
* install them via `apt install /path/to/packages/pve-kernel-5.15.35-3testubsan-pve_5.15.35-1~testubsan_amd64.deb` (or with `dpkg -i`)

and boot once to see if the issues go away.
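The checksum-comparison step from the list above, shown on a throwaway file so it can be followed along anywhere. For the real thing, fetch the .deb from the URL in the post with wget and run the same `sha512sum -c` against the published checksum file (a file named SHA512SUMS is assumed here) before installing with `apt install ./pve-kernel-...deb` or `dpkg -i`:

```shell
# Stand-in demonstration of verifying a download against a checksum list.
tmp=$(mktemp -d) && cd "$tmp"
printf 'pretend this is a .deb\n' > pkg.deb   # stand-in for the real package
sha512sum pkg.deb > SHA512SUMS                # what the publisher side produces
sha512sum -c SHA512SUMS                       # prints "pkg.deb: OK" on a match
```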
 
Hello Stoiko,
Many thanks :).
I tested the old kernel (5.13.19-6-pve) and the warnings in dmesg were gone, and so were my disks :).
Finally I found the problem... I'm a bit ashamed... but the RAID controller didn't automatically present the SSDs to the OS. I had to manually set them up as JBOD. Now I'm on 5.15.35-2-pve #1 and still see the warning in dmesg, but storage works.
The server has to be in production this evening or tomorrow. If you tell me that I can go live with 5.15.35-2-pve, I will.
Best regard,
Gio
 
Thanks for trying it and verifying that the UBSAN warnings are indeed only warnings, and that turning UBSAN off would not have helped in that situation!

Much appreciated

The server has to be in production this evening or tomorrow. If you tell me that I can go live with 5.15.35-2-pve, I will.
It's the current kernel, and if the logs look OK for your system, I'd say go for it.

Of course, make sure to have a backup of all important data, especially with the server going into production!
 
I have the same messages at system startup since upgrading to pve 7.2. (HW: Supermicro X9SRL-F/X9SRL-F, BIOS 3.3 11/13/2018 & AVAGO MegaRAID SAS 9361-8i FW-Version: 4.680.00-8577)

When will the kernel without UBSAN be in the repository?
 
Same thing when booting up. All disks show up in Proxmox as expected, but I'm still getting the error message.

Hardware: Dell PowerEdge R740, BIOS 2.14.2 03/21/2022
Storage Card: PERC H740p.

Bash:
uname -a
Linux p01 5.15.39-4-pve #1 SMP PVE 5.15.39-4 (Mon, 08 Aug 2022 15:11:15 +0200) x86_64 GNU/Linux   

root@p01:~# lspci | grep LSI
18:00.0 RAID bus controller: Broadcom / LSI MegaRAID Tri-Mode SAS3508 (rev 01)

[    4.964496] ================================================================================
[    4.964505] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:125:9
[    4.964511] index 2 is out of range for type 'MR_LD_SPAN_MAP [1]'
[    4.964496] CPU: 18 PID: 485 Comm: kworker/18:1H Tainted: G          I       5.15.39-4-pve #1
[    4.964501] Hardware name: Dell Inc. PowerEdge R740, BIOS 2.14.2 03/21/2022
[    4.964502] Workqueue: kblockd blk_mq_run_work_fn
[    4.964507] Call Trace:
[    4.964508]  <TASK>
[    4.964511]  dump_stack_lvl+0x4a/0x63
[    4.964515]  dump_stack+0x10/0x16
[    4.964519]  ubsan_epilogue+0x9/0x49
[    4.964522]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[    4.964525]  ? _printk+0x58/0x73
[    4.964530]  MR_GetPhyParams+0x516/0x710 [megaraid_sas]
[    4.964542]  MR_BuildRaidContext+0x3bb/0xb70 [megaraid_sas]
[    4.964554]  megasas_build_and_issue_cmd_fusion+0x1071/0x17e0 [megaraid_sas]
[    4.964565]  megasas_queue_command+0x1cb/0x210 [megaraid_sas]
[    4.964573]  scsi_queue_rq+0x3de/0xbe0
[    4.964576]  blk_mq_dispatch_rq_list+0x139/0x800
[    4.964580]  ? sbitmap_get+0xb4/0x1f0
[    4.964582]  ? __sbitmap_queue_get+0x1/0x10
[    4.964585]  blk_mq_do_dispatch_sched+0x2fe/0x340
[    4.964590]  __blk_mq_sched_dispatch_requests+0x105/0x150
[    4.964595]  blk_mq_sched_dispatch_requests+0x35/0x70
[    4.964599]  __blk_mq_run_hw_queue+0x34/0xc0
[    4.964601]  blk_mq_run_work_fn+0x1f/0x30
[    4.964604]  process_one_work+0x228/0x3d0
[    4.964607]  worker_thread+0x53/0x420
[    4.964610]  ? process_one_work+0x3d0/0x3d0
[    4.964613]  kthread+0x127/0x150
[    4.964617]  ? set_kthread_struct+0x50/0x50
[    4.964621]  ret_from_fork+0x1f/0x30
[    4.964626]  </TASK>
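For anyone comparing logs: the file and line that UBSAN complains about are easy to extract. On a live system you would pipe `dmesg` instead of the sample line below (taken from the trace above):

```shell
# Pull the megaraid_sas_fp.c file:line:column reference out of a UBSAN log line.
log='[    4.964505] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:125:9'
printf '%s\n' "$log" | grep -o 'megaraid_sas_fp\.c:[0-9]*:[0-9]*'
```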
 
Since this is the third similar report - just to make sure that it's really not related - we prepared a test-kernel where UBSAN is disabled:
http://download.proxmox.com/temp/kernel-5.15.35-noubsan/

it would be great if you could:
* fetch the packages
* compare the sha512sums
* install them via `apt install /path/to/packages/pve-kernel-5.15.35-3testubsan-pve_5.15.35-1~testubsan_amd64.deb` (or with `dpkg -i`)

and boot once to see if the issues go away.
Can you tell us when we will have recent 5.15 kernels without UBSAN?
Thanks a lot
 
Can you tell us when we will have recent 5.15 kernels without UBSAN?
Thanks a lot
According to the bugzilla entry linked in that other thread, this (cosmetic) problem is fixed upstream, but it probably will take some time to reach the distribution-provided kernels.

(I came here because the IBM x3650M4 with IBM ServeRAID M5110e 3.46 triggers the same UBSAN message. Arrays still work, though.)
 
@mow : same for me with "LSI MegaRAID SAS 2208 [Thunderbolt] (rev 03)". I can still see the arrays and everything else seems to be in order, but I am running

"Linux proxmox 5.15.83-1-pve #1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z) x86_64 GNU/Linux"

I understand kernel 5.15 should fix this? Obviously this is not the case for me... Is there anything I am missing?
 
@lpallard : If I read the lkml mails right, those patches are integrated into the 6.1 kernel series, so unless someone backports them to 5.15, it could take quite some time for them to reach the distributions ...
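A rough way to check whether the running kernel is already in a series that should contain the fix (6.1 or later, per the above). The helper name is made up for this sketch, and it only compares the version string, so adjust it once backports land in 5.15:

```shell
# Return success if the kernel version string is 6.1 or newer.
has_megaraid_fix() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 1 ]; }
}

if has_megaraid_fix "$(uname -r)"; then
  echo "running $(uname -r): fix should be included"
else
  echo "running $(uname -r): the cosmetic UBSAN warning may still appear"
fi
```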
 
Thanks @tom for the input.

@mow : Since it seems to be only a cosmetic issue, I think I will hold off on kernel 6.1 until it's been tested and in production for a while...
 
