Proxmox VE 7.2 megaraid issues

kimtatt

New Member
Jun 7, 2022
1
0
1
Is anyone else seeing this issue? It appeared with 7.2: the UBSAN: array-index-out-of-bounds warning in drivers/scsi/megaraid/megaraid_sas_fp.c shows up a few times during boot. The following is just one example.

[ 2.708175] ================================================================================
[ 2.708184] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
[ 2.708190] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
[ 2.708194] CPU: 0 PID: 257 Comm: kworker/0:2 Not tainted 5.15.35-1-pve #1
[ 2.708197] Hardware name: Huawei Technologies Co., Ltd. RH2288H V2-12L/BC11SRSG1, BIOS RMIBV520 07/10/2019
[ 2.708198] Workqueue: events work_for_cpu_fn
[ 2.708212] Call Trace:
[ 2.708216] <TASK>
[ 2.708217] dump_stack_lvl+0x4a/0x5f
[ 2.708225] dump_stack+0x10/0x12
[ 2.708228] ubsan_epilogue+0x9/0x45
[ 2.708230] __ubsan_handle_out_of_bounds.cold+0x44/0x49
[ 2.708233] ? del_timer_sync+0x6c/0xb0
[ 2.708242] mr_update_load_balance_params+0xba/0xd0 [megaraid_sas]
[ 2.708260] MR_ValidateMapInfo+0x1f0/0xe40 [megaraid_sas]
[ 2.708269] ? __bpf_trace_tick_stop+0x10/0x10
[ 2.708273] ? wait_and_poll+0x5c/0xb0 [megaraid_sas]
[ 2.708282] ? megasas_issue_polled+0x5d/0x70 [megaraid_sas]
[ 2.708291] megasas_init_adapter_fusion+0xb0d/0xc90 [megaraid_sas]
[ 2.708300] megasas_probe_one.cold+0xbfd/0x195d [megaraid_sas]
[ 2.708311] ? finish_task_switch.isra.0+0xa6/0x2a0
[ 2.708318] local_pci_probe+0x4b/0x90
[ 2.708322] work_for_cpu_fn+0x1a/0x30
[ 2.708325] process_one_work+0x22b/0x3d0
[ 2.708329] worker_thread+0x223/0x410
[ 2.708331] ? process_one_work+0x3d0/0x3d0
[ 2.708333] kthread+0x12a/0x150
[ 2.708340] ? set_kthread_struct+0x50/0x50
[ 2.708343] ret_from_fork+0x22/0x30
[ 2.708360] </TASK>
[ 2.708361] ================================================================================
 
There appear to be multiple UBSAN issues with the new 5.15 kernel, and there have been more reports about RAID-card issues recently. The only quick workaround I know of is to use kernel 5.13 on PVE 7.2 for now (apt install pve-kernel-5.13 and select it during boot).
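After installing the 5.13 kernel and picking it in the boot menu, a quick plain-shell check (nothing PVE-specific, the 5.13.* pattern just matches the workaround series mentioned above) confirms which kernel is actually running:

```shell
# Check whether the 5.13 workaround kernel is the one currently booted.
kver=$(uname -r)
case "$kver" in
  5.13.*) echo "running the 5.13 workaround kernel ($kver)" ;;
  *)      echo "running $kver" ;;
esac
```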
 
Hello,
I added the parameters but I still get the error.
I'm running 5.15.35-2-pve.
Do you have any idea?
Thank you in advance,
Best regards,
Gio

================================================================================
Jun 10 09:52:29 xxx kernel: [ 2.258149] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
Jun 10 09:52:29 xxx kernel: [ 2.258212] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
Jun 10 09:52:29 xxx kernel: [ 2.258270] CPU: 0 PID: 203 Comm: kworker/0:3 Not tainted 5.15.35-2-pve #1
Jun 10 09:52:29 xxx kernel: [ 2.258273] Hardware name: transtec CALLEO/X10DRi, BIOS 2.0 12/28/2015
Jun 10 09:52:29 xxx kernel: [ 2.258274] Workqueue: events work_for_cpu_fn
Jun 10 09:52:29 xxx kernel: [ 2.258281] Call Trace:
Jun 10 09:52:29 xxx kernel: [ 2.258283] <TASK>
Jun 10 09:52:29 xxx kernel: [ 2.258285] dump_stack_lvl+0x4a/0x5f
Jun 10 09:52:29 xxx kernel: [ 2.258292] dump_stack+0x10/0x12
Jun 10 09:52:29 xxx kernel: [ 2.258293] ubsan_epilogue+0x9/0x45
Jun 10 09:52:29 xxx kernel: [ 2.258295] __ubsan_handle_out_of_bounds.cold+0x44/0x49
Jun 10 09:52:29 xxx kernel: [ 2.258297] ? del_timer_sync+0x6c/0xb0
Jun 10 09:52:29 xxx kernel: [ 2.258302] mr_update_load_balance_params+0xba/0xd0 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258315] MR_ValidateMapInfo+0x1f0/0xe40 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258321] ? __bpf_trace_tick_stop+0x10/0x10
Jun 10 09:52:29 xxx kernel: [ 2.258323] ? wait_and_poll+0x5c/0xb0 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258329] ? megasas_issue_polled+0x5d/0x70 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258334] megasas_init_adapter_fusion+0xb0d/0xc90 [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258339] megasas_probe_one.cold+0xbfd/0x195d [megaraid_sas]
Jun 10 09:52:29 xxx kernel: [ 2.258346] ? finish_task_switch.isra.0+0xa6/0x2a0
Jun 10 09:52:29 xxx kernel: [ 2.258351] local_pci_probe+0x4b/0x90
Jun 10 09:52:29 xxx kernel: [ 2.258354] work_for_cpu_fn+0x1a/0x30
Jun 10 09:52:29 xxx kernel: [ 2.258356] process_one_work+0x22b/0x3d0
Jun 10 09:52:29 xxx kernel: [ 2.258358] worker_thread+0x223/0x410
Jun 10 09:52:29 xxx kernel: [ 2.258359] ? process_one_work+0x3d0/0x3d0
Jun 10 09:52:29 xxx kernel: [ 2.258361] kthread+0x12a/0x150
Jun 10 09:52:29 xxx kernel: [ 2.258364] ? set_kthread_struct+0x50/0x50
Jun 10 09:52:29 xxx kernel: [ 2.258366] ret_from_fork+0x22/0x30
Jun 10 09:52:29 xxx kernel: [ 2.258370] </TASK>
Jun 10 09:52:29 xxx kernel: [ 2.258371] ================================================================================
Jun 10 09:52:29 xxx kernel: [ 2.258432] ================================================================================
Jun 10 09:52:29 xxx kernel: [ 2.258490] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
Jun 10 09:52:29 xxx kernel: [ 2.258549] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
 
Hello Stoiko,
thank you for your quick answer.

Unfortunately, I can't see my two SSDs in Proxmox. I thought it was related to these errors, but I'm not 100% sure.
Regards,
Gio
 
I thought it was related to these errors. But I'm not 100% sure.
Since this is the third similar report - just to make sure that it's really not related - we prepared a test-kernel where UBSAN is disabled:
http://download.proxmox.com/temp/kernel-5.15.35-noubsan/

it would be great if you could:
* fetch the packages
* compare the sha512sums
* install them via `apt install /path/to/packages/pve-kernel-5.15.35-3testubsan-pve_5.15.35-1~testubsan_amd64.deb` (or with `dpkg -i`)

and boot once to see if the issues go away.
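The checksum-comparison step from the list above, shown on a throwaway file so it can be followed along anywhere. For the real thing, fetch the .deb from the URL in the post with wget and run the same `sha512sum -c` against the published checksum file (a file named SHA512SUMS is assumed here) before installing with `apt install ./pve-kernel-...deb` or `dpkg -i`:

```shell
# Stand-in demonstration of verifying a download against a checksum list.
tmp=$(mktemp -d) && cd "$tmp"
printf 'pretend this is a .deb\n' > pkg.deb   # stand-in for the real package
sha512sum pkg.deb > SHA512SUMS                # what the publisher side produces
sha512sum -c SHA512SUMS                       # prints "pkg.deb: OK" on a match
```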
 
Hello Stoiko,
Many thanks :).
I tested the old kernel (5.13.19-6-pve) and the warnings in dmesg were gone, and so were my disks :).
Finally I found the problem... I'm a bit ashamed... but the RAID controller didn't automatically present the SSDs to the OS. I had to manually set them up as JBOD. Now I'm on 5.15.35-2-pve #1 and still see the warning in dmesg, but storage works.
The server has to be in production this evening or tomorrow. If you tell me that I can go live with 5.15.35-2-pve, I will.
Best regard,
Gio
 
Thanks for trying it and verifying that the UBSAN warnings are indeed only warnings, and that turning UBSAN off would not have helped in that situation!

Much appreciated

The server has to be in production this evening or tomorrow. If you tell me that I can go live with 5.15.35-2-pve, I will.
It's the current kernel, and if the logs look OK for your system, I'd say go for it.

Of course, make sure to have a backup of all important data, especially with the server going into production!
 
I have the same messages at system startup since upgrading to pve 7.2. (HW: Supermicro X9SRL-F/X9SRL-F, BIOS 3.3 11/13/2018 & AVAGO MegaRAID SAS 9361-8i FW-Version: 4.680.00-8577)

When will the kernel without UBSAN be in the repository?
 
Same thing when booting up. All disks show up in Proxmox as expected, but I'm still getting the error message.

Hardware: Dell PowerEdge R740, BIOS 2.14.2 03/21/2022
Storage Card: PERC H740p.

Bash:
uname -a
Linux p01 5.15.39-4-pve #1 SMP PVE 5.15.39-4 (Mon, 08 Aug 2022 15:11:15 +0200) x86_64 GNU/Linux   

root@p01:~# lspci | grep LSI
18:00.0 RAID bus controller: Broadcom / LSI MegaRAID Tri-Mode SAS3508 (rev 01)

[    4.964496] ================================================================================
[    4.964505] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:125:9
[    4.964511] index 2 is out of range for type 'MR_LD_SPAN_MAP [1]'
[    4.964496] CPU: 18 PID: 485 Comm: kworker/18:1H Tainted: G          I       5.15.39-4-pve #1
[    4.964501] Hardware name: Dell Inc. PowerEdge R740, BIOS 2.14.2 03/21/2022
[    4.964502] Workqueue: kblockd blk_mq_run_work_fn
[    4.964507] Call Trace:
[    4.964508]  <TASK>
[    4.964511]  dump_stack_lvl+0x4a/0x63
[    4.964515]  dump_stack+0x10/0x16
[    4.964519]  ubsan_epilogue+0x9/0x49
[    4.964522]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[    4.964525]  ? _printk+0x58/0x73
[    4.964530]  MR_GetPhyParams+0x516/0x710 [megaraid_sas]
[    4.964542]  MR_BuildRaidContext+0x3bb/0xb70 [megaraid_sas]
[    4.964554]  megasas_build_and_issue_cmd_fusion+0x1071/0x17e0 [megaraid_sas]
[    4.964565]  megasas_queue_command+0x1cb/0x210 [megaraid_sas]
[    4.964573]  scsi_queue_rq+0x3de/0xbe0
[    4.964576]  blk_mq_dispatch_rq_list+0x139/0x800
[    4.964580]  ? sbitmap_get+0xb4/0x1f0
[    4.964582]  ? __sbitmap_queue_get+0x1/0x10
[    4.964585]  blk_mq_do_dispatch_sched+0x2fe/0x340
[    4.964590]  __blk_mq_sched_dispatch_requests+0x105/0x150
[    4.964595]  blk_mq_sched_dispatch_requests+0x35/0x70
[    4.964599]  __blk_mq_run_hw_queue+0x34/0xc0
[    4.964601]  blk_mq_run_work_fn+0x1f/0x30
[    4.964604]  process_one_work+0x228/0x3d0
[    4.964607]  worker_thread+0x53/0x420
[    4.964610]  ? process_one_work+0x3d0/0x3d0
[    4.964613]  kthread+0x127/0x150
[    4.964617]  ? set_kthread_struct+0x50/0x50
[    4.964621]  ret_from_fork+0x1f/0x30
[    4.964626]  </TASK>
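For anyone comparing logs: the file and line that UBSAN complains about are easy to extract. On a live system you would pipe `dmesg` instead of the sample line below (taken from the trace above):

```shell
# Pull the megaraid_sas_fp.c file:line:column reference out of a UBSAN log line.
log='[    4.964505] UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:125:9'
printf '%s\n' "$log" | grep -o 'megaraid_sas_fp\.c:[0-9]*:[0-9]*'
```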
 
Since this is the third similar report - just to make sure that it's really not related - we prepared a test-kernel where UBSAN is disabled:
http://download.proxmox.com/temp/kernel-5.15.35-noubsan/

it would be great if you could:
* fetch the packages
* compare the sha512sums
* install them via `apt install /path/to/packages/pve-kernel-5.15.35-3testubsan-pve_5.15.35-1~testubsan_amd64.deb` (or with `dpkg -i`)

and boot once to see if the issues go away.
Can you tell us when we will have recent 5.15 kernels without UBSAN?
Thanks a lot
 
Can you tell us when we will have recent 5.15 kernels without UBSAN?
Thanks a lot
According to the bugzilla entry linked in that other thread, this (cosmetic) problem is fixed upstream, but it probably will take some time to reach the distribution-provided kernels.

(I came here because the IBM x3650M4 with IBM ServeRAID M5110e 3.46 triggers the same UBSAN message. Arrays still work, though.)
 
@mow : same for me with "LSI MegaRAID SAS 2208 [Thunderbolt] (rev 03)". I can still see the arrays and everything else seems to be in order, but I am running

"Linux proxmox 5.15.83-1-pve #1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z) x86_64 GNU/Linux"

I understand kernel 5.15 should fix this? Obviously this is not the case for me... Is there anything I am missing?
 
@lpallard : If I read the lkml mails right, those patches are integrated into the 6.1 kernel series, so unless someone backports them to 5.15, it could take quite some time for them to reach the distributions ...
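A rough way to check whether the running kernel is already in a series that should contain the fix (6.1 or later, per the above). The helper name is made up for this sketch, and it only compares the version string, so adjust it once backports land in 5.15:

```shell
# Return success if the kernel version string is 6.1 or newer.
has_megaraid_fix() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 1 ]; }
}

if has_megaraid_fix "$(uname -r)"; then
  echo "running $(uname -r): fix should be included"
else
  echo "running $(uname -r): the cosmetic UBSAN warning may still appear"
fi
```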
 
Thanks @tom for the input.

@mow : Since it seems to be only a cosmetic issue, I think I will hold off on kernel 6.1 until it's been tested and in production for a while...
 
