Kernel 6.17 bug with megaraid-sas (HPE MR416)

nikybiasion

Hi, after upgrading one of my nodes to kernel 6.17.2, I get an issue when running proxmox-boot-tool refresh.
This is the log:
Code:
[  252.712321] sd 0:2:1:0: [sda] tag#561 page boundary ptr_sgl: 0x00000000866cde4d
[  252.712561] BUG: unable to handle page fault for address: ffffd33f82703000
[  252.712729] #PF: supervisor write access in kernel mode
[  252.712889] #PF: error_code(0x0002) - not-present page
[  252.713047] PGD 100000067 P4D 100000067 PUD 10030c067 PMD 109271067 PTE 0
[  252.713213] Oops: Oops: 0002 [#1] SMP NOPTI
[  252.713375] CPU: 16 UID: 0 PID: 402 Comm: kworker/16:1H Tainted: P           O        6.17.4-1-pve #1 PREEMPT(voluntary)
[  252.713546] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[  252.713712] Hardware name: HPE ProLiant DL325 Gen10 Plus v2/ProLiant DL325 Gen10 Plus v2, BIOS A43 08/07/2024
[  252.713886] Workqueue: kblockd blk_mq_run_work_fn
[  252.714062] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.714246] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.714612] RSP: 0018:ffffd33f81687b50 EFLAGS: 00010206
[  252.714799] RAX: 00000000fedcf000 RBX: ffff8a8d6bcebdc0 RCX: ffffd33f82703000
[  252.714987] RDX: ffffd33f82703008 RSI: ffff8a8d6bcebc88 RDI: 0000000000000000
[  252.715177] RBP: ffffd33f81687c20 R08: 0000000000000200 R09: 0000000000001000
[  252.715366] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.715555] R13: 0000000000002000 R14: 00000000a8e00000 R15: ffffd33f82703008
[  252.715759] FS:  0000000000000000(0000) GS:ffff8aacb8386000(0000) knlGS:0000000000000000
[  252.715965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.716157] CR2: ffffd33f82703000 CR3: 0000001588c3a002 CR4: 0000000000f70ef0
[  252.716352] PKRU: 55555554
[  252.716544] Call Trace:
[  252.716746]  <TASK>
[  252.716962]  megasas_queue_command+0x125/0x1d0 [megaraid_sas]
[  252.717160]  scsi_queue_rq+0x40c/0xcc0
[  252.717354]  blk_mq_dispatch_rq_list+0x124/0x740
[  252.717550]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.717745]  ? sbitmap_get+0x73/0x180
[  252.717938]  ? sbitmap_get+0x73/0x180
[  252.718131]  __blk_mq_sched_dispatch_requests+0x408/0x600
[  252.718327]  blk_mq_sched_dispatch_requests+0x2d/0x80
[  252.718519]  blk_mq_run_work_fn+0x72/0x90
[  252.718710]  process_one_work+0x18b/0x370
[  252.718902]  worker_thread+0x33a/0x480
[  252.719091]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.719282]  ? __pfx_worker_thread+0x10/0x10
[  252.719472]  kthread+0x10b/0x220
[  252.719661]  ? __pfx_kthread+0x10/0x10
[  252.719850]  ret_from_fork+0x208/0x240
[  252.720038]  ? __pfx_kthread+0x10/0x10
[  252.720224]  ret_from_fork_asm+0x1a/0x30
[  252.720414]  </TASK>
[  252.720596] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp bonding tls softdog sunrpc binfmt_misc nfnetlink_log ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl hpilo pcspkr mgag200 ses ccp ee1004 enclosure scsi_transport_sas k10temp input_leds acpi_tad ipmi_si acpi_power_meter acpi_ipmi joydev ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 usbkbd usbmouse hid_generic usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq igb i2c_algo_bit dca ehci_pci xhci_pci ehci_hcd xhci_hcd bnxt_en i2c_piix4 ptdma megaraid_sas i2c_smbus wmi
[  252.722303] CR2: ffffd33f82703000
[  252.722511] ---[ end trace 0000000000000000 ]---
[  252.776411] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.776727] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.777184] RSP: 0018:ffffd33f81687b50 EFLAGS: 00010206
[  252.777263] sd 0:2:1:0: [sda] tag#563 page boundary ptr_sgl: 0x00000000a0732742
[  252.777417] RAX: 00000000fedcf000 RBX: ffff8a8d6bcebdc0 RCX: ffffd33f82703000
[  252.777803] BUG: unable to handle page fault for address: ffffd33f82707000
[  252.777863] RDX: ffffd33f82703008 RSI: ffff8a8d6bcebc88 RDI: 0000000000000000
[  252.778065] #PF: supervisor write access in kernel mode
[  252.778271] RBP: ffffd33f81687c20 R08: 0000000000000200 R09: 0000000000001000
[  252.778472] #PF: error_code(0x0002) - not-present page
[  252.778679] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.778881] PGD 100000067
[  252.779088] R13: 0000000000002000 R14: 00000000a8e00000 R15: ffffd33f82703008
[  252.779088] P4D 100000067
[  252.779090] FS:  0000000000000000(0000) GS:ffff8aacb8386000(0000) knlGS:0000000000000000
[  252.779290] PUD 10030c067
[  252.779497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.779697] PMD 109271067
[  252.779904] CR2: ffffd33f82703000 CR3: 0000001588c3a002 CR4: 0000000000f70ef0
[  252.780106] PTE 0
[  252.780317] PKRU: 55555554

[  252.780734] note: kworker/16:1H[402] exited with irqs disabled
[  252.780942] Oops: Oops: 0002 [#2] SMP NOPTI
[  252.781777] CPU: 3 UID: 0 PID: 221 Comm: kworker/u129:2 Tainted: P      D    O        6.17.4-1-pve #1 PREEMPT(voluntary)
[  252.782001] Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [O]=OOT_MODULE
[  252.782221] Hardware name: HPE ProLiant DL325 Gen10 Plus v2/ProLiant DL325 Gen10 Plus v2, BIOS A43 08/07/2024
[  252.782448] Workqueue: writeback wb_workfn (flush-8:0)
[  252.782678] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.782915] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.783390] RSP: 0018:ffffd33f80abb300 EFLAGS: 00010206
[  252.783632] RAX: 00000000fedcd000 RBX: ffff8a8d6bcc7a40 RCX: ffffd33f82707000
[  252.783877] RDX: ffffd33f82707008 RSI: ffff8a8d6bcc7908 RDI: 0000000000000000
[  252.784124] RBP: ffffd33f80abb3d0 R08: 0000000000000200 R09: 0000000000001000
[  252.784373] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.784622] R13: 0000000000002000 R14: 00000000a8600000 R15: ffffd33f82707008
[  252.784866] FS:  0000000000000000(0000) GS:ffff8aacb7d06000(0000) knlGS:0000000000000000
[  252.785111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.785351] CR2: ffffd33f82707000 CR3: 00000001c5fff003 CR4: 0000000000f70ef0
[  252.785591] PKRU: 55555554
[  252.785825] Call Trace:
[  252.786054]  <TASK>
[  252.786281]  megasas_queue_command+0x125/0x1d0 [megaraid_sas]
[  252.786508]  scsi_queue_rq+0x40c/0xcc0
[  252.786742]  blk_mq_dispatch_rq_list+0x124/0x740
[  252.787005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.787226]  ? sbitmap_get+0x73/0x180
[  252.787441]  ? sbitmap_get+0x73/0x180
[  252.787649]  __blk_mq_sched_dispatch_requests+0x408/0x600
[  252.787857]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.788065]  blk_mq_sched_dispatch_requests+0x2d/0x80
[  252.788270]  blk_mq_run_hw_queue+0x2c3/0x330
[  252.788474]  blk_mq_dispatch_list+0x13e/0x460
[  252.788676]  blk_mq_flush_plug_list+0x62/0x1e0
[  252.788876]  ? bdev_count_inflight+0x22/0x50
[  252.789075]  blk_add_rq_to_plug+0xfc/0x1c0
[  252.789272]  blk_mq_submit_bio+0x61f/0x890
[  252.789469]  __submit_bio+0x74/0x290
[  252.789663]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.789857]  submit_bio_noacct_nocheck+0x28d/0x370
[  252.790050]  submit_bio_noacct+0x19b/0x5b0
[  252.790238]  submit_bio+0xb1/0x110
[  252.790422]  mpage_write_folio+0x538/0x7c0
[  252.790605]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.790788]  ? mod_memcg_lruvec_state+0xd3/0x1f0
[  252.790973]  mpage_writepages+0x87/0x110
[  252.791152]  ? __pfx_fat_get_block+0x10/0x10
[  252.791333]  fat_writepages+0x15/0x30
[  252.791511]  do_writepages+0xc4/0x180
[  252.791689]  __writeback_single_inode+0x44/0x350
[  252.791867]  writeback_sb_inodes+0x24e/0x550
[  252.792049]  wb_writeback+0x98/0x330
[  252.792222]  wb_workfn+0xb6/0x410
[  252.792389]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.792553]  process_one_work+0x18b/0x370
[  252.792716]  worker_thread+0x33a/0x480
[  252.792873]  ? __pfx_worker_thread+0x10/0x10
[  252.793024]  kthread+0x10b/0x220
[  252.793171]  ? __pfx_kthread+0x10/0x10
[  252.793316]  ret_from_fork+0x208/0x240
[  252.793459]  ? __pfx_kthread+0x10/0x10
[  252.793601]  ret_from_fork_asm+0x1a/0x30
[  252.793747]  </TASK>
[  252.793885] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp bonding tls softdog sunrpc binfmt_misc nfnetlink_log ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl hpilo pcspkr mgag200 ses ccp ee1004 enclosure scsi_transport_sas k10temp input_leds acpi_tad ipmi_si acpi_power_meter acpi_ipmi joydev ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 usbkbd usbmouse hid_generic usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq igb i2c_algo_bit dca ehci_pci xhci_pci ehci_hcd xhci_hcd bnxt_en i2c_piix4 ptdma megaraid_sas i2c_smbus wmi
[  252.795224] CR2: ffffd33f82707000
[  252.795389] ---[ end trace 0000000000000000 ]---
[  252.848262] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.848526] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.848912] RSP: 0018:ffffd33f81687b50 EFLAGS: 00010206
[  252.849109] RAX: 00000000fedcf000 RBX: ffff8a8d6bcebdc0 RCX: ffffd33f82703000
[  252.849308] RDX: ffffd33f82703008 RSI: ffff8a8d6bcebc88 RDI: 0000000000000000
[  252.849507] RBP: ffffd33f81687c20 R08: 0000000000000200 R09: 0000000000001000
[  252.849707] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.849908] R13: 0000000000002000 R14: 00000000a8e00000 R15: ffffd33f82703008
[  252.850111] FS:  0000000000000000(0000) GS:ffff8aacb7d06000(0000) knlGS:0000000000000000
[  252.850315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.850519] CR2: ffffd33f82707000 CR3: 00000001c5fff003 CR4: 0000000000f70ef0
[  252.850727] PKRU: 55555554
[  252.850932] note: kworker/u129:2[221] exited with irqs disabled
[  252.851158] ------------[ cut here ]------------
[  252.851367] WARNING: CPU: 3 PID: 221 at kernel/exit.c:898 do_exit+0x7d6/0xa20
[  252.851579] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp bonding tls softdog sunrpc binfmt_misc nfnetlink_log ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl hpilo pcspkr mgag200 ses ccp ee1004 enclosure scsi_transport_sas k10temp input_leds acpi_tad ipmi_si acpi_power_meter acpi_ipmi joydev ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 usbkbd usbmouse hid_generic usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq igb i2c_algo_bit dca ehci_pci xhci_pci ehci_hcd xhci_hcd bnxt_en i2c_piix4 ptdma megaraid_sas i2c_smbus wmi
[  252.853557] CPU: 3 UID: 0 PID: 221 Comm: kworker/u129:2 Tainted: P      D    O        6.17.4-1-pve #1 PREEMPT(voluntary)
[  252.853811] Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [O]=OOT_MODULE
[  252.854063] Hardware name: HPE ProLiant DL325 Gen10 Plus v2/ProLiant DL325 Gen10 Plus v2, BIOS A43 08/07/2024
[  252.854324] Workqueue: writeback wb_workfn (flush-8:0)
[  252.854584] RIP: 0010:do_exit+0x7d6/0xa20
[  252.854842] Code: 4c 89 ab f0 0a 00 00 48 89 45 c0 48 8b 83 10 0d 00 00 e9 33 fe ff ff 48 8b bb d0 0a 00 00 31 f6 e8 2f e2 ff ff e9 e6 fd ff ff <0f> 0b e9 6d f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 d6 41 01 00 e9 a6
[  252.855381] RSP: 0018:ffffd33f80abbec0 EFLAGS: 00010282
[  252.855655] RAX: 0000000000000286 RBX: ffff8a8d42835400 RCX: 0000000000000000
[  252.855932] RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
[  252.856215] RBP: ffffd33f80abbf10 R08: 0000000000000000 R09: 0000000000000000
[  252.856495] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
[  252.856775] R13: 0000000000000001 R14: ffff8a8d42835400 R15: 0000000000000000
[  252.857056] FS:  0000000000000000(0000) GS:ffff8aacb7d06000(0000) knlGS:0000000000000000
[  252.857339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.857617] CR2: ffffd33f82707000 CR3: 00000001c5fff003 CR4: 0000000000f70ef0
[  252.857896] PKRU: 55555554
[  252.858190] Call Trace:
[  252.858457]  <TASK>
[  252.858717]  make_task_dead+0x93/0xa0
[  252.858974]  rewind_stack_and_make_dead+0x16/0x20
[  252.859267]  </TASK>
[  252.859516] ---[ end trace 0000000000000000 ]---
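
For anyone reading the oops above, the `error_code(0x0002)` value can be decoded by hand. The sketch below is a minimal illustration in Python, using the standard x86 page-fault error code bit layout; the function name and dictionary keys are made up for this example and do not come from any kernel tool.

```python
def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),       # 0 = fault on a not-present page
        "write": bool(code & 0x2),              # 1 = write access, 0 = read
        "user_mode": bool(code & 0x4),          # 0 = supervisor (kernel) access
        "reserved_bit": bool(code & 0x8),       # reserved bit set in a paging entry
        "instruction_fetch": bool(code & 0x10), # fault on instruction fetch
    }

# Value taken from the "#PF: error_code(0x0002)" line in the log above.
decoded = decode_pf_error_code(0x0002)
print(decoded)
```

Only the write bit is set, which agrees with the two `#PF` lines in the log: a supervisor write access to a not-present page.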

This is the list of affected processes:
Code:
root        5769  0.0  0.0   2688  1872 pts/0    S+   15:07   0:00 /bin/sh /usr/sbin/proxmox-boot-tool refresh
root        5826  0.0  0.0   2688  1884 pts/0    S+   15:07   0:00 /bin/sh /etc/kernel/postinst.d/zz-proxmox-boot
root        5828  0.0  0.0   2688  1880 pts/0    S+   15:07   0:00 /bin/sh /etc/kernel/postinst.d/zz-proxmox-boot
root        5872  0.0  0.0   2688  1112 pts/0    S+   15:07   0:00 /bin/sh /etc/kernel/postinst.d/zz-proxmox-boot
root        5918  0.0  0.0   4404  2328 pts/0    D+   15:07   0:00 umount /var/tmp/espmounts/19AB-AEDE

The same problem occurs with kernel 6.17.4; I had to revert to 6.14.11, where everything works fine.
There seems to be no problem running VMs; the only issue so far occurs with proxmox-boot-tool.
 
I worked around this on Supermicro hardware by shutting down all VMs and CTs and then updating/upgrading everything. Try that.
 
Just ran into this issue when upgrading a customer system (HPE ProLiant DL385 Gen 10+ v2, with MR-216i-p, Fw: 52.32.3-6333)

Dear Proxmox team, please either add this to the pve8to9 precheck or add a known issue to the "Upgrade from 8 to 9" wiki page.
Since this was the enterprise repo, I didn't expect something this major to happen without knowing about it beforehand.

Pinning 6.14 with proxmox-boot-tool also fixed the issue for me.

EDIT: In my case the issue arose with Ceph OSDs crashing and lagging.
 
I tracked this down and it appears to be a real megaraid_sas driver bug, not just a generic firmware/BIOS issue.

Root cause: `megasas_make_prp_nvme()` can write past the end of the 4K PRP chain-frame buffer when building large NVMe PRP lists. The telltale signature is exactly what shows up in this thread:

- `page boundary ptr_sgl: ...`
- page fault on a not-present page
- `R08`/`RAX` showing `0x200` (512), i.e. the 512th 8-byte slot at the 4096-byte boundary
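
The arithmetic behind that signature can be checked with a short sketch. This is illustrative Python, not the driver code: the constant and function names are hypothetical, the 4096-byte buffer and 8-byte entry size come from the description above, and reading `R08` as the running entry count is an assumption based on that description. The register values are copied from the first oops in this thread.

```python
PRP_BUFFER_BYTES = 4096  # size of the PRP chain-frame buffer described above
PRP_ENTRY_BYTES = 8      # each PRP entry is one 64-bit address
PAGE_SIZE = 4096

def prp_capacity(buffer_bytes=PRP_BUFFER_BYTES, entry_bytes=PRP_ENTRY_BYTES):
    """Number of entries that fit in the buffer."""
    return buffer_bytes // entry_bytes

def write_offset(entry_index, entry_bytes=PRP_ENTRY_BYTES):
    """Byte offset at which the entry with this 0-based index is stored."""
    return entry_index * entry_bytes

def overflows(entry_index):
    """True if storing this entry would land outside the buffer."""
    return write_offset(entry_index) + PRP_ENTRY_BYTES > PRP_BUFFER_BYTES

# Values copied from the first oops in this thread.
r08 = 0x200                      # R08 at the time of the fault (assumed entry count)
fault_addr = 0xffffd33f82703000  # CR2, identical to RCX

print(prp_capacity())                      # entries that fit in one buffer
print(hex(write_offset(r08)))              # offset of the next store
print(overflows(r08 - 1), overflows(r08))  # last valid index vs. first invalid one
print(fault_addr % PAGE_SIZE == 0)         # is the faulting address page-aligned?
```

Under these assumptions, index 511 is the last slot that fits, and a store at index 512 lands at offset 0x1000, the first byte past the buffer. That is consistent with a page-aligned fault address.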

I validated a fix on an affected SAS39xx system with AMD EPYC + IOMMU/SEV-SNP enabled:

- 6.19 crashed repeatedly before the patch
- after patching `megaraid_sas`, the system boots cleanly and survives multi-GB direct I/O
- 6.14 remaining stable also fits this analysis

I’ve posted a proposed kernel patch to linux-scsi: https://lore.kernel.org/linux-scsi/...N_0CX2hsfV7kTf_t0MTf6vdAAaSEc=@magik.net/T/#u

If anyone in this thread can test the patch on their affected hardware, that would be useful.
 
Hi @magik6k, thanks in advance for the time you put into this!
The link to the patch is currently missing, but if you add it, I'd be happy to try it out.
 
Lol sorry, had this message drafted and waiting until lore picked up my email, but forgot to fill in the link when it was ready. Edited the message
 