Kernel 6.17 bug with megaraid-sas (HPE MR416)

nikybiasion

Hi, after upgrading one of my nodes to kernel 6.17.2, I hit an issue when running proxmox-boot-tool refresh.
This is the log:
Code:
[  252.712321] sd 0:2:1:0: [sda] tag#561 page boundary ptr_sgl: 0x00000000866cde4d
[  252.712561] BUG: unable to handle page fault for address: ffffd33f82703000
[  252.712729] #PF: supervisor write access in kernel mode
[  252.712889] #PF: error_code(0x0002) - not-present page
[  252.713047] PGD 100000067 P4D 100000067 PUD 10030c067 PMD 109271067 PTE 0
[  252.713213] Oops: Oops: 0002 [#1] SMP NOPTI
[  252.713375] CPU: 16 UID: 0 PID: 402 Comm: kworker/16:1H Tainted: P           O        6.17.4-1-pve #1 PREEMPT(voluntary)
[  252.713546] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[  252.713712] Hardware name: HPE ProLiant DL325 Gen10 Plus v2/ProLiant DL325 Gen10 Plus v2, BIOS A43 08/07/2024
[  252.713886] Workqueue: kblockd blk_mq_run_work_fn
[  252.714062] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.714246] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.714612] RSP: 0018:ffffd33f81687b50 EFLAGS: 00010206
[  252.714799] RAX: 00000000fedcf000 RBX: ffff8a8d6bcebdc0 RCX: ffffd33f82703000
[  252.714987] RDX: ffffd33f82703008 RSI: ffff8a8d6bcebc88 RDI: 0000000000000000
[  252.715177] RBP: ffffd33f81687c20 R08: 0000000000000200 R09: 0000000000001000
[  252.715366] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.715555] R13: 0000000000002000 R14: 00000000a8e00000 R15: ffffd33f82703008
[  252.715759] FS:  0000000000000000(0000) GS:ffff8aacb8386000(0000) knlGS:0000000000000000
[  252.715965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.716157] CR2: ffffd33f82703000 CR3: 0000001588c3a002 CR4: 0000000000f70ef0
[  252.716352] PKRU: 55555554
[  252.716544] Call Trace:
[  252.716746]  <TASK>
[  252.716962]  megasas_queue_command+0x125/0x1d0 [megaraid_sas]
[  252.717160]  scsi_queue_rq+0x40c/0xcc0
[  252.717354]  blk_mq_dispatch_rq_list+0x124/0x740
[  252.717550]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.717745]  ? sbitmap_get+0x73/0x180
[  252.717938]  ? sbitmap_get+0x73/0x180
[  252.718131]  __blk_mq_sched_dispatch_requests+0x408/0x600
[  252.718327]  blk_mq_sched_dispatch_requests+0x2d/0x80
[  252.718519]  blk_mq_run_work_fn+0x72/0x90
[  252.718710]  process_one_work+0x18b/0x370
[  252.718902]  worker_thread+0x33a/0x480
[  252.719091]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.719282]  ? __pfx_worker_thread+0x10/0x10
[  252.719472]  kthread+0x10b/0x220
[  252.719661]  ? __pfx_kthread+0x10/0x10
[  252.719850]  ret_from_fork+0x208/0x240
[  252.720038]  ? __pfx_kthread+0x10/0x10
[  252.720224]  ret_from_fork_asm+0x1a/0x30
[  252.720414]  </TASK>
[  252.720596] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp bonding tls softdog sunrpc binfmt_misc nfnetlink_log ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl hpilo pcspkr mgag200 ses ccp ee1004 enclosure scsi_transport_sas k10temp input_leds acpi_tad ipmi_si acpi_power_meter acpi_ipmi joydev ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 usbkbd usbmouse hid_generic usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq igb i2c_algo_bit dca ehci_pci xhci_pci ehci_hcd xhci_hcd bnxt_en i2c_piix4 ptdma megaraid_sas i2c_smbus wmi
[  252.722303] CR2: ffffd33f82703000
[  252.722511] ---[ end trace 0000000000000000 ]---
[  252.776411] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.776727] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.777184] RSP: 0018:ffffd33f81687b50 EFLAGS: 00010206
[  252.777263] sd 0:2:1:0: [sda] tag#563 page boundary ptr_sgl: 0x00000000a0732742
[  252.777417] RAX: 00000000fedcf000 RBX: ffff8a8d6bcebdc0 RCX: ffffd33f82703000
[  252.777803] BUG: unable to handle page fault for address: ffffd33f82707000
[  252.777863] RDX: ffffd33f82703008 RSI: ffff8a8d6bcebc88 RDI: 0000000000000000
[  252.778065] #PF: supervisor write access in kernel mode
[  252.778271] RBP: ffffd33f81687c20 R08: 0000000000000200 R09: 0000000000001000
[  252.778472] #PF: error_code(0x0002) - not-present page
[  252.778679] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.778881] PGD 100000067
[  252.779088] R13: 0000000000002000 R14: 00000000a8e00000 R15: ffffd33f82703008
[  252.779088] P4D 100000067
[  252.779090] FS:  0000000000000000(0000) GS:ffff8aacb8386000(0000) knlGS:0000000000000000
[  252.779290] PUD 10030c067
[  252.779497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.779697] PMD 109271067
[  252.779904] CR2: ffffd33f82703000 CR3: 0000001588c3a002 CR4: 0000000000f70ef0
[  252.780106] PTE 0
[  252.780317] PKRU: 55555554

[  252.780734] note: kworker/16:1H[402] exited with irqs disabled
[  252.780942] Oops: Oops: 0002 [#2] SMP NOPTI
[  252.781777] CPU: 3 UID: 0 PID: 221 Comm: kworker/u129:2 Tainted: P      D    O        6.17.4-1-pve #1 PREEMPT(voluntary)
[  252.782001] Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [O]=OOT_MODULE
[  252.782221] Hardware name: HPE ProLiant DL325 Gen10 Plus v2/ProLiant DL325 Gen10 Plus v2, BIOS A43 08/07/2024
[  252.782448] Workqueue: writeback wb_workfn (flush-8:0)
[  252.782678] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.782915] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.783390] RSP: 0018:ffffd33f80abb300 EFLAGS: 00010206
[  252.783632] RAX: 00000000fedcd000 RBX: ffff8a8d6bcc7a40 RCX: ffffd33f82707000
[  252.783877] RDX: ffffd33f82707008 RSI: ffff8a8d6bcc7908 RDI: 0000000000000000
[  252.784124] RBP: ffffd33f80abb3d0 R08: 0000000000000200 R09: 0000000000001000
[  252.784373] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.784622] R13: 0000000000002000 R14: 00000000a8600000 R15: ffffd33f82707008
[  252.784866] FS:  0000000000000000(0000) GS:ffff8aacb7d06000(0000) knlGS:0000000000000000
[  252.785111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.785351] CR2: ffffd33f82707000 CR3: 00000001c5fff003 CR4: 0000000000f70ef0
[  252.785591] PKRU: 55555554
[  252.785825] Call Trace:
[  252.786054]  <TASK>
[  252.786281]  megasas_queue_command+0x125/0x1d0 [megaraid_sas]
[  252.786508]  scsi_queue_rq+0x40c/0xcc0
[  252.786742]  blk_mq_dispatch_rq_list+0x124/0x740
[  252.787005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.787226]  ? sbitmap_get+0x73/0x180
[  252.787441]  ? sbitmap_get+0x73/0x180
[  252.787649]  __blk_mq_sched_dispatch_requests+0x408/0x600
[  252.787857]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.788065]  blk_mq_sched_dispatch_requests+0x2d/0x80
[  252.788270]  blk_mq_run_hw_queue+0x2c3/0x330
[  252.788474]  blk_mq_dispatch_list+0x13e/0x460
[  252.788676]  blk_mq_flush_plug_list+0x62/0x1e0
[  252.788876]  ? bdev_count_inflight+0x22/0x50
[  252.789075]  blk_add_rq_to_plug+0xfc/0x1c0
[  252.789272]  blk_mq_submit_bio+0x61f/0x890
[  252.789469]  __submit_bio+0x74/0x290
[  252.789663]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.789857]  submit_bio_noacct_nocheck+0x28d/0x370
[  252.790050]  submit_bio_noacct+0x19b/0x5b0
[  252.790238]  submit_bio+0xb1/0x110
[  252.790422]  mpage_write_folio+0x538/0x7c0
[  252.790605]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.790788]  ? mod_memcg_lruvec_state+0xd3/0x1f0
[  252.790973]  mpage_writepages+0x87/0x110
[  252.791152]  ? __pfx_fat_get_block+0x10/0x10
[  252.791333]  fat_writepages+0x15/0x30
[  252.791511]  do_writepages+0xc4/0x180
[  252.791689]  __writeback_single_inode+0x44/0x350
[  252.791867]  writeback_sb_inodes+0x24e/0x550
[  252.792049]  wb_writeback+0x98/0x330
[  252.792222]  wb_workfn+0xb6/0x410
[  252.792389]  ? srso_alias_return_thunk+0x5/0xfbef5
[  252.792553]  process_one_work+0x18b/0x370
[  252.792716]  worker_thread+0x33a/0x480
[  252.792873]  ? __pfx_worker_thread+0x10/0x10
[  252.793024]  kthread+0x10b/0x220
[  252.793171]  ? __pfx_kthread+0x10/0x10
[  252.793316]  ret_from_fork+0x208/0x240
[  252.793459]  ? __pfx_kthread+0x10/0x10
[  252.793601]  ret_from_fork_asm+0x1a/0x30
[  252.793747]  </TASK>
[  252.793885] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp bonding tls softdog sunrpc binfmt_misc nfnetlink_log ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl hpilo pcspkr mgag200 ses ccp ee1004 enclosure scsi_transport_sas k10temp input_leds acpi_tad ipmi_si acpi_power_meter acpi_ipmi joydev ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 usbkbd usbmouse hid_generic usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq igb i2c_algo_bit dca ehci_pci xhci_pci ehci_hcd xhci_hcd bnxt_en i2c_piix4 ptdma megaraid_sas i2c_smbus wmi
[  252.795224] CR2: ffffd33f82707000
[  252.795389] ---[ end trace 0000000000000000 ]---
[  252.848262] RIP: 0010:megasas_build_and_issue_cmd_fusion+0xeaa/0x1870 [megaraid_sas]
[  252.848526] Code: 20 48 89 d1 48 83 e1 fc 83 e2 01 48 0f 45 d9 4c 8b 73 10 44 8b 6b 18 4c 89 f9 4c 8d 79 08 45 85 fa 0f 84 fd 03 00 00 45 29 cc <4c> 89 31 48 83 c0 08 41 83 c0 01 45 29 cd 45 85 e4 7f ab 44 89 c0
[  252.848912] RSP: 0018:ffffd33f81687b50 EFLAGS: 00010206
[  252.849109] RAX: 00000000fedcf000 RBX: ffff8a8d6bcebdc0 RCX: ffffd33f82703000
[  252.849308] RDX: ffffd33f82703008 RSI: ffff8a8d6bcebc88 RDI: 0000000000000000
[  252.849507] RBP: ffffd33f81687c20 R08: 0000000000000200 R09: 0000000000001000
[  252.849707] R10: 0000000000000fff R11: 0000000000001000 R12: 0000000000001000
[  252.849908] R13: 0000000000002000 R14: 00000000a8e00000 R15: ffffd33f82703008
[  252.850111] FS:  0000000000000000(0000) GS:ffff8aacb7d06000(0000) knlGS:0000000000000000
[  252.850315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.850519] CR2: ffffd33f82707000 CR3: 00000001c5fff003 CR4: 0000000000f70ef0
[  252.850727] PKRU: 55555554
[  252.850932] note: kworker/u129:2[221] exited with irqs disabled
[  252.851158] ------------[ cut here ]------------
[  252.851367] WARNING: CPU: 3 PID: 221 at kernel/exit.c:898 do_exit+0x7d6/0xa20
[  252.851579] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp bonding tls softdog sunrpc binfmt_misc nfnetlink_log ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl hpilo pcspkr mgag200 ses ccp ee1004 enclosure scsi_transport_sas k10temp input_leds acpi_tad ipmi_si acpi_power_meter acpi_ipmi joydev ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 usbkbd usbmouse hid_generic usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq igb i2c_algo_bit dca ehci_pci xhci_pci ehci_hcd xhci_hcd bnxt_en i2c_piix4 ptdma megaraid_sas i2c_smbus wmi
[  252.853557] CPU: 3 UID: 0 PID: 221 Comm: kworker/u129:2 Tainted: P      D    O        6.17.4-1-pve #1 PREEMPT(voluntary)
[  252.853811] Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [O]=OOT_MODULE
[  252.854063] Hardware name: HPE ProLiant DL325 Gen10 Plus v2/ProLiant DL325 Gen10 Plus v2, BIOS A43 08/07/2024
[  252.854324] Workqueue: writeback wb_workfn (flush-8:0)
[  252.854584] RIP: 0010:do_exit+0x7d6/0xa20
[  252.854842] Code: 4c 89 ab f0 0a 00 00 48 89 45 c0 48 8b 83 10 0d 00 00 e9 33 fe ff ff 48 8b bb d0 0a 00 00 31 f6 e8 2f e2 ff ff e9 e6 fd ff ff <0f> 0b e9 6d f8 ff ff 4c 89 e6 bf 05 06 00 00 e8 d6 41 01 00 e9 a6
[  252.855381] RSP: 0018:ffffd33f80abbec0 EFLAGS: 00010282
[  252.855655] RAX: 0000000000000286 RBX: ffff8a8d42835400 RCX: 0000000000000000
[  252.855932] RDX: 000000000000270f RSI: 0000000000002710 RDI: 0000000000000009
[  252.856215] RBP: ffffd33f80abbf10 R08: 0000000000000000 R09: 0000000000000000
[  252.856495] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
[  252.856775] R13: 0000000000000001 R14: ffff8a8d42835400 R15: 0000000000000000
[  252.857056] FS:  0000000000000000(0000) GS:ffff8aacb7d06000(0000) knlGS:0000000000000000
[  252.857339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.857617] CR2: ffffd33f82707000 CR3: 00000001c5fff003 CR4: 0000000000f70ef0
[  252.857896] PKRU: 55555554
[  252.858190] Call Trace:
[  252.858457]  <TASK>
[  252.858717]  make_task_dead+0x93/0xa0
[  252.858974]  rewind_stack_and_make_dead+0x16/0x20
[  252.859267]  </TASK>
[  252.859516] ---[ end trace 0000000000000000 ]---

This is the list of affected processes:
Code:
root        5769  0.0  0.0   2688  1872 pts/0    S+   15:07   0:00 /bin/sh /usr/sbin/proxmox-boot-tool refresh
root        5826  0.0  0.0   2688  1884 pts/0    S+   15:07   0:00 /bin/sh /etc/kernel/postinst.d/zz-proxmox-boot
root        5828  0.0  0.0   2688  1880 pts/0    S+   15:07   0:00 /bin/sh /etc/kernel/postinst.d/zz-proxmox-boot
root        5872  0.0  0.0   2688  1112 pts/0    S+   15:07   0:00 /bin/sh /etc/kernel/postinst.d/zz-proxmox-boot
root        5918  0.0  0.0   4404  2328 pts/0    D+   15:07   0:00 umount /var/tmp/espmounts/19AB-AEDE
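
In the list above, the umount process is in state D+ (uninterruptible sleep), which is consistent with I/O that never completes after the oops. For anyone who wants to check their own node for similarly stuck processes, here is a minimal sketch that filters `ps aux`-style output for D-state entries. The function name `stuck_processes` and the sample text are only illustrative; the sketch assumes the standard 11-column `ps aux` layout shown above.

```python
def stuck_processes(ps_output: str) -> list[tuple[int, str]]:
    """Return (pid, command) for lines whose STAT column starts with 'D'."""
    stuck = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        if fields[7].startswith("D"):  # column 8 is STAT
            stuck.append((int(fields[1]), fields[10]))
    return stuck


# Two lines taken from the process list in this post.
SAMPLE = """\
root        5769  0.0  0.0   2688  1872 pts/0    S+   15:07   0:00 /bin/sh /usr/sbin/proxmox-boot-tool refresh
root        5918  0.0  0.0   4404  2328 pts/0    D+   15:07   0:00 umount /var/tmp/espmounts/19AB-AEDE
"""

for pid, command in stuck_processes(SAMPLE):
    print(pid, command)
```

To use it on a live system, pass the text output of `ps aux` to the function instead of the sample string.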

The same problem occurs with kernel 6.17.4. I had to revert to 6.14.11, where everything works fine.
There seems to be no problem running VMs; the only issue occurs with proxmox-boot-tool.