Problem with MegaRAID SAS3508 controller

Jan 19, 2026
Hi,
I’m running Proxmox VE 9.1.4 on Debian 13 on multiple Huawei 2288H V5 nodes. The servers use an OEM Broadcom/LSI MegaRAID SAS3508 controller.

Hardware / software details:
  • Server: Huawei 2288H V5
  • RAID controller: Broadcom / LSI SAS3508 (OEM Huawei)
  • RAID firmware: 5.140.00-3319 (Huawei confirmed this firmware is EOL, last supported on Debian 10)
  • Proxmox VE: 9.1.4
  • OS: Debian 13
  • Kernel: 6.17.4-2-pve
  • Driver: megaraid_sas (in-kernel driver from the Linux kernel, no out-of-tree module)
We are seeing random MegaRAID firmware crashes. The controller reports a fatal firmware error, goes into FAULT state and performs an Online Controller Reset (OCR). On one node this caused a full reboot; on the others the controller reset and recovered without rebooting the host. All VMs run on shared SAN storage; the local RAID is only used for the OS (two SSDs in RAID 1), so heavy local I/O is unlikely to be the trigger.

Key kernel logs:

Code:
megaraid_sas 0000:1c:00.0: Fatal firmware error: Line 169 in fw/raid/utils.c
megaraid_sas 0000:1c:00.0: FW in FAULT state Fault code:0x10000
megaraid_sas 0000:1c:00.0: resetting fusion adapter
megaraid_sas 0000:1c:00.0: Reset successful
megaraid_sas 0000:1c:00.0: Controller encountered an error and was reset
At this point it looks like a compatibility issue between newer Linux kernels and old MegaRAID firmware, not something Proxmox-specific.
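To check whether other nodes show the same pattern, the kernel log can be scanned for the two fault messages quoted above. This is a minimal sketch: `count_megaraid_faults` is a hypothetical helper name, and the patterns are copied from this one log excerpt, so other firmware versions may word the messages differently.

```shell
#!/bin/sh
# Count megaraid_sas firmware fault lines in kernel log text read from stdin.
# The two patterns come from the log excerpt in this post (assumption: other
# controllers/firmware use the same wording).
count_megaraid_faults() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c -E 'megaraid_sas .*(Fatal firmware error|FW in FAULT state)' || true
}

# Example on a live host (journalctl -k prints kernel messages):
#   journalctl -k --no-pager | count_megaraid_faults
```

A count above 0 only says that the messages were logged; it does not say what triggered the firmware fault.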

Has anyone seen the same SAS3508 + Proxmox (Debian 12/13) behavior? I’m considering rolling back to kernel 6.14.11-5-pve. I’ve also found multiple reports online where stability issues with MegaRAID controllers were mitigated by adding the following kernel parameters via GRUB:

pcie_aspm=off
pci=noaer
megaraid_sas.msix_disable=1

These seem to reduce firmware hangs and unexpected controller resets on older MegaRAID firmware when running newer kernels.
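If you want to try these parameters, they have to be merged into the existing kernel command line rather than replacing it. A minimal sketch of doing that without creating duplicates follows; `add_kernel_params` is a hypothetical helper, and the `quiet` starting value in the example is only an assumption about what is already configured.

```shell
#!/bin/sh
# Append each requested parameter to an existing kernel command line,
# skipping any parameter that is already present (exact, whole-word match).
add_kernel_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                              # already there
            *) cmdline="${cmdline:+$cmdline }$param" ;;   # append
        esac
    done
    printf '%s\n' "$cmdline"
}

# Example (the existing value "quiet" is an assumption):
#   add_kernel_params "quiet" pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1
```

On GRUB-booted hosts the resulting string goes into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`. On hosts managed by proxmox-boot-tool with systemd-boot it goes into `/etc/kernel/cmdline`, followed by `proxmox-boot-tool refresh`. Either way a reboot is needed before the parameters apply.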
 
Hi Mira,

quick update from our side.

We’ve pinned the kernel to 6.14 on all affected nodes and since then the issue has not reoccurred. The systems have been stable so far.
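For reference, a rough sketch of the pinning step and a check that a node really booted a 6.14 kernel afterwards. The version string is the one mentioned earlier in this thread and may differ on your nodes; `kernel_in_series` is a hypothetical helper, not part of any Proxmox tooling.

```shell
#!/bin/sh
# Return success if a kernel release string (as printed by "uname -r")
# belongs to the given series, e.g. "6.14.11-5-pve" is in series "6.14".
kernel_in_series() {
    case "$1" in
        "$2".*|"$2"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Pinning itself is done with proxmox-boot-tool (version is an example):
#   proxmox-boot-tool kernel list
#   proxmox-boot-tool kernel pin 6.14.11-5-pve
#   reboot
# After the reboot, confirm the running kernel:
#   kernel_in_series "$(uname -r)" 6.14 && echo "running a 6.14 kernel"
```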

In the meantime, we also purchased Enterprise repositories for all servers.

Could you please let us know whether this issue is already resolved in the latest 6.17 kernel, or if you currently still recommend staying on 6.14 for setups with SAS3508 / older MegaRAID firmware?

Thanks for the update and your help.
 
We don't have a reproducer. So far we haven't been able to narrow it down to any specific change in the kernel.
We're still trying, but for now we recommend staying on kernel 6.14 if you are affected by this issue.
 
I recently attempted to upgrade my PBS to kernel 6.17.9-1-pve and started having all kinds of issues. I assumed my controller card was going bad, but I now think it's an incompatibility with the kernel. I have a Supermicro Broadcom SAS 3408 card that locks everything up when I attempt a kernel upgrade.
 
@mira,

Since 6.17 is now the default kernel in Proxmox VE 9.x, could you please clarify how long the 6.14 kernel series is expected to receive updates within the 9.x branch? Will it continue to receive security and stability fixes for a defined period, even though 6.17 is now the default?

In my case, two nodes started resetting the controller after upgrading beyond 6.14, and stability returned after reverting. At the moment, this makes 6.14 the only safe production option for us.

As visible in this thread, other users are also reporting issues with Broadcom-based controllers.

Any guidance on the expected lifecycle of 6.14 would help us plan our next steps.

Thank you.
 
I can confirm this. 6.17 has megaraid problems even with newer hardware.


Code:
Mar 09 13:28:24 pve kernel: CPU: 6 UID: 0 PID: 745 Comm: kworker/6:1H T
Mar 09 13:28:24 pve kernel: BUG: unable to handle page fault for address: ff5c2d2e81ada000
Mar 09 13:28:24 pve kernel: #PF: supervisor write access in kernel mode
Mar 09 13:28:24 pve kernel: #PF: error_code(0x0002) - not-present page
Mar 09 13:28:24 pve kernel: PGD 100000067 P4D 100379067 PUD 10037a067 PMD 108a1e067 PTE 0
Mar 09 13:28:24 pve kernel: Oops: Oops: 0002 [#2] SMP NOPTI
Mar 09 13:28:24 pve kernel: CPU: 6 UID: 0 PID: 745 Comm: kworker/6:1H T


The upgrade also stalls if the new ISO is used:
Code:
Random seed file /var/tmp/espmounts/8C8E-7882/loader/random-seed successfully written
Created EFI boot entry "Linux Boot Manager".
Configuring systemd-boot..
Unmounting '/dev/sdf2'.
Adding '/dev/sdf2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Copying and configuring kernels on /dev/disk/by-uuid/8C8D-8EFB
        Copying kernel and creating boot-entry for 6.17.2-1-pve
[  746.109538] megaraid_sas 0000:c1:00.0: [115]waiting for 1 commands to complete fo


The workaround (for the moment) is to do a new install from the older ISO 9.0.1, which comes with kernel 6.14 (https://enterprise.proxmox.com/iso/), and then pin the kernel to the latest 6.14.

I even did:
Code:
cat /etc/kernel/cmdline
pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1

proxmox-boot-tool refresh
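Changes to `/etc/kernel/cmdline` only take effect after `proxmox-boot-tool refresh` and a reboot, so it is worth checking that the running kernel actually received the parameters. A minimal sketch, with `missing_kernel_params` as a hypothetical helper:

```shell
#!/bin/sh
# Print every expected parameter that is absent from the given kernel
# command line, one per line. No output means all of them are present.
missing_kernel_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                  # present
            *) printf '%s\n' "$param" ;;      # missing
        esac
    done
}

# On the host, after the reboot (/proc/cmdline holds the active command line):
#   missing_kernel_params "$(cat /proc/cmdline)" \
#       pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1
```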


+ The BIOS now offers some presets, so I switched the profile from undefined to "Virtualization" (because of C-states...)