Problem with MegaRAID SAS3508 controller

Jan 19, 2026
Hi,
I’m running Proxmox VE 9.1.4 on Debian 13 on multiple Huawei 2288H V5 nodes. The servers use an OEM Broadcom/LSI MegaRAID SAS3508 controller.

Hardware / software details:
  • Server: Huawei 2288H V5
  • RAID controller: Broadcom / LSI SAS3508 (OEM Huawei)
  • RAID firmware: 5.140.00-3319 (Huawei confirmed this firmware is EOL, last supported on Debian 10)
  • Proxmox VE: 9.1.4
  • OS: Debian 13
  • Kernel: 6.17.4-2-pve
  • Driver: megaraid_sas (in-kernel driver from the Linux kernel, no out-of-tree module)
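For completeness, the driver binding can be double-checked with `lspci -k`. A sketch below, with the command's output hard-coded as a sample so the parsing step is reproducible (the PCI address and subsystem string are assumptions, not taken from my nodes):

```shell
# Confirm which kernel driver is bound to the RAID controller.
# On a real node: lspci -k | grep -A 3 -i raid
# The output below is a hard-coded sample (address/subsystem are assumptions).
LSPCI_OUT='1c:00.0 RAID bus controller: Broadcom / LSI MegaRAID Tri-Mode SAS3508
    Subsystem: Huawei Technologies Co., Ltd. SAS3508
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas'

# Extract the "Kernel driver in use" value.
DRIVER=$(printf '%s\n' "$LSPCI_OUT" | awk -F': ' '/Kernel driver in use/ {print $2}')
echo "driver in use: $DRIVER"
```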
We are seeing random MegaRAID firmware crashes. The controller reports a fatal firmware error, goes into the FAULT state, and performs an Online Controller Reset (OCR). On one node this caused a full reboot; on others the controller reset and recovered without rebooting the host. All VMs run on shared SAN storage, and the local RAID is basically only used for the OS (two SSDs in RAID 1), so heavy local I/O doesn't look like the cause either.

Key kernel logs:

Code:
megaraid_sas 0000:1c:00.0: Fatal firmware error: Line 169 in fw/raid/utils.c
megaraid_sas 0000:1c:00.0: FW in FAULT state Fault code:0x10000
megaraid_sas 0000:1c:00.0: resetting fusion adapter
megaraid_sas 0000:1c:00.0: Reset successful
megaraid_sas 0000:1c:00.0: Controller encountered an error and was reset
At this point it looks like a compatibility issue between newer Linux kernels and old MegaRAID firmware, not something Proxmox-specific.

Has anyone seen the same SAS3508 + Proxmox (Debian 12/13) behavior? I’m considering rolling back to kernel 6.14.11-5-pve. I’ve also found multiple reports online where stability issues with MegaRAID controllers were mitigated by adding the following kernel parameters via GRUB:

Code:
pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1

These seem to reduce firmware hangs and unexpected controller resets on older MegaRAID firmware when running newer kernels.
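For anyone wanting to try those parameters, a minimal sketch of wiring them into GRUB. It edits a sample file rather than the real `/etc/default/grub`, so the result can be reviewed first (the sample file content is an assumption):

```shell
# Append the workaround parameters to GRUB_CMDLINE_LINUX_DEFAULT.
# Demonstrated on a sample file; on a real node edit /etc/default/grub.
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > /tmp/grub.sample

PARAMS='pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1'
sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $PARAMS\"/" /tmp/grub.sample

cat /tmp/grub.sample
# After editing the real file, apply with `update-grub`.
```

Note that systems booting via proxmox-boot-tool (e.g. ZFS on root) read `/etc/kernel/cmdline` instead of the GRUB default file, followed by `proxmox-boot-tool refresh`.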
 
Hi Mira,

quick update from our side.

We’ve pinned the kernel to 6.14 on all affected nodes and since then the issue has not reoccurred. The systems have been stable so far.
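For anyone wanting to replicate the pin, a sketch of selecting the newest installed 6.14 kernel and building the pin command. The version list is hard-coded here as a sample; on a real node it would come from `proxmox-boot-tool kernel list`:

```shell
# Pick the newest installed 6.14 kernel and emit the pin command.
# Sample list; on a real node: LIST=$(proxmox-boot-tool kernel list)
LIST='6.14.11-5-pve
6.17.4-2-pve'

# Version-sort the 6.14 entries and take the newest.
TARGET=$(printf '%s\n' "$LIST" | grep '^6\.14\.' | sort -V | tail -n 1)
echo "proxmox-boot-tool kernel pin $TARGET"
```

Running the printed command pins the kernel across reboots; `proxmox-boot-tool kernel unpin` reverts it later.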

In the meantime, we also purchased Enterprise repository subscriptions for all servers.

Could you please let us know whether this issue is already resolved in the latest 6.17 kernel, or if you currently still recommend staying on 6.14 for setups with SAS3508 / older MegaRAID firmware?

Thanks for the update and your help.
 
We don't have a reproducer yet, and so far we haven't been able to narrow it down to any specific change in the kernel.
We're still trying, but for now we recommend staying on kernel 6.14 if you are affected by this issue.
 
I recently attempted to upgrade my PBS to kernel 6.17.9-1-pve and started having all kinds of issues. I assumed my controller card was going bad, but I now think it's an incompatibility with the kernel. I have a Supermicro Broadcom SAS 3408 card that locks everything up when I attempt the kernel upgrade.
 
@mira,

Since 6.17 is now the default kernel in Proxmox VE 9.x, could you please clarify how long the 6.14 kernel series is expected to receive updates within the 9.x branch? Will it continue to receive security and stability fixes for a defined period, even though 6.17 is now the default?

In my case, two nodes started resetting the controller after upgrading beyond 6.14, and stability returned after reverting. At the moment, this makes 6.14 the only safe production option for us.

As visible in this thread, other users are also reporting issues with Broadcom-based controllers.

Any guidance on the expected lifecycle of 6.14 would help us plan our next steps.

Thank you.
 
I can back this up as well: 6.17 has megaraid_sas problems even with newer hardware.


Code:
Mar 09 13:28:24 pve kernel: CPU: 6 UID: 0 PID: 745 Comm: kworker/6:1H T
Mar 09 13:28:24 pve kernel: BUG: unable to handle page fault for address: ff5c2d2e81ada000
Mar 09 13:28:24 pve kernel: #PF: supervisor write access in kernel mode
Mar 09 13:28:24 pve kernel: #PF: error_code(0x0002) - not-present page
Mar 09 13:28:24 pve kernel: PGD 100000067 P4D 100379067 PUD 10037a067 PMD 108a1e067 PTE 0
Mar 09 13:28:24 pve kernel: Oops: Oops: 0002 [#2] SMP NOPTI
Mar 09 13:28:24 pve kernel: CPU: 6 UID: 0 PID: 745 Comm: kworker/6:1H T


The upgrade hangs when the new ISO is used:
Code:
Random seed file /var/tmp/espmounts/8C8E-7882/loader/random-seed successfully written
Created EFI boot entry "Linux Boot Manager".
Configuring systemd-boot..
Unmounting '/dev/sdf2'.
Adding '/dev/sdf2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Copying and configuring kernels on /dev/disk/by-uuid/8C8D-8EFB
        Copying kernel and creating boot-entry for 6.17.2-1-pve
[  746.109538] megaraid_sas 0000:c1:00.0: [115]waiting for 1 commands to complete fo


The solution (for the moment) is to do the fresh install from the older 9.0.1 ISO, which ships kernel 6.14, available at https://enterprise.proxmox.com/iso/, and then pin the latest 6.14 kernel.

I even did:
Code:
cat /etc/kernel/cmdline
pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1

proxmox-boot-tool refresh


Additionally, the BIOS has some workload presets, and I switched from undefined to "Virtualization" (because of C-states).
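After `proxmox-boot-tool refresh` and a reboot, it's worth confirming the parameters actually reached the running kernel. A sketch, checking a hard-coded sample command line instead of reading `/proc/cmdline` directly (the sample root= entry is an assumption):

```shell
# Verify the workaround parameters are on the kernel command line.
# Sample string; on a real node: CMDLINE=$(cat /proc/cmdline)
CMDLINE='root=ZFS=rpool/ROOT/pve-1 boot=zfs pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1'

MISSING=0
for p in pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1; do
    case " $CMDLINE " in
        *" $p "*) echo "present: $p" ;;
        *)        echo "MISSING: $p"; MISSING=1 ;;
    esac
done
```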
 
Another victim.
Recent Supermicro hardware with a Broadcom SAS 3808 iMR.

MegaRAID throws a fit when proxmox-boot-tool runs during upgrades with the new 6.17 kernel, corrupting the ESPs as a result and ending in kernel panics on server boot.
 
We've got a test system with a Broadcom / LSI Fusion-MPT SAS38xx on which we are currently trying to reproduce the issues reported here, as well as the other issues we've encountered.

Code:
Serial Attached SCSI controller [0107]: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx [1000:00e6]
Subsystem: Broadcom / LSI 9500-16i Tri-Mode HBA [1000:4050]
 
Hi.

Another victim here, I've got two Supermicro PBS servers with MegaRAID 9540-2M2 for boot drives with the same problem.

Bricked one; I had to reinstall with 4.0-1, pin kernel 6.14, and then update/upgrade to 4.1.8 with kernel 6.17 included but unused.

For the second one, I pinned the kernel 6.8 that was running, did the upgrade, then installed kernel 6.14 and pinned that instead.

Seems fine for now but I'm afraid to switch kernels in the future...

Does anyone know if this is being worked on?
 
I’ve never had any problems using the 9500-16i...

If the problem doesn't occur when you don't use IR Mode, isn't that the solution?
 
I’ve never had any problems using the 9500-16i...

If the problem doesn't occur when you don't use IR Mode, isn't that the solution?
Hello, in my case it's not a 9500-16i that's causing the problem.
I'm also using one for datastore storage, but the affected card is the 9540-2M2 PCIe Gen 4.0 Boot Storage Adapter, a Tri-Mode NVMe adapter for my OS storage, which I'm using as simple disks (JBOD) in a ZFS RAID 1 mirror.

It's based on SAS3808 I/O controller.

Strangely, I haven't noticed I/O problems on the 9500-16i so far, even though I'm backing up 45 VMs hourly, with hourly GC, prune, sync to a second PBS, and verify.

I guess it's not that easy to pinpoint the exact source of the problem.
 
Doesn’t this simply mean that there’s an issue with IR-mode controllers that use megaraid_sas, but no problem with IT-mode controllers that use mpt3sas?

*Since the 9500-16i uses the SAS3816, they are essentially equivalent. The differences are likely limited to firmware and drivers, so I don’t think it’s due to PVE.
 