Problem with MegaRAID SAS3508 controller

Jan 19, 2026
Hi,
I’m running Proxmox VE 9.1.4 on Debian 13 on multiple Huawei 2288H V5 nodes. The servers use an OEM Broadcom/LSI MegaRAID SAS3508 controller.

Hardware / software details:
  • Server: Huawei 2288H V5
  • RAID controller: Broadcom / LSI SAS3508 (OEM Huawei)
  • RAID firmware: 5.140.00-3319 (Huawei confirmed this firmware is EOL, last supported on Debian 10)
  • Proxmox VE: 9.1.4
  • OS: Debian 13
  • Kernel: 6.17.4-2-pve
  • Driver: megaraid_sas (in-kernel driver from the Linux kernel, no out-of-tree module)
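For completeness, the driver binding can be double-checked with `lspci -k`. A sketch below, with the command's output hard-coded as a sample so the parsing step is reproducible (the PCI address and subsystem string are assumptions, not taken from my nodes):

```shell
# Confirm which kernel driver is bound to the RAID controller.
# On a real node: lspci -k | grep -A 3 -i raid
# The output below is a hard-coded sample (address/subsystem are assumptions).
LSPCI_OUT='1c:00.0 RAID bus controller: Broadcom / LSI MegaRAID Tri-Mode SAS3508
    Subsystem: Huawei Technologies Co., Ltd. SAS3508
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas'

# Extract the "Kernel driver in use" value.
DRIVER=$(printf '%s\n' "$LSPCI_OUT" | awk -F': ' '/Kernel driver in use/ {print $2}')
echo "driver in use: $DRIVER"
```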
We are seeing random MegaRAID firmware crashes. The controller reports a fatal firmware error, goes into the FAULT state, and performs an Online Controller Reset (OCR). On one node this caused a full reboot; on others the controller reset and recovered without rebooting the host. All VMs run on shared SAN storage, and the local RAID is basically only used for the OS (two SSDs in RAID 1), so heavy local I/O doesn't look like the cause either.

Key kernel logs:

Code:
megaraid_sas 0000:1c:00.0: Fatal firmware error: Line 169 in fw/raid/utils.c
megaraid_sas 0000:1c:00.0: FW in FAULT state Fault code:0x10000
megaraid_sas 0000:1c:00.0: resetting fusion adapter
megaraid_sas 0000:1c:00.0: Reset successful
megaraid_sas 0000:1c:00.0: Controller encountered an error and was reset
At this point it looks like a compatibility issue between newer Linux kernels and old MegaRAID firmware, not something Proxmox-specific.

Has anyone seen the same SAS3508 + Proxmox (Debian 12/13) behavior? I’m considering rolling back to kernel 6.14.11-5-pve. I’ve also found multiple reports online where stability issues with MegaRAID controllers were mitigated by adding the following kernel parameters via GRUB:

Code:
pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1

These seem to reduce firmware hangs and unexpected controller resets on older MegaRAID firmware when running newer kernels.
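For anyone wanting to try those parameters, a minimal sketch of wiring them into GRUB. It edits a sample file rather than the real `/etc/default/grub`, so the result can be reviewed first (the sample file content is an assumption):

```shell
# Append the workaround parameters to GRUB_CMDLINE_LINUX_DEFAULT.
# Demonstrated on a sample file; on a real node edit /etc/default/grub.
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > /tmp/grub.sample

PARAMS='pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1'
sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $PARAMS\"/" /tmp/grub.sample

cat /tmp/grub.sample
# After editing the real file, apply with `update-grub`.
```

Note that systems booting via proxmox-boot-tool (e.g. ZFS on root) read `/etc/kernel/cmdline` instead of the GRUB default file, followed by `proxmox-boot-tool refresh`.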
 
Hi Mira,

quick update from our side.

We’ve pinned the kernel to 6.14 on all affected nodes and since then the issue has not reoccurred. The systems have been stable so far.
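For anyone wanting to replicate the pin, a sketch of selecting the newest installed 6.14 kernel and building the pin command. The version list is hard-coded here as a sample; on a real node it would come from `proxmox-boot-tool kernel list`:

```shell
# Pick the newest installed 6.14 kernel and emit the pin command.
# Sample list; on a real node: LIST=$(proxmox-boot-tool kernel list)
LIST='6.14.11-5-pve
6.17.4-2-pve'

# Version-sort the 6.14 entries and take the newest.
TARGET=$(printf '%s\n' "$LIST" | grep '^6\.14\.' | sort -V | tail -n 1)
echo "proxmox-boot-tool kernel pin $TARGET"
```

Running the printed command pins the kernel across reboots; `proxmox-boot-tool kernel unpin` reverts it later.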

In the meantime, we also purchased Enterprise repository subscriptions for all servers.

Could you please let us know whether this issue is already resolved in the latest 6.17 kernel, or if you currently still recommend staying on 6.14 for setups with SAS3508 / older MegaRAID firmware?

Thanks for the update and your help.
 
We don't have a reproducer yet, and so far we haven't been able to narrow it down to any specific change in the kernel.
We're still trying, but for now we recommend staying on kernel 6.14 if you are affected by this issue.
 
I recently attempted to upgrade my PBS to kernel 6.17.9-1-pve and started having all kinds of issues. I assumed my controller card was going bad, but I now think it's an incompatibility with the kernel. I have a Supermicro Broadcom SAS 3408 card that locks everything up when I attempt the kernel upgrade.
 
@mira,

Since 6.17 is now the default kernel in Proxmox VE 9.x, could you please clarify how long the 6.14 kernel series is expected to receive updates within the 9.x branch? Will it continue to receive security and stability fixes for a defined period, even though 6.17 is now the default?

In my case, two nodes started resetting the controller after upgrading beyond 6.14, and stability returned after reverting. At the moment, this makes 6.14 the only safe production option for us.

As visible in this thread, other users are also reporting issues with Broadcom-based controllers.

Any guidance on the expected lifecycle of 6.14 would help us plan our next steps.

Thank you.
 
I can back this up as well: 6.17 has megaraid_sas problems even with newer hardware.


Code:
Mar 09 13:28:24 pve kernel: CPU: 6 UID: 0 PID: 745 Comm: kworker/6:1H T
Mar 09 13:28:24 pve kernel: BUG: unable to handle page fault for address: ff5c2d2e81ada000
Mar 09 13:28:24 pve kernel: #PF: supervisor write access in kernel mode
Mar 09 13:28:24 pve kernel: #PF: error_code(0x0002) - not-present page
Mar 09 13:28:24 pve kernel: PGD 100000067 P4D 100379067 PUD 10037a067 PMD 108a1e067 PTE 0
Mar 09 13:28:24 pve kernel: Oops: Oops: 0002 [#2] SMP NOPTI
Mar 09 13:28:24 pve kernel: CPU: 6 UID: 0 PID: 745 Comm: kworker/6:1H T


The upgrade hangs when the new ISO is used:
Code:
Random seed file /var/tmp/espmounts/8C8E-7882/loader/random-seed successfully written
Created EFI boot entry "Linux Boot Manager".
Configuring systemd-boot..
Unmounting '/dev/sdf2'.
Adding '/dev/sdf2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Copying and configuring kernels on /dev/disk/by-uuid/8C8D-8EFB
        Copying kernel and creating boot-entry for 6.17.2-1-pve
[  746.109538] megaraid_sas 0000:c1:00.0: [115]waiting for 1 commands to complete fo


The solution (for the moment) is to do the fresh install from the older 9.0.1 ISO, which ships kernel 6.14, available at https://enterprise.proxmox.com/iso/, and then pin the latest 6.14 kernel.

I even did:
Code:
cat /etc/kernel/cmdline
pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1

proxmox-boot-tool refresh


Additionally, the BIOS has some workload presets, and I switched from undefined to "Virtualization" (because of C-states).
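After `proxmox-boot-tool refresh` and a reboot, it's worth confirming the parameters actually reached the running kernel. A sketch, checking a hard-coded sample command line instead of reading `/proc/cmdline` directly (the sample root= entry is an assumption):

```shell
# Verify the workaround parameters are on the kernel command line.
# Sample string; on a real node: CMDLINE=$(cat /proc/cmdline)
CMDLINE='root=ZFS=rpool/ROOT/pve-1 boot=zfs pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1'

MISSING=0
for p in pcie_aspm=off pci=noaer megaraid_sas.msix_disable=1; do
    case " $CMDLINE " in
        *" $p "*) echo "present: $p" ;;
        *)        echo "MISSING: $p"; MISSING=1 ;;
    esac
done
```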
 
Another victim.
Recent Supermicro hardware with a Broadcom SAS 3808 iMR.

MegaRAID throws a fit when proxmox-boot-tool runs during upgrades with the new 6.17 kernel, corrupting the ESPs as a result and ending in kernel panics on server boot.
 
We've got a test system with a Broadcom / LSI Fusion-MPT SAS38xx on which we are currently trying to reproduce the issues reported here, as well as the other issues we've encountered.

Code:
Serial Attached SCSI controller [0107]: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx [1000:00e6]
Subsystem: Broadcom / LSI 9500-16i Tri-Mode HBA [1000:4050]
 
Hi.

Another victim here, I've got two Supermicro PBS servers with MegaRAID 9540-2M2 for boot drives with the same problem.

Bricked one; I had to reinstall with 4.0-1, pin kernel 6.14, and then update/upgrade to 4.1.8 with kernel 6.17 included but unused.

For the second one, I pinned the kernel 6.8 that was running, did the upgrade, then installed kernel 6.14 and pinned that instead.

Seems fine for now but I'm afraid to switch kernels in the future...

Does anyone know if this is being worked on?
 
I’ve never had any problems using the 9500-16i...

If the problem doesn't occur when you don't use IR Mode, isn't that the solution?
 
I’ve never had any problems using the 9500-16i...

If the problem doesn't occur when you don't use IR Mode, isn't that the solution?
Hello, in my case it's not a 9500-16i that's causing the problem.
I'm also using one for datastore storage, but the affected card is the 9540-2M2 PCIe Gen 4.0 Boot Storage Adapter, a Tri-Mode NVMe adapter for my OS storage, which I'm using as simple disks (JBOD) in a ZFS RAID 1 mirror.

It's based on SAS3808 I/O controller.

Strangely, I haven't noticed I/O problems on the 9500-16i so far, even though I'm backing up 45 VMs hourly, with hourly GC, prune, sync to a second PBS, and verify.

I guess it's not that easy to pinpoint the exact source of the problem.
 
Doesn’t this simply mean that there’s an issue with IR-mode controllers that use megaraid_sas, but no problem with IT-mode controllers that use mpt3sas?

*Since the 9500-16i uses the SAS3816, they are essentially equivalent. The differences are likely limited to firmware and drivers, so I don’t think it’s due to PVE.
 