Problem with MegaRAID SAS3508 controller

mira · May 6, 2026

So we've tried to reproduce it on a test system provided to us, but couldn't so far.

Could you detail your setups, the workload and any steps necessary to trigger this?

uzumo · May 6, 2026

mira said:
We got a test system with a Broadcom / LSI Fusion-MPT SAS38xx on which we are currently trying to reproduce the issues here, as well as the other issues we've encountered.

Code:

Serial Attached SCSI controller [0107]: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx [1000:00e6]
	Subsystem: Broadcom / LSI 9500-16i Tri-Mode HBA [1000:4050]

mira said:
So we've tried to reproduce it on a test system provided to us, but couldn't so far.

Could you detail your setups, the workload and any steps necessary to trigger this?

Since we have confirmed that no issues occur with the LSI 9500-16i Tri-Mode HBA, I believe testing with the SAS3808 (LSI 9500-16i) or SAS3816 (LSI 9500-8i) operating in IT mode would not be meaningful.

* I use the LSI 9500-16i and LSI 9400-16i, and I have never had any problems with them.

Although they are the same SAS3808 and SAS3816 models, I believe testing will not be effective unless they are the iMR 9540-16i and 9540-8i versions.

The driver they use that causes the problem always appears to be `megaraid_sas`.

* Since I don't have these devices myself, this is based on what I observed in their logs.
* The LSI 9500-16i Tri-Mode HBA is an mpt3sas.

mira · May 7, 2026

uzumo said:
Since we have confirmed that no issues occur with the LSI 9500-16i Tri-Mode HBA, I believe testing with the SAS3808 (LSI 9500-16i) or SAS3816 (LSI 9500-8i) operating in IT mode would not be meaningful.

* I use the LSI 9500-16i and LSI 9400-16i, and I have never had any problems with them.

Although they are the same SAS3808 and SAS3816 models, I believe testing will not be effective unless they are the iMR 9540-16i and 9540-8i versions.

The driver they use that causes the problem always appears to be `megaraid_sas`.

* Since I don't have these devices myself, this is based on what I observed in their logs.
* The LSI 9500-16i Tri-Mode HBA is an mpt3sas.

The test system in question has a Broadcom MegaRAID 9540-8i, which is one of the affected controllers.

As mentioned, we haven't been able to trigger the issues yet, so please provide details about your setups (including the controller firmware, and the connected disks and their firmware), the RAID configuration, the filesystems and usage, and the steps that usually trigger the problem.

jester · May 12, 2026

For anyone who has pinned the 6.14 kernel:
the 6.14.11-8-pve kernel (with the Dirty Frag fix) also seems to be running without MegaRAID issues so far.

bpedrant · May 15, 2026

Hello,

I also have an affected system: Supermicro motherboard, model H11DSi rev 2.x
BIOS: v3.5
Firmware: v1.52.23

MegaRAID 9660-16i
- Firmware: v8.17.1 (but I also had issues on an older version)
- Driver: I tried the Proxmox-delivered driver, then updated to the Broadcom driver v8.17.1 (and verified via `modinfo` that the new driver was in use).
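A note on the `modinfo` check: `modinfo` reports the module file on disk, while the version of the module actually loaded in the running kernel is exposed under `/sys/module/megaraid_sas/version`. The sketch below compares the two; `SYSFS_ROOT` is a hypothetical override added only so the function can be tested against a fake tree, not a standard variable.

```shell
#!/usr/bin/env bash
# Compare the megaraid_sas version loaded in the running kernel with the
# version of the module file that modinfo finds on disk.
# SYSFS_ROOT is a hypothetical override (default /sys) used for testing only.

loaded_megaraid_version() {
    local f="${SYSFS_ROOT:-/sys}/module/megaraid_sas/version"
    if [ -r "$f" ]; then
        cat "$f"              # version string of the loaded module
    else
        echo "not loaded"     # module absent from the running kernel
        return 1
    fi
}

ondisk_megaraid_version() {
    # modinfo -F prints a single field of the module file for the running kernel
    modinfo -F version megaraid_sas 2>/dev/null || echo "unknown"
}
```

If the two versions differ after installing an out-of-tree driver, the initramfs may still contain the old module; on Debian-based systems `update-initramfs -u -k all` rebuilds it.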

The controller has four 3.8 TB Micron NVMe drives connected.
(This setup worked perfectly for the past year on ESXi 7, so I know the controller and drives work in this server.)

Proxmox v9.1.9, Enterprise repo. Fully patched.

I tried configuring the drives as JBOD with ZFS raidz; it fails after any data migration.
Now I have it configured back to hardware RAID5 with LVM on Proxmox.

I can reproduce the error every time with a simple VM clone operation. About 40 GB gets copied, and then the controller basically shuts down.

I also tried the kernel parameters `iommu=pt` and `amd_iommu=on`, and this is where it gets weird. Without the parameters, the controller would die after copying 47 GB. With the parameters, there is a long 2-3 minute pause, then it continues for another 40-50 GB, and the cycle repeats. Nothing shows up in dmesg this time.
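When testing parameters like these, it helps to confirm they are actually active after the reboot. On Proxmox VE they are normally added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by `update-grub` (GRUB), or to `/etc/kernel/cmdline` followed by `proxmox-boot-tool refresh` (systemd-boot). The sketch below checks `/proc/cmdline` for an exact token; `CMDLINE_FILE` is a hypothetical override used only for testing.

```shell
#!/usr/bin/env bash
# Return success if an exact kernel parameter is present on the running
# kernel's command line. CMDLINE_FILE is a hypothetical test override.

has_kernel_param() {
    local want="$1"
    local file="${CMDLINE_FILE:-/proc/cmdline}"
    local -a toks=()
    local tok
    [ -r "$file" ] || return 1
    # /proc/cmdline is a single line; split it into whitespace-separated tokens
    read -r -a toks < "$file" || true
    # Compare whole tokens so "iommu=pt" does not match a longer parameter
    for tok in "${toks[@]}"; do
        [ "$tok" = "$want" ] && return 0
    done
    return 1
}
```

For example, `has_kernel_param iommu=pt && echo active` prints `active` only if the parameter reached the running kernel.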

This is a long-running issue that is hard to detect and fix. What are my other options? I read that downgrading to an older kernel helps, but I do not know the exact steps for that.

-Brian
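For the kernel downgrade asked about here, a minimal sketch is to install an older kernel package and pin it with `proxmox-boot-tool kernel pin`, as other posters in this thread did with 6.14. The package name pattern `proxmox-kernel-<version>-signed` is an assumption based on current Proxmox packaging (check `apt search proxmox-kernel` for the exact names in your repository), and `DRY_RUN` is a hypothetical switch that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: install an older Proxmox kernel and pin it as the default boot entry.
# ASSUMPTION: packages are named proxmox-kernel-<version>-signed.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pin_kernel() {
    local ver="$1"   # e.g. 6.14.11-8-pve, a version reported in this thread
    if [ -z "$ver" ]; then
        echo "usage: pin_kernel <kernel-version>" >&2
        return 2
    fi
    run apt install "proxmox-kernel-${ver}-signed"
    run proxmox-boot-tool kernel pin "$ver"
    run proxmox-boot-tool kernel list   # show installed and pinned kernels
}
```

After a reboot, `uname -r` should show the pinned version; `proxmox-boot-tool kernel unpin` returns to booting the newest installed kernel once a fixed one is available.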

waltar · May 16, 2026

Have you installed storcli already?

Code:

storcli /call show    # shows the number and model of the controllers; the first is 0, the second is 1

In the following commands, replace "x" with your controller number and try the available profiles. A profile change requires a controller restart!

Code:

storcli /cx show profile
storcli /cx set profile profileid=<value> ; storcli /cx restart

bpedrant · May 18, 2026

Hello, since I have a 9600 series Tri-mode adapter, I am using the storcli2 utility.
There is no "profile" command (it was removed for some reason).

But thanks for pointing me toward the options of storcli2!

I ran `./storcli2 /c0 show events` and saw many 'consistency' errors that were being corrected during initialization of the single VG.

Then, a few hours later, the controller was logging high-temperature errors. Hmmm. The server case is huge, with many fans, and there is no load on the controller or host. It seems I need to dig into what is happening there.

-Brian

odobrev · May 19, 2026

mira said:
So we've tried to reproduce it on a test system provided to us, but couldn't so far.

Could you detail your setups, the workload and any steps necessary to trigger this?

In my case:
- 2 x Supermicro SuperServer 2U 2014S-TR

For each system:
- EPYC 7313 DP/UP
- 8 x 32 GB Registered DDR4 3200 ECC (Supermicro brand)
- 2 x 400 GB Micron 7450 MAX on a Broadcom MR 9540-2M2 in JBOD mode, used as plain disks for a ZFS RAID 1
- 8 x Samsung PM897 3.84 TB on a Broadcom HBA 9500-16i
- Supermicro AOC-STGF-I2S-O dual-port 10 GbE SFP+

As for motherboard firmware:

Firmware Version: 01.08.01
Firmware Build Time: 12/17/2025
Redfish Version: 1.21.1
BIOS Firmware Version: 3.6 (BIOS date 12/17/2025)
CPLD Version: F0.A6.44

Controller specs and firmware:

I use the 2 x 400 GB Micron 7450 MAX drives on the Broadcom MR 9540-2M2 (JBOD mode, plain disks for a ZFS RAID 1) for my boot file system, which runs Proxmox Backup Server, currently version 4.2.0, with the kernel pinned to 6.14.11-7-pve (2026-04-30T09:27Z).

There is not much workload on these disks, as they are boot drives. The combination of 8 x Samsung PM897 3.84 TB on the Broadcom HBA 9500-16i seems to be working fine (even when I was using kernel 6.17 or 7.0.0-3-pve).

Steps to reproduce:
> Case 1: fresh installation
- Download the latest Proxmox ISO with kernel 6.17 or 7;
- Write the ISO to a USB stick;
- Boot from the USB stick;
- Proceed to the end of the installation process;
=> 50% of the time, the process freezes while writing EFI;

> Case 2: OS upgrade from kernel >= 6.17
- Install the system with kernel 6.17;
- Try upgrading the system to PBS 4.2.x with kernel 7;
=> The system crashes 100% of the time with errors similar to: https://forum.proxmox.com/threads/problem-with-megaraid-sas3508-controller.179378/post-842596

What's strange is that sometimes writing works for one disk but not the other; sometimes it crashes on the first one (but mostly when writing the second one).

mira · May 19, 2026

We now have a setup with JBOD and Ceph OSDs on top where we can reliably crash it.
At the moment, @dherzig is bisecting the kernel, since 7.0 doesn't seem to be affected in our tests.

In our tests we have KIOXIA disks attached to it, the same ones we saw the mpt3sas issues with.

NIKOLYA · May 29, 2026

I would like to confirm that I am seeing what appears to be the same issue on my system.

Hardware:

Controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury]
PCI ID: 1000:005f
Subsystem: 1000:9341
The controller is currently configured in JBOD mode
Two disks are connected directly to onboard SATA and are visible
Two disks are connected through the MegaRAID controller and are not visible in Proxmox

System:

proxmox-ve: 9.2.0
pve-manager: 9.2.2
kernel: 7.0.2-6-pve

lspci output:

Code:

05:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:005f] (rev 02)
	Subsystem: Broadcom / LSI Device [1000:9341]
	Kernel modules: megaraid_sas

No "Kernel driver in use: megaraid_sas" line is shown for the controller.
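The "Kernel driver in use" line in `lspci -k` reflects whether a driver is bound to the device, which sysfs exposes as the `driver` symlink under `/sys/bus/pci/devices/<address>/`. When the megaraid_sas probe fails, as the dmesg output in this post suggests, that symlink should be absent. A small sketch to check this directly; `SYSFS_ROOT` is a hypothetical override used only for testing.

```shell
#!/usr/bin/env bash
# Print the name of the driver bound to a PCI device, or "none" if no driver
# is bound. SYSFS_ROOT is a hypothetical test override (default /sys).

pci_driver_in_use() {
    local bdf="$1"   # full PCI address, e.g. 0000:05:00.0
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$bdf/driver"
    if [ -L "$link" ]; then
        # The symlink points at /sys/bus/pci/drivers/<name>
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

For the controller above, `pci_driver_in_use 0000:05:00.0` would be expected to print `none` while the firmware initialization keeps failing.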

Relevant dmesg messages:

Code:

megaraid_sas 0000:05:00.0: FW now in Ready state
megaraid_sas 0000:05:00.0: controller type : iMR(0MB)
megaraid_sas 0000:05:00.0: Secure JBOD support : Yes
megaraid_sas 0000:05:00.0: JBOD sequence map support : Yes
megaraid_sas 0000:05:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
megaraid_sas 0000:05:00.0: DCMD(opcode: 0x200e102) is timed out, func:megasas_issue_blocked_cmd
megaraid_sas 0000:05:00.0: megasas_sync_pd_seq_num DCMD timed out, continue without JBOD sequence map
megaraid_sas 0000:05:00.0: DCMD(opcode: 0x2010100) is timed out, func:megasas_issue_blocked_cmd
megaraid_sas 0000:05:00.0: Ignore DCMD timeout: megasas_get_pd_list
megaraid_sas 0000:05:00.0: DCMD(opcode: 0x3010100) is timed out, func:megasas_issue_blocked_cmd
megaraid_sas 0000:05:00.0: Ignore DCMD timeout: megasas_ld_list_query
megaraid_sas 0000:05:00.0: failed to get LD list
megaraid_sas 0000:05:00.0: megasas_init_fw: megasas_get_device_list failed
megaraid_sas 0000:05:00.0: Failed from megasas_init_fw

The controller itself is detected on the PCI bus, but the disks behind it do not appear in lsblk.

I have not yet tested with an older Proxmox kernel, but this looks similar to the issue reported here with MegaRAID SAS-3 3008 / JBOD disks not being detected on newer kernels.

eddor · Jun 8, 2026

Hello.
I wanted to mention this, since we had a very similar issue.
We have many Huawei 1288H v5 servers, and many of them had I/O issues.
I followed this thread because it looked like our issue,
but in the end it was faulty firmware on the Micron SSDs.
We updated 30+ disks and the problem went away.
Just in case anyone encounters this.

NIKOLYA · Jun 15, 2026

The solution itself is simple: I flashed the controller to IT mode.
https://github.com/EverLand1/9300-8i_IT-Mode
The hard drives are now detected, set up as RAIDZ, and in production.

admingsi · Jul 9, 2026

@mira Any updates? Are we stuck on kernel 6.14.11-9-pve forever?

mira · Jul 9, 2026

So far, our reproducer is fixed by kernel 7.0, and some people have reported that it fixed the issue in their situation as well.
We are currently trying to use the RAID controller for the root file system, to see if we can reproduce an issue there.

In some cases, reducing `max_sectors_kb` helped, similar to the workaround for mpt3sas before a fix was available. The current value can be checked with:

Code:

cat /sys/block/sdX/queue/max_sectors_kb
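A sketch of the full workaround: show the current limit next to the hardware maximum (`max_hw_sectors_kb`), then lower it. No target value is given in this thread, so the value is left to the caller; the kernel rejects values above `max_hw_sectors_kb`, and a value written through sysfs does not survive a reboot. `SYSFS_ROOT` and the function names are hypothetical, added for illustration and testing.

```shell
#!/usr/bin/env bash
# Inspect and lower max_sectors_kb for a block device (e.g. sdX).
# The value to use is NOT specified in this thread; pick your own.
# SYSFS_ROOT is a hypothetical test override (default /sys).

show_max_sectors() {
    local q="${SYSFS_ROOT:-/sys}/block/$1/queue"
    echo "$1 max_sectors_kb=$(cat "$q/max_sectors_kb") max_hw_sectors_kb=$(cat "$q/max_hw_sectors_kb")"
}

set_max_sectors() {
    local dev="$1" kb="$2"
    local q="${SYSFS_ROOT:-/sys}/block/$dev/queue"
    local hw
    hw=$(cat "$q/max_hw_sectors_kb") || return 1
    # Reject non-numeric input and values the kernel would refuse anyway
    if ! [[ "$kb" =~ ^[0-9]+$ ]] || [ "$kb" -gt "$hw" ]; then
        echo "invalid value $kb (hardware max $hw)" >&2
        return 1
    fi
    echo "$kb" > "$q/max_sectors_kb"   # takes effect immediately, not persistent
}
```

With JBOD disks (ZFS or Ceph on top), the limit would need to be set on each member disk; making it persistent would require something like a udev rule.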

But to be sure, we would first need a reproducer ourselves that's not fixed by kernel 7.0.

jester · Jul 10, 2026

I tested the 7.0.14-4-pve kernel today on our hardware: same problem.

jester · Jul 24, 2026

I flashed one of our production servers to IT mode (Supermicro SAS3808 iMR + 2 x 400 GB Micron 7450): same problem.
