HPE MR408i Raid Controller crashes with Proxmox

Necaro · Jul 31, 2025

Dear Proxmox-Community,

I'm relatively new to Proxmox and was working with VMware primarily before.

I have a new Host with the following Specs:

HPE ProLiant DL360 Gen11 SFF
2x Intel Xeon-Gold 5515+ (3.2GHz/8-Core/165W)
8x HPE 32GB Dual Rank DDR5-4800 Registered Memory
1x HPE MR408i-o Gen11 SPDM Storage Controller
1x HPE 96W Smart Storage Battery with 145mm Cable
4x HPE 1.92TB SAS 12G Read Intensive SFF (2.5in) SSD
1x HPE NS204i-u Gen11 Hot Plug Boot Device
1x Broadcom BCM5719 1Gb 4-port BASE-T Adapter
1x HPE Ethernet 10Gb 2-port BASE-T BCM57416 Adapter

Proxmox is installed on the NS204i boot device. The SSDs are configured as a RAID 5 on the MR408i RAID controller, with LVM-Thin on top serving as the datastore.
VMs were migrated/imported from VMware, which worked without problems.

My problem is that once or twice a day the RAID controller seems to crash. During that time all VMs and the host stop responding, and after a few minutes they resume normal operation. The firmware of all hardware devices is up to date, and so is the Proxmox installation. I suspect that this happens when there is a spike in I/O operations.

I attached a screenshot of the Proxmox information and of the CPU usage graph during such a crash. One can see the gap in the graph where the host hangs.
I also attached a log snippet from when the crash happens. iLO also logs the crash with the following event:
EVENT (31-Jul-2025 08:00): ControllerPreviousError (Slot=14, 0x7f833119) Redfish event from /redfish/v1/Systems/1/Storage/DE00B000/Controllers/0

Apart from the crashes, everything runs fast and normally.

Does anyone have an idea what could cause this?
I would be grateful for any tips as I am out of ideas at the moment.
If I can provide more useful information, please let me know.

P.S.: Ditching the raid controller or configuring it with pass through for ZFS Raid is sadly not an option atm.

Best regards,
Alex

LnxBil · Jul 31, 2025

Sadly, there is nothing anyone here can do about that:

Code:

Jul 31 10:00:13 prox1 kernel: megaraid_sas 0000:3b:00.0: 3259 (807271183s/0x0020/DEAD) - Fatal firmware error: Line 411 in fw\hw\debug\SnapDumpHelper.c
Jul 31 10:00:13 prox1 kernel: megaraid_sas 0000:3b:00.0: 3263 (807271191s/0x0020/CRIT) - Controller encountered an error and was reset

You may contact HPE support and send the dmesg output to them for further diagnostic.
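
When preparing a support case, it can help to pull only the controller events out of a saved kernel log. The following is a minimal sketch, assuming the log lines look like the two quoted above; the regular expression, the field names (`locale`, `severity`) and the function name are illustrative labels, not something defined by HPE or the driver documentation.

```python
import re
from typing import NamedTuple

# Pattern inferred only from the two sample lines in this thread.
# Other megaraid_sas messages may use a different layout and be skipped.
EVENT_RE = re.compile(
    r"megaraid_sas (?P<pci>[0-9a-f:.]+): (?P<seq>\d+) "
    r"\((?P<ts>\d+)s/(?P<locale>0x[0-9a-fA-F]+)/(?P<severity>[A-Z]+)\) - "
    r"(?P<message>.*)$"
)


class ControllerEvent(NamedTuple):
    pci: str        # PCI address of the controller, e.g. 0000:3b:00.0
    seq: int        # event sequence number reported by the firmware
    severity: str   # class string as printed, e.g. DEAD or CRIT
    message: str    # free-text description


def parse_controller_events(lines):
    """Return all lines that match the assumed megaraid_sas event layout."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append(
                ControllerEvent(
                    pci=match["pci"],
                    seq=int(match["seq"]),
                    severity=match["severity"],
                    message=match["message"],
                )
            )
    return events


# The two lines from the post above, used as sample input.
SAMPLE = [
    r"Jul 31 10:00:13 prox1 kernel: megaraid_sas 0000:3b:00.0: 3259 (807271183s/0x0020/DEAD) - Fatal firmware error: Line 411 in fw\hw\debug\SnapDumpHelper.c",
    r"Jul 31 10:00:13 prox1 kernel: megaraid_sas 0000:3b:00.0: 3263 (807271191s/0x0020/CRIT) - Controller encountered an error and was reset",
]
```

Feeding it the output of `journalctl -k` or `dmesg` saved to a file, and keeping only events whose severity is `DEAD` or `CRIT`, gives a compact list of crash times to attach to the ticket.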

Necaro · Jul 31, 2025

Thanks for your reply.
Yeah, I did see those messages. I was wondering if perhaps some RAID controller settings or something similar could cause these crashes in combination with Proxmox.
Perhaps someone has a similar setup running without problems and can share their RAID (controller) settings.

I will also try contacting HPE Support, but I'm not sure if they are willing to help as afaik Proxmox is not officially supported by HPE.

Explisit · Aug 1, 2025

Are you running the latest firmware for this controller and SPP for this server?

I'm about to pull the trigger and purchase 9 servers with the mentioned boot controller, but now I'm a bit hesitant.

Please let us know if HPE support comes back to you with a solution.

I have searched for similar errors, and it might be worth a try to test the following:

1: quiet pcie_aspm=off
2: Disable legacy boot, Use UEFI boot only.

Also, are there any errors in the AHS or IML logs?
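
After adding `pcie_aspm=off`, it is worth confirming that the running kernel actually booted with it. A minimal sketch, assuming the kernel command line is passed in as a string (on Linux it can be read from `/proc/cmdline`); the helper names are made up for this example.

```python
def kernel_params(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def aspm_disabled(cmdline: str) -> bool:
    """True if pcie_aspm=off is present on the given command line."""
    return kernel_params(cmdline).get("pcie_aspm") == "off"
```

For example, `aspm_disabled(open("/proc/cmdline").read())` checks the currently running kernel. On a GRUB-booted Proxmox host the parameter is typically added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by `update-grub`; hosts booting via systemd-boot typically use `/etc/kernel/cmdline` and `proxmox-boot-tool refresh` instead.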

emunt6 · Aug 7, 2025

Hi!

Firmware problem, use HPE SPP DVD to upgrade.

Code:

https://downloadmirror.intel.com/776844/35xx_MR_iMR_FWPKG-51.23.0-4637_Release_notes.txt

DCSG01177464    iMR: Fatal firmware error: Line 379 in fw\hw\debug\SnapDumpHelper.c detected after converted SATA SSD PD to JBOD

Necaro · Aug 8, 2025

emunt6 said:

Hi!

Firmware problem, use HPE SPP DVD to upgrade.

Code:

https://downloadmirror.intel.com/776844/35xx_MR_iMR_FWPKG-51.23.0-4637_Release_notes.txt

DCSG01177464    iMR: Fatal firmware error: Line 379 in fw\hw\debug\SnapDumpHelper.c detected after converted SATA SSD PD to JBOD

Thanks for the tip. RAID controller firmware was my first guess as well, but I already have the newest version from HPE (52.32.3-6118) installed.

Necaro · Aug 8, 2025

Hey,

Are you running the latest firmware for this controller and SPP for this server?
>> Yeah, I have all the latest available firmware from HPE installed.

I'm about to pull the trigger and purchase 9 servers with the mentioned boot controller, but now I'm a bit hesitant.
>> The boot controller (NS204i) is not the problem, I guess. It's the MR408i RAID controller, which is used for VM storage, that is the problem and crashes.

Also, are there any errors in the AHS or IML logs?
>> No, only the one error I posted in my first post gets logged in IML.

Please let us know if HPE support comes back to you with a solution.
>> I'm on holiday for the next two weeks and will open a case with HPE after that. If that doesn't help, I might also open a paid Proxmox ticket and see if they have any ideas.

I have searched for similar errors, and it might be worth a try to test the following:

1: quiet pcie_aspm=off
2: Disable legacy boot, Use UEFI boot only.
>> The first might be worth a try. I already use UEFI only.

I observed the behaviour a bit more. As soon as there is a bit more load on the storage, reads and writes get slower and slower until the controller finally crashes. After that everything is smooth and fast again, until it slowly gets worse again up to the next crash.

Surprisingly, Veeam can back up the VMs at night without problems at full speed (~120MB/s, only a Gigabit connection to the backup storage).
I'm all ears for more suggestions to try after my holiday
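
One way to put numbers on the gradual slowdown described above is to sample `/proc/diskstats` periodically and log the average time per I/O. The following is a minimal sketch, assuming the standard Linux diskstats column order; the device name `sda` in the usage note and the function names are placeholders, since the thread does not say which block device the RAID volume appears as.

```python
def parse_diskstats_line(line: str) -> dict:
    """Extract the counters needed for an average-latency estimate.

    Columns (after major, minor, name): reads completed, reads merged,
    sectors read, ms spent reading, writes completed, writes merged,
    sectors written, ms spent writing, ...
    """
    fields = line.split()
    return {
        "name": fields[2],
        "reads": int(fields[3]),
        "read_ms": int(fields[6]),
        "writes": int(fields[7]),
        "write_ms": int(fields[10]),
    }


def average_await_ms(before: dict, after: dict):
    """Average milliseconds per completed I/O between two samples.

    Returns None when no I/O completed in the interval.
    """
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    if ios == 0:
        return None
    spent = (after["read_ms"] - before["read_ms"]) + (
        after["write_ms"] - before["write_ms"]
    )
    return spent / ios
```

Taking a sample of the `sda` line (or whichever device backs the LVM-Thin pool) every few seconds and writing the result with a timestamp to a file would show whether latency climbs steadily before each controller reset, which may be useful evidence for HPE support.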

Best regards,
Alex

ratboy4 · Sep 4, 2025

I'm interested too, as I am looking at this server as well. Any update on the situation? Has it gotten any better?

Necaro · Oct 17, 2025

Just a last update in case someone is still watching this thread.
We didn't find a solution for the problem and sadly had to switch back to VMware.

With ESXi 8 everything is running smoothly again.

DomTou · Nov 5, 2025

Hello everyone,
we are facing the same situation with an HPE MR408i RAID controller.
Any ideas for a solution?

cwitsup · Dec 10, 2025

Same problem here with an HPE MR416i-o Gen11 and the latest SPP.
pcie_aspm=off did not fix the problem.

DomTou · Dec 10, 2025

Good news: Proxmox 9, with its new Debian 13 based Linux kernel, did the trick. I performed the migration by following the well-documented procedure.
