DELL fatal error was detected after Proxmox install

paolone

New Member
May 6, 2024
Hello

I have a Dell PowerEdge T440 with all the latest available updates installed (firmware, BIOS, drives, network).
It has a Dell quad-port Broadcom 5719 Ethernet card.
After installing Proxmox 8.2, I get an iDRAC error at each reboot: A fatal error was detected on a component at bus 1 device 0 function 0 1 2 3 .
We did some troubleshooting and found that bus 1 is the quad-port Ethernet card.
After removing the quad-port card, leaving only the onboard dual-port Broadcom 5720, I get the same error: A fatal error was detected on a component at bus 4 device 0 function 0 1 .
These errors appear on every reboot. If I shut down and power on instead, the error does not appear.
I downgraded the firmware of the integrated network card, but the problem remains.
After reinstalling the server with Microsoft Windows Server 2022, the error does not appear.
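
For anyone trying to match these iDRAC messages to a device in the OS, here is a minimal Python sketch that turns the "bus / device / function" numbers into the `bus:device.function` form that `lspci` prints. The function name `idrac_to_pci_addresses` is made up for illustration, and it assumes the numbers in the iDRAC message are decimal, which I have not confirmed (for bus 1 and bus 4 it makes no difference).

```python
import re


def idrac_to_pci_addresses(message):
    """Return lspci-style addresses (bb:dd.f) for one iDRAC error message.

    Assumption: the bus, device and function numbers in the message are
    decimal. lspci prints them in hexadecimal, so they are converted here.
    """
    match = re.search(r"bus (\d+) device (\d+) function ((?:\d+\s*)+)", message)
    if match is None:
        return []  # message does not have the expected shape
    bus = int(match.group(1))
    device = int(match.group(2))
    functions = [int(fn) for fn in match.group(3).split()]
    return [f"{bus:02x}:{device:02x}.{fn:x}" for fn in functions]


quad_port = "A fatal error was detected on a component at bus 1 device 0 function 0 1 2 3 ."
onboard = "A fatal error was detected on a component at bus 4 device 0 function 0 1 ."

print(idrac_to_pci_addresses(quad_port))
print(idrac_to_pci_addresses(onboard))
```

The resulting addresses can then be compared by hand against the `lspci` output on the host to check which card each message refers to.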

Does anyone know this problem?
Best regards
 
Seeing if anyone resolved this.

I am having the exact same error on an R440, which is the rack version of the server above. Are there Broadcom-specific drivers that need to be loaded in Proxmox to clear the error? I am running Proxmox 8.4.1. I have two identical servers with the exact same error, both running Proxmox. I have another R440 running Windows Server, and the error is not there.
 
It doesn't show in the Proxmox logs. Proxmox boots clean. It shows in the iDRAC system logs. See the screenshot I attached for reference. It only occurs during a reboot; otherwise the system runs fine. I have updated all the various firmware on the system, so everything is current.

Edit - The cards work in Proxmox just fine. I use them for corosync. This only shows on iDRAC and only during a reboot.


Screenshot 2025-05-30 at 08.48.24.png
 
** Update **

So I spoke to a Dell engineer. I provided the log dump from the server, and they confirmed the NICs are working correctly and it's not a hardware-related issue. I also dug through all the logs on the server. The error only started when Proxmox was installed. Previously the server was running Ubuntu 16.04, and looking through the logs there was no error. I also have an R440 running Windows 2016, and the error is not there.

Working with the engineer, Dell basically said that Proxmox is not officially supported. They don't know why the error is happening, but it is not hardware. I confirmed the cards are working correctly once Proxmox boots. They concur it's likely something with Proxmox, the iDRAC and the NICs not working correctly together during the reboot, but once it fully POSTs everything works fine. The "fix" was to install a supported OS, but they understood that we were running Proxmox.

If someone has something more to add to clear it up, please do, but basically it's just the hardware and Proxmox not being fully compatible during the reboot that triggers the error.
 
And if Ubuntu is supported, just install the version that has the same kernel as PVE, see if you can replicate the problem, and then raise the issue with Dell.
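
To check whether two installs really are on the same kernel series before comparing results, something like the following Python sketch could help. The helper names and the two sample release strings are made up for illustration; on a real host the string would come from `uname -r` (or `platform.release()` in Python).

```python
import re


def kernel_series(release):
    """Extract (major, minor) from a kernel release string such as '6.8.12-4-pve'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def same_series(release_a, release_b):
    """True when both release strings share the same major.minor series."""
    return kernel_series(release_a) == kernel_series(release_b)


# Illustrative release strings only, not taken from the servers in this thread.
pve_release = "6.8.12-4-pve"
ubuntu_release = "6.8.0-31-generic"

print(kernel_series(pve_release), kernel_series(ubuntu_release))
print(same_series(pve_release, ubuntu_release))
```

Matching the major.minor series only narrows things down. The PVE kernel and the Ubuntu kernel are built with their own patches, so a clean reboot on one does not prove the other is clean too.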
 
Not wrong, but when it comes to Linux, not accurate; it's the kernel that matters. PVE 8 has had four different kernels over its lifespan to this point (6.2, 6.5, 6.8, 6.11). It's possible that one or more of these will work and can be pinned for the duration.
True on that. We are converting over from OpenStack to PVE and are aiming to stay as close to the current release as possible, which means using the most recent version of the kernel. With confirmation that it's just PVE and Dell hardware not being totally in sync, it's a problem we can live with.