DELL fatal error was detected after Proxmox install

paolone

May 6, 2024
Hello

I have a DELL PowerEdge T440 updated with the latest available firmware (BIOS, HD, network).
It has a DELL quad-port Ethernet card, Broadcom 5719.
After installing Proxmox 8.2, I get an iDRAC error at each reboot: A fatal error was detected on a component at bus 1 device 0 function 0 1 2 3.
We did some troubleshooting, and bus 1 is the quad-port Ethernet card.
With the quad-port card removed and only the onboard Broadcom 5720 dual-port Ethernet left, I get the same error: A fatal error was detected on a component at bus 4 device 0 function 0 1.
These errors appear on every reboot. If I shut down and power on, I don't get the error.
I've downgraded the firmware of the integrated network card, but the problem remains.
After reinstalling the server with Microsoft Windows Server 2022, the error doesn't appear.

Does anyone know this problem?
My best regards
 
Seeing if anyone resolved this.

I am having the exact same error on an R440, which is the rack version of the server above. Are there Broadcom-specific drivers that need to be loaded in Proxmox to clear the error? I am running Proxmox 8.4.1. I have two identical servers running Proxmox, both with the exact same error. I have another R440 running Windows Server, and the error is not there.
 
It doesn't show in the Proxmox logs. Proxmox boots clean. It shows in the iDRAC system logs. See the screenshot I attached for reference. It only occurs during a reboot; otherwise the system runs fine. I have updated all the various firmware on the system, so everything is current.

Edit - The cards work in Proxmox just fine. I use them for corosync. This only shows on iDRAC and only during a reboot.


Screenshot 2025-05-30 at 08.48.24.png
 
** Update **

So I spoke to a Dell engineer. I provided the log dump from the server, and they confirmed the NICs are working correctly and it's not a hardware-related issue. I also dug through all the logs on the server. The error only started when Proxmox was installed. Previously the server was running Ubuntu 16.04, and looking through the logs there was no error. I also have an R440 running Windows 2016, and the error is not there.

Working with the engineer, Dell basically said that Proxmox is not officially supported. They don't know why the error is happening, but it is not hardware. I confirmed the cards are working correctly once Proxmox boots. They concur it's likely something with Proxmox, the iDRAC and the NICs not working together correctly during the reboot, but once the server fully posts everything works fine. The "fix" was to install a supported OS, but they understood that we were running Proxmox.

If someone has something more to add to clear it up, please do, but basically it's just the hardware and Proxmox not being fully compatible during the reboot that triggers the error.
 
And if Ubuntu is supported, just install the version that has the same kernel as PVE, see if you can replicate the problem, and then raise the issue with Dell.
 
Not wrong, but when it comes to Linux, not accurate; it's the kernel that matters. PVE 8 has had 4 different kernels during its lifespan to this point (6.2, 6.5, 6.8, 6.11). It's possible that one or more of these will work and can be pinned for the duration.
True on that. We are converting over from OpenStack to PVE and want to stay as close to the current release as possible, which includes using the most recent kernel. With confirmation that it's just PVE and Dell hardware not being totally in sync, it's a problem we can live with.
 
Was it resolved?
I have the same problem. I finally upgraded one of our Dell 740XD servers from PVE 7 to PVE 8 (Linux pve-18 6.8.12-13-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-13 (2025-07-22T10:00Z) x86_64 GNU/Linux), and on every reboot:

Screen lost. Server stuck.
Code:
Event Message: A fatal error was detected on a component at bus 1 device 0 function 1.

Is this the bus device in question?
Code:
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
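
For anyone matching the iDRAC message to an lspci line: the "bus / device / function" triple maps onto the bus:device.function address that lspci prints. Below is a minimal C sketch of that conversion. The helper name idrac_to_bdf is made up for illustration, and it assumes the numbers in the iDRAC message are decimal, which I have not confirmed against Dell documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convert an iDRAC-style "bus B device D function F" message into the
 * bus:device.function notation printed by lspci (hexadecimal fields).
 * Returns 0 on success, -1 if no valid address is found.
 * Assumption: the numbers in the iDRAC message are decimal.
 * Hypothetical helper, not part of any Dell or Proxmox tool. */
int idrac_to_bdf(const char *msg, char *out, size_t outlen)
{
    unsigned bus, dev, fn;
    const char *p;

    assert(msg != NULL && out != NULL);

    /* Locate the start of the address part of the message. */
    p = strstr(msg, "bus ");
    if (p == NULL)
        return -1;
    if (sscanf(p, "bus %u device %u function %u", &bus, &dev, &fn) != 3)
        return -1;

    /* PCI limits: 8-bit bus, 5-bit device, 3-bit function. */
    if (bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;

    snprintf(out, outlen, "%02x:%02x.%x", bus, dev, fn);
    return 0;
}
```

Passing the event message quoted above yields the same address as the lspci line, so under that assumption the BCM5720 port is the reported component. When a message lists several functions (such as "function 0 1 2 3"), this sketch only reads the first one.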
 
Hello

No news from Proxmox!
Fortunately, I'm not imagining things: someone else has the same problem as me.
 
Hi, I have the same problem with a Dell R540 and a NetXtreme BCM5720 Gigabit Ethernet PCIe.

This is not a Proxmox problem. Pure Debian 13 causes the same error on reboot.

Disabling this network adapter in the BIOS resolved the problem for me.
 
The OS and iDRAC report a bus fatal error on BCM57XX cards when reboot is issued. This occurs because the driver accesses the device after it is powered off during shutdown. The message can be ignored.

Solution: Fixed in Ubuntu 20.04.4 SRU kernel linux-5.4.0-110.124.

Affected systems: All Dell EMC PowerEdge servers with BCM57XX NIC.

https://www.dell.com/support/manual...6a14cb-db3f-4ea7-915f-d451ce9dd6ba&lang=en-us
 
Hello people, I've got the same issue.

I am running Proxmox 9. I've tried other kernels in the hope that one would work, but I could not find a workaround... yet.
Any feedback is appreciated! Thank you! :)
 
Hello everyone,
The same thing brings me here:

A fatal error was detected on a component at bus 4 device 0 function 1.

proxmox-ve: 8.4.0 (running kernel: 6.8.12-15-pve)
Dell T440


Writing here just to get notified when someone has something to say :))
 
Hello,

After installing Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-216-generic x86_64) and upgrading it to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-153-generic x86_64), a Dell T440 started rebooting unexpectedly. Another update to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-153-generic x86_64) caused the error "Screen lost and server stuck again". I then restored the kernel from the previous Ubuntu release:

5.15.0-153-generic #163-Ubuntu SMP Thu Aug 7 16:37:18 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
and the server was able to reboot correctly again.

While searching for fixes related to tg3, I came across this patch:

https://github.com/AnggaR96s/stable...down-device-only-on-system_power_off.patch#L4
I then installed Debian Trixie (this is my second home :D) and started experimenting.
I removed the following fragment:
/* ================================= ??? REMOVE THIS
    if (system_state == SYSTEM_POWER_OFF)
        // tg3_power_down(tp);
        return;

    else if (system_state == SYSTEM_RESTART &&
             dmi_first_match(tg3_restart_aer_quirk_table) &&
             pdev->current_state != PCI_D3cold &&
             pdev->current_state != PCI_UNKNOWN) {
        // Disable PCIe AER on the tg3 to avoid a fatal
        // error during this system restart.
        //
        pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
                                   PCI_EXP_DEVCTL_CERE |
                                   PCI_EXP_DEVCTL_NFERE |
                                   PCI_EXP_DEVCTL_FERE |
                                   PCI_EXP_DEVCTL_URRE);
    }
================================= REMOVE THIS ??? */
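
For readers less familiar with the PCIe side: the pcie_capability_clear_word() call in the fragment above clears four error-reporting enable bits in the device's PCIe Device Control register, so the NIC stops signalling errors upstream during the restart. Here is a user-space sketch of just that bit arithmetic. It is not kernel code; the bit values are the ones from the Linux pci_regs.h header, and the helper name devctl_without_error_reporting is made up for illustration.

```c
#include <stdint.h>

/* Error-reporting enable bits of the PCIe Device Control register,
 * with the values used in the Linux header pci_regs.h. */
#define PCI_EXP_DEVCTL_CERE  0x0001 /* Correctable Error Reporting Enable */
#define PCI_EXP_DEVCTL_NFERE 0x0002 /* Non-Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_FERE  0x0004 /* Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_URRE  0x0008 /* Unsupported Request Reporting Enable */

/* Return the register value with all four reporting bits cleared and
 * every other bit left untouched. Hypothetical helper that mirrors the
 * mask passed to pcie_capability_clear_word() in the fragment. */
uint16_t devctl_without_error_reporting(uint16_t devctl)
{
    const uint16_t mask = PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
                          PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE;

    return (uint16_t)(devctl & (uint16_t)~mask);
}
```

Only the low four bits change, so settings such as the maximum payload size stored in the same register are preserved.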

Now the function looks like this:

static void tg3_shutdown(struct pci_dev *pdev)
{
    struct net_device *dev = pci_get_drvdata(pdev);
    struct tg3 *tp = netdev_priv(dev);

    tg3_reset_task_cancel(tp);

    rtnl_lock();

    netif_device_detach(dev);

    if (netif_running(dev))
        dev_close(dev);

    tg3_power_down(tp);

    rtnl_unlock();

    pci_disable_device(pdev);
}

After compiling the kernel, the Dell R440 server reboots without any problem.
It looks like something was modernized and the patch in the driver stopped working (see previous commits to the driver).
Since I don’t feel confident submitting kernel patches myself – please take a look at this (together with Debian colleagues).
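
To make the behavioural difference easier to discuss, here is a small user-space model of the branch logic before and after my change. It is only a sketch: it assumes the unmodified driver powers the device down on SYSTEM_POWER_OFF (as the name of the linked patch suggests) and disables AER only when the machine matches the DMI quirk table. All type and function names are made up for illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's system_state values. */
enum sys_state { STATE_POWER_OFF, STATE_RESTART };

/* What the shutdown hook ends up doing to the NIC. */
enum tg3_action { ACT_NONE, ACT_POWER_DOWN, ACT_DISABLE_AER };

/* Model of the removed fragment (assumed upstream behaviour):
 * power down only on power-off; on restart, disable AER only if the
 * machine is in the quirk table and the device is still powered. */
enum tg3_action shutdown_action_before(enum sys_state state,
                                       bool in_quirk_table,
                                       bool device_powered)
{
    if (state == STATE_POWER_OFF)
        return ACT_POWER_DOWN;
    if (state == STATE_RESTART && in_quirk_table && device_powered)
        return ACT_DISABLE_AER;
    return ACT_NONE;
}

/* Model of the modified function: tg3_power_down() is called
 * unconditionally, so the inputs no longer matter. */
enum tg3_action shutdown_action_after(enum sys_state state,
                                      bool in_quirk_table,
                                      bool device_powered)
{
    (void)state;
    (void)in_quirk_table;
    (void)device_powered;
    return ACT_POWER_DOWN;
}
```

In this model, a restart on a machine that does not match the quirk table results in no action at all before the change and a power-down after it. Whether that is actually what happens on the affected servers is something I have not verified.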

Best regards,
Andrzej
 
I have the exact same error. First it said:
A fatal error was detected on a component at bus 1 device 0 function 1.
So I simply removed the NIC from slot 1, but then it started reporting the same error for the RAID controller:

A fatal error was detected on a component at bus 4 device 0 function 1.
Dunno what to do.
 
Here is the list of firmware in my system. Unfortunately, the issue still persists.

Power Supply.Slot.1 00.1B.53
Power Supply.Slot.2 00.1B.53
PERC H730P Adapter 25.5.9.0001
OS COLLECTOR, v6.0, A00 6.0
Internal Dual SD Module 2.0
Backplane 1 4.35
Dell iDRAC Service Module Embedded Package v5.2.0.0, A00 5.2.0.0
Broadcom Gigabit Ethernet BCM5720 - 22.31.6
Broadcom Gigabit Ethernet BCM5720 - 22.31.6
BIOS 2.24.0
Dell OS Driver Pack, 22.12.06, A00 22.12.06
Integrated Dell Remote Access Controller 7.00.00.182
Dell 64 Bit uEFI Diagnostics, version 4301, 4301A73, 4301.74 4301A73
System CPLD 1.0.1
Lifecycle Controller 7.00.00.182

A fatal error was detected on a component at bus 4 device 0 function 1.

======================

Power Supply.Slot.1 00.1B.53
Power Supply.Slot.2 00.1B.53
PERC H730P Adapter 25.5.9.0001
OS COLLECTOR, v6.0, A00 6.0
Internal Dual SD Module 2.0
Backplane 1 4.35
Dell iDRAC Service Module Embedded Package v5.2.0.0, A00 5.2.0.0
Broadcom Gigabit Ethernet BCM5720 - 23.31.0
Broadcom Gigabit Ethernet BCM5720 - 23.31.0
BIOS 2.24.0
Dell OS Driver Pack, 22.12.06, A00 22.12.06
Integrated Dell Remote Access Controller 7.00.00.182
Dell 64 Bit uEFI Diagnostics, version 4301, 4301A73, 4301.74 4301A73
System CPLD 1.0.1
Lifecycle Controller 7.00.00.182

A fatal error was detected on a component at bus 4 device 0 function 1.

======================

Linux version 6.14.11-2-pve (build@proxmox) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-2 (2025-09-12T09:46Z) ()