Hello everyone,
I'm experiencing a recurring network failure on a Proxmox VE host using a Broadcom NetXtreme-E 100G NIC (BCM57504).
The NIC stops transmitting, triggers a TX timeout, and then fails to reset correctly. After that, the interface is disabled and vmbr0 loses connectivity until the host is rebooted.
Interestingly, the same host also has another Broadcom NIC (BCM57414 dual 10G/25G) using the same driver and kernel, and it works perfectly without any issues. The problem only happens on the 100G card.
Environment:
- Proxmox VE: 8.1.5
- Kernel: 6.5.13-3-pve
- Hardware: Datacom DM-SV01 server
- Bridge: vmbr0
- Workload: multiple VMs with moderate to high network traffic
Problematic NIC (100G):
- Model: Broadcom BCM57504 NetXtreme-E 100Gb
- PCI ID: 14e4:1751
- Interface: enp33s0np0
- Driver: bnxt_en
- Firmware: 226.0.145.1 / pkg 226.1.107.1
lspci -nn | grep -i ethernet
Code:
21:00.0 Ethernet controller [0200]: Broadcom Inc. BCM57504 NetXtreme-E [14e4:1751] (rev 11)
ethtool -i enp33s0np0
Code:
driver: bnxt_en
version: 6.5.13-3-pve
firmware-version: 226.0.145.1/pkg 226.1.107.1
Stable NIC on the same host (no issues):
- Model: Broadcom BCM57414 NetXtreme-E dual 10G/25G
- PCI ID: 14e4:16d7 (both ports)
- Interfaces: enp65s0f0np0 / enp65s0f1np1
- Driver: bnxt_en
- Firmware: 214.4.91.1 / pkg 216.0.333.11
lspci -nn | grep -i ethernet
Code:
41:00.0 Ethernet controller [0200]: Broadcom BCM57414 NetXtreme-E [14e4:16d7] (rev 01)
41:00.1 Ethernet controller [0200]: Broadcom BCM57414 NetXtreme-E [14e4:16d7] (rev 01)
ethtool -i enp65s0f0np0
Code:
driver: bnxt_en
version: 6.5.13-3-pve
firmware-version: 214.4.91.1/pkg 216.0.333.11
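To make the comparison between the two cards explicit, the `ethtool -i` outputs can be parsed into key/value pairs and diffed. This is only a minimal sketch in Python: the helper name `parse_ethtool_info` is hypothetical, and the sample text is copied from the outputs in this post rather than read from the host.

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines from `ethtool -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            info[key.strip()] = value.strip()
    return info


# Sample outputs copied from this post (not read from a live system).
NIC_100G = """driver: bnxt_en
version: 6.5.13-3-pve
firmware-version: 226.0.145.1/pkg 226.1.107.1"""

NIC_25G = """driver: bnxt_en
version: 6.5.13-3-pve
firmware-version: 214.4.91.1/pkg 216.0.333.11"""

a = parse_ethtool_info(NIC_100G)
b = parse_ethtool_info(NIC_25G)

# Fields whose values differ between the two NICs.
differing = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
print(differing)  # ['firmware-version']
```

For the fields shown here, only the firmware version differs; driver and kernel module version are identical on both cards.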
Problem description
After some time under load, the kernel reports:
Code:
NETDEV WATCHDOG: enp33s0np0 (bnxt_en): transmit queue 0 timed out
bnxt_en: TX timeout detected, starting reset task!
hwrm_ring_free failed
hwrm_ring_alloc failed
bnxt_init_nic err
nic open fail
vmbr0: port enp33s0np0 entered disabled state
Once this happens:
- Network connectivity is lost
- The interface does not recover automatically
- Only a full reboot restores the NIC
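For anyone who wants to check their own logs for the same sequence, here is a minimal sketch in Python that looks for the TX timeout followed by a failed reset. The patterns are taken only from the messages quoted in this post, so real log lines (with timestamps or device prefixes) may need adjusted patterns; the function name `scan_log` is hypothetical.

```python
import re

# Patterns taken from the kernel messages quoted in this post.
TX_TIMEOUT = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(bnxt_en\): transmit queue (\d+) timed out"
)
RESET_FAILURES = (
    "hwrm_ring_free failed",
    "hwrm_ring_alloc failed",
    "bnxt_init_nic err",
    "nic open fail",
)


def scan_log(lines):
    """Return (interface, queue, reset_failed) for the first TX timeout, or None."""
    hit = None
    for line in lines:
        if hit is None:
            m = TX_TIMEOUT.search(line)
            if m:
                hit = [m.group(1), int(m.group(2)), False]
        elif any(msg in line for msg in RESET_FAILURES):
            # A reset-failure message appeared after the timeout.
            hit[2] = True
    return tuple(hit) if hit else None


# Sample lines copied from this post.
SAMPLE = """NETDEV WATCHDOG: enp33s0np0 (bnxt_en): transmit queue 0 timed out
bnxt_en: TX timeout detected, starting reset task!
hwrm_ring_free failed
hwrm_ring_alloc failed
bnxt_init_nic err
nic open fail
vmbr0: port enp33s0np0 entered disabled state""".splitlines()

print(scan_log(SAMPLE))  # ('enp33s0np0', 0, True)
```

The same function could be fed the lines of a saved `dmesg` or `journalctl -k` dump instead of the sample.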
What I have already verified
- Cable and switch ports are OK
- The failure has occurred multiple times; it is not a one-time event
- No SR-IOV enabled
- Using standard Linux bridge (vmbr0)
- No PCIe errors in dmesg besides the bnxt_en errors
- Only the BCM57504 (100G NIC) is affected
Questions to the community
- Has anyone experienced similar issues with Broadcom BCM57504 or other NetXtreme-E cards on Proxmox 8?
- Is this a known bug with kernel 6.5.x and the bnxt_en driver?
- Would upgrading to kernel 6.8 help?
- Is there a recommended firmware version for this NIC on Proxmox?
- Are there any known driver module parameters or offload settings that improve stability?
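To frame the offload question, here is a minimal sketch in Python of how offloads could be switched off one at a time for testing. It only builds and prints `ethtool -K` commands and does not run them. The feature list is an assumption: these are common offloads to experiment with, and nothing in this post confirms that any of them is related to the timeout. The helper name `offload_commands` is hypothetical.

```python
import shlex

# Assumption: candidate offloads to test individually; none of them is
# confirmed to be connected to the TX timeout described in this post.
CANDIDATE_OFFLOADS = ("tso", "gso", "gro", "lro")


def offload_commands(iface, features=CANDIDATE_OFFLOADS, state="off"):
    """Build one `ethtool -K` command per feature so each can be tested alone."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return [["ethtool", "-K", iface, feature, state] for feature in features]


# Print the commands for the affected interface without executing them.
for cmd in offload_commands("enp33s0np0"):
    print(shlex.join(cmd))
```

Changing one feature per test run keeps it possible to tell which setting, if any, affects the failure.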
Any feedback, patches, or workarounds would be greatly appreciated.
Thanks in advance for your help.
Regards,
Walisson