Broadcom BCM57504 (100G) bnxt_en TX timeout and NIC reset on Proxmox 8.1.5 — while BCM57414 (25G) works fine on same host

walissongois

New Member
Jan 26, 2026
Hello everyone,

I'm experiencing a recurring network failure on a Proxmox VE host using a Broadcom NetXtreme-E 100G NIC (BCM57504).
The NIC stops transmitting, triggers a TX timeout, and then fails to reset correctly. After that, the interface is disabled and vmbr0 loses connectivity until the host is rebooted.
Interestingly, the same host also has another Broadcom NIC (BCM57414 dual 10G/25G) using the same driver and kernel, and it works perfectly without any issues. The problem only happens on the 100G card.

Environment:
- Proxmox VE: 8.1.5
- Kernel: 6.5.13-3-pve
- Hardware: Datacom DM-SV01 server
- Bridge: vmbr0
- Workload: multiple VMs with moderate to high network traffic

Problematic NIC (100G):
- Model: Broadcom BCM57504 NetXtreme-E 100Gb
- PCI ID: 14e4:1751
- Interface: enp33s0np0
- Driver: bnxt_en
- Firmware: 226.0.145.1 / pkg 226.1.107.1

lspci -nn | grep -i ethernet
Code:
21:00.0 Ethernet controller [0200]: Broadcom Inc. BCM57504 NetXtreme-E [14e4:1751] (rev 11)

ethtool -i enp33s0np0
Code:
driver: bnxt_en
version: 6.5.13-3-pve
firmware-version: 226.0.145.1/pkg 226.1.107.1


Stable NIC on the same host (no issues):
- Model: Broadcom BCM57414 NetXtreme-E dual 10G/25G
- PCI ID: 14e4:16d7
- Interfaces: enp65s0f0np0 / enp65s0f1np1
- Driver: bnxt_en
- Firmware: 214.4.91.1 / pkg 216.0.333.11

lspci -nn | grep -i ethernet
Code:
41:00.0 Ethernet controller [0200]: Broadcom BCM57414 NetXtreme-E [14e4:16d7] (rev 01)
41:00.1 Ethernet controller [0200]: Broadcom BCM57414 NetXtreme-E [14e4:16d7] (rev 01)


ethtool -i enp65s0f0np0
Code:
driver: bnxt_en
version: 6.5.13-3-pve
firmware-version: 214.4.91.1/pkg 216.0.333.11

Problem description

After some time under load, the kernel reports:
Code:
NETDEV WATCHDOG: enp33s0np0 (bnxt_en): transmit queue 0 timed out
bnxt_en: TX timeout detected, starting reset task!
hwrm_ring_free failed
hwrm_ring_alloc failed
bnxt_init_nic err
nic open fail
vmbr0: port enp33s0np0 entered disabled state

Once this happens:
  • Network connectivity is lost
  • The interface does not recover automatically
  • Only a full reboot restores the NIC
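
For anyone who wants to check their own hosts for the same pattern, here is a rough Python sketch that scans kernel log text for a TX timeout followed by a failed reset. The function name `find_failed_resets` and the marker strings are my own choices, with the strings copied from the excerpt above; other kernel versions may word these messages differently, so treat it as a starting point rather than a reliable detector.

```python
# Message fragments copied from the log excerpt above (assumption: other
# kernels or driver versions may phrase them differently).
TIMEOUT_MARK = "TX timeout detected"
FAILURE_MARKS = ("hwrm_ring_alloc failed", "bnxt_init_nic err", "nic open fail")


def find_failed_resets(log_text):
    """Return (timeout_line, failure_line) pairs where a TX timeout is
    followed by a reset failure before the next TX timeout appears."""
    results = []
    pending = None  # latest TX-timeout line not yet paired with a failure
    for line in log_text.splitlines():
        if TIMEOUT_MARK in line:
            pending = line.strip()
        elif pending is not None and any(mark in line for mark in FAILURE_MARKS):
            results.append((pending, line.strip()))
            pending = None  # report each timeout at most once
    return results


sample = """\
NETDEV WATCHDOG: enp33s0np0 (bnxt_en): transmit queue 0 timed out
bnxt_en: TX timeout detected, starting reset task!
hwrm_ring_free failed
hwrm_ring_alloc failed
bnxt_init_nic err
nic open fail
vmbr0: port enp33s0np0 entered disabled state
"""

for timeout, failure in find_failed_resets(sample):
    print("reset did not recover:", failure)
```

The same function can be fed saved kernel log output (for example from `journalctl -k`) instead of the inline sample.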

What I already verified

  • Cable and switch ports are OK
  • Happens multiple times, not a one-time event
  • No SR-IOV enabled
  • Using standard Linux bridge (vmbr0)
  • No PCIe errors in dmesg besides the bnxt_en errors
  • Only the BCM57504 (100G NIC) is affected


Questions to the community

  1. Has anyone experienced similar issues with Broadcom BCM57504 or other NetXtreme-E cards on Proxmox 8?
  2. Is this a known bug with kernel 6.5.x and the bnxt_en driver?
  3. Would upgrading to kernel 6.8 help?
  4. Is there a recommended firmware version for this NIC on Proxmox?
  5. Are there any known driver module parameters or offload settings that improve stability?

Any feedback, patches, or workarounds would be greatly appreciated.


Thanks in advance for your help.

Regards,
Walisson
 

We had a full kernel panic (eventually) on PVE 9.1.x with kernel 7.0.0-3-pve, due to a wedged driver on a similar card, the BCM57508.

Topology: each node has two dual-port 100G cards and two LACP bonds, one for front-of-house networking and one for backend traffic (Ceph, migrations, etc.).

Failure mode: basically you can see bnxt_en panic each CPU in turn until it has gone all the way around, and then the machine hard-locked.

Code:
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: NETDEV WATCHDOG: CPU: 35: transmit queue 16 timed out 5081 ms
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: TX timeout detected, starting reset task!
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [0.0]: tx{fw_ring: 1025 prod: 44f0 cons: 44f0}
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [0]: rx{fw_ring: 2 prod: 75b} rx_agg{fw_ring: 3 agg_prod: e204 sw_agg_prod: 204}
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [0]: cp{fw_ring: 0 raw_cons: 1344945}
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [0.0]: cp{fw_ring: 49 raw_cons: 17be4c0}
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [0.1]: cp{fw_ring: 33 raw_cons: a9199d}
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [1.0]: tx{fw_ring: 1026 prod: 6caa cons: 6caa}
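
To make the ring dump lines easier to compare across rings, here is a small Python sketch that pulls the `prod` and `cons` values out of the `tx{...}` entries and computes how many entries sit between them. It assumes the values are hexadecimal (they contain digits such as `f` and `a`) and that the indexes wrap at 16 bits; both are my assumptions from the log format, not something confirmed here, and `parse_tx_rings` is a made-up helper name.

```python
import re

# Matches entries like: [0.0]: tx{fw_ring: 1025 prod: 44f0 cons: 44f0}
TX_RE = re.compile(
    r"\[(?P<ring>[\d.]+)\]: tx\{fw_ring: (?P<fw_ring>\d+) "
    r"prod: (?P<prod>[0-9a-fA-F]+) cons: (?P<cons>[0-9a-fA-F]+)\}"
)


def parse_tx_rings(log_text):
    """Extract TX ring state from bnxt_en ring dump lines."""
    rings = []
    for m in TX_RE.finditer(log_text):
        prod = int(m["prod"], 16)  # assumption: printed in hex
        cons = int(m["cons"], 16)
        rings.append({
            "ring": m["ring"],
            "fw_ring": int(m["fw_ring"]),
            "prod": prod,
            "cons": cons,
            # assumption: indexes are 16-bit and wrap around
            "outstanding": (prod - cons) & 0xFFFF,
        })
    return rings


sample = """\
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [0.0]: tx{fw_ring: 1025 prod: 44f0 cons: 44f0}
May 05 13:49:23 proxmox7 kernel: bnxt_en 0000:11:00.0 enhe0p0: [1.0]: tx{fw_ring: 1026 prod: 6caa cons: 6caa}
"""

for ring in parse_tx_rings(sample):
    print(ring["ring"], ring["fw_ring"], ring["outstanding"])
```

For the two TX rings in the excerpt, `prod` equals `cons`, so the computed difference is 0 for both. Under the assumptions above that would mean the driver had nothing pending on those rings when the dump was taken, but I would not read more into it than that.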




info:

Code:
from dmesg:

[    2.323354] bnxt_en 0000:11:00.0 eth0: Broadcom BCM57508 NetXtreme-E 10Gb/25Gb/50Gb/100Gb/200Gb Ethernet found at mem 70490010000, node addr <mac addr>

ethtool -i enhe0p0:
driver: bnxt_en
version: 6.17.13-12-pve
firmware-version: 233.0.152.6/pkg 233.1.135.7
expansion-rom-version:
bus-info: 0000:11:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

On kernel 6.17 under Proxmox 9.1.x (and previously Proxmox 9.0.x under its various 6.x kernels) we've had no issues. We did blacklist the bnxt_re module after the issue, but that probably won't be enough to stop it happening again. In googling around I did find suggestions that kernel 7 has some bnxt_en jank.
 
In googling around i did find suggestion that kernel 7 has some bnxt_en jank.

Could you post your full network configuration? I've seen several reports now, and usually they involved BCM cards with bonds and VLANs, but I haven't been able to tie them together yet...

Code:
cat /etc/network/interfaces
 
/etc/network/interfaces contents:


Basic gist: two LACP bonds, each containing two ports, and each NIC contributes one port to each bond (so a NIC dying entirely should be survivable).

Of those two bonds:

po7 / bond0 is used by VMs via individual subnets in SDN, which are built on top of vmbr0.

The other, bond1 / po17, is for Ceph and any back-of-house Proxmox comms, including migration.

IPs mildly redacted.

Let me know if you have any other questions.

Code:
auto lo
iface lo inet loopback

auto enhe0p0
iface enhe0p0 inet manual
        mtu 9000
#Port 1B - Onboard

auto enhe0p1
iface enhe0p1 inet manual
        mtu 9000
#Port 2B - Onboard

auto enhe1p0
iface enhe1p0 inet manual
        mtu 9000
#Port 1A - PCIe

auto enhe1p1
iface enhe1p1 inet manual
        mtu 9000
#Port 2A - PCIe

iface enipmi0p0 inet manual
#unused hw bond with IPMI

auto bond0
iface bond0 inet manual
        bond-slaves enhe0p0 enhe1p0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9000
#bond for usernets on po7

auto bond1
iface bond1 inet manual
        bond-slaves enhe0p1 enhe1p1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9000
#bond for admin/storage on po17

auto bond1.1549
iface bond1.1549 inet manual
        mtu 9000
#storage ip interface base bond slice (ceph)

auto bond1.1508
iface bond1.1508 inet manual
        mtu 9000
#admin ip interface base bond slice (admin/migration)

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000
        bridge-disable-mac-learning 1
#base switch for usernets (used as basis for SDN individual vlans listed in pve)

auto vmbr1v1508
iface vmbr1v1508 inet static
        address 10.y.y.y/23
        gateway 10.y.y.254
        bridge-ports bond1.1508
        bridge-stp off
        bridge-fd 0
#admin interface ( migration, pve login, etc)

auto vmbr1v1549
iface vmbr1v1549 inet static
        address 10.x.x.x/24
        bridge-ports bond1.1549
        bridge-stp off
        bridge-fd 0
        mtu 9000
#storage interface (ceph)

Also: yes, all four 100G NICs are the same Broadcom model, but two are part of a Supermicro "onboard" AOM card and the other two are on a standard Broadcom PCIe card. All four are running the same firmware listed in the first post.
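
The "one port per NIC in each bond" layout can also be checked mechanically. Below is a rough Python sketch that reads `bond-slaves` lines out of interfaces-style text and checks that each bond spans more than one NIC. It assumes the `enhe<NIC>p<PORT>` naming used in this config; the helper names (`bond_slaves`, `nic_of`, `spans_multiple_nics`) are made up for illustration, and it only handles the simple stanza layout shown here.

```python
import re


def bond_slaves(text):
    """Map each iface name to its bond-slaves list, if it has one."""
    bonds = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "iface":
            current = words[1]
        elif words[0] == "bond-slaves" and current is not None:
            bonds[current] = words[1:]
    return bonds


def nic_of(port):
    # Assumption: ports are named enhe<NIC>p<PORT>, as in this config.
    m = re.fullmatch(r"enhe(\d+)p(\d+)", port)
    return m.group(1) if m else None


def spans_multiple_nics(slaves):
    """True if no single NIC hosts every slave of the bond."""
    nics = [nic_of(s) for s in slaves]
    return None not in nics and len(set(nics)) > 1


sample = """\
auto bond0
iface bond0 inet manual
        bond-slaves enhe0p0 enhe1p0
        bond-mode 802.3ad

auto bond1
iface bond1 inet manual
        bond-slaves enhe0p1 enhe1p1
        bond-mode 802.3ad
"""

for bond, slaves in bond_slaves(sample).items():
    print(bond, slaves, spans_multiple_nics(slaves))
```

For the two bonds above, both checks come out True, which matches the intent that losing one whole card leaves each bond with a working member.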
 