10GbE Broadcom 57810S Controller bnx2x MDC/MDIO access timeout

ulf.kosack

Hello everyone,

I am trying to find out why my 10G NIC keeps "crashing". After some time, the journal fills up with the following messages:

Code:
Sep 11 16:41:25 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: MDC/MDIO access timeout
Sep 11 16:41:25 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: MDC/MDIO access timeout
Sep 11 16:41:26 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x1
Sep 11 16:41:26 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: MDC/MDIO access timeout
Sep 11 16:41:26 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: MDC/MDIO access timeout
Sep 11 16:41:27 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x1
Sep 11 16:41:27 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: MDC/MDIO access timeout
Sep 11 16:41:27 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: MDC/MDIO access timeout
Sep 11 16:41:27 pve05 kernel: bnx2x: [bnx2x_state_wait:312(enp16s0f0)]timeout waiting for state 2
Sep 11 16:41:27 pve05 kernel: bnx2x: [bnx2x_func_stop:9129(enp16s0f0)]FUNC_STOP ramrod failed. Running a dry transaction
Sep 11 16:41:27 pve05 kernel: bnx2x: [bnx2x_igu_int_disable:902(enp16s0f0)]BUG! Proper val not read from IGU!
Sep 11 16:41:27 pve05 kernel: bnx2x: [bnx2x_func_hw_reset:6126(enp16s0f0)]Unknown reset_phase (0x0) from MCP
Sep 11 16:41:37 pve05 kernel: bnx2x: [bnx2x_fw_command:3055(enp16s0f0)]FW failed to respond!
Sep 11 16:41:37 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: bc 7.13.75
Sep 11 16:41:37 pve05 kernel: [63B blob data]
Sep 11 16:41:37 pve05 kernel: bnx2x: [bnx2x_fw_dump_lvl:816(enp16s0f0)]Trace buffer signature is missing.
Sep 11 16:41:37 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x800
Sep 11 16:41:37 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x800
Sep 11 16:41:37 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x800
Sep 11 16:41:37 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x800
Sep 11 16:41:47 pve05 kernel: bnx2x: [bnx2x_fw_command:3055(enp16s0f0)]FW failed to respond!
Sep 11 16:41:47 pve05 kernel: bnx2x 0000:10:00.0 enp16s0f0: bc 7.13.75
Sep 11 16:41:47 pve05 kernel: [63B blob data]
Sep 11 16:41:47 pve05 kernel: bnx2x: [bnx2x_fw_dump_lvl:816(enp16s0f0)]Trace buffer signature is missing.
Sep 11 16:41:47 pve05 kernel: bnx2x: [bnx2x_nic_load_request:2343(enp16s0f0)]MCP response failure, aborting
Sep 11 16:41:47 pve05 kernel: bnx2x: [bnx2x_acquire_hw_lock:2023(enp16s0f0)]lock_status 0xffffffff  resource_bit 0x800

Card: 57810S-10G-2S-X8
PVE: pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.53-1-pve)
Board: ASRockRack X570D4U
CPU: AMD Ryzen 7 PRO 5750G with Radeon Graphics
RAM: 128 GB ECC

After that, the container that uses the network card is no longer reachable. So far, rebooting the PVE server has helped.

Does anyone have an idea?

Thanks
Ulf
 
I fixed the issue by replacing the 10GbE and 1GbE cards with ones based on Intel chips. They work without errors.
Did you try changing the firmware on the card? I have multiple cards, and they all seem to have different firmware versions:

firmware-version: FFV14.04.11 bc 7.14.10
firmware-version: FFV08.07.26 bc 7.13.54
firmware-version: mbi 255.255.255 FFV08.07.26 bc
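
To compare cards, the `firmware-version` strings above (the kind of line `ethtool -i <interface>` prints) can be split into their FFV and bootcode (`bc`) parts. The following is only a rough sketch: the helper names `parse_firmware` and `version_key` are made up for illustration, and the assumption that the number after `bc` is the bootcode version is inferred from the `bc 7.13.75` lines in the journal above, not from any driver documentation.

```python
import re

# Strings copied from this thread; on a live host they would come from
# the "firmware-version:" line of `ethtool -i <interface>`.
SAMPLES = [
    "firmware-version: FFV14.04.11 bc 7.14.10",
    "firmware-version: FFV08.07.26 bc 7.13.54",
    "firmware-version: mbi 255.255.255 FFV08.07.26 bc",
]


def parse_firmware(line):
    """Return (ffv, bc) as strings; a part that is absent becomes None."""
    ffv = re.search(r"FFV(\d+(?:\.\d+)*)", line)
    bc = re.search(r"\bbc\s+(\d+(?:\.\d+)*)", line)
    return (ffv.group(1) if ffv else None, bc.group(1) if bc else None)


def version_key(version):
    """Turn '7.13.54' into (7, 13, 54) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    for sample in SAMPLES:
        ffv, bc = parse_firmware(sample)
        print(f"FFV={ffv} bc={bc}")
```

With `version_key`, the `bc 7.13.75` reported by the failing card in the first post sorts between the `7.13.54` and `7.14.10` seen here, so the cards can at least be ordered by bootcode before deciding whether a firmware update is worth trying.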
 
No, I didn't. Replacing the cards was the faster way for me, and I could return them to the shop.

I have an old server where I can test it, but not in the next few weeks.
Did you try any other motherboards? I think the problem is specific to the ASRock X570D4U, as my other server (H12DSi-N6) is working fine.