Trouble with bnx2 after upgrade to PVE 6 due to config issue inside VM

udo

Hi,
Yesterday I updated a two-node cluster from 5.4 to 6.0-5 (pve-no-subscription).
There is only one VM on this cluster, which was running before the upgrade (live-migrated to the updated second node and later live-migrated back).
Due to a configuration error, the second VM NIC was tagged in the VM config (ok) and also used VLAN tagging inside the VM (bad).
With this setting the whole host stopped working after some hours...
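
For illustration, the misconfiguration described above would look roughly like the sketch below. The config line and the guest-side command in the comments are hypothetical: the MAC address, bridge name, VLAN ID 100 and interface names are made up and not taken from the affected system. The small helper `tagged_nics` (also a made-up name) only lists the NICs that already carry a host-side tag, so they can be compared with what the guest configures itself.

```shell
#!/bin/sh
# Hypothetical example of the double tagging described above:
#   host side, /etc/pve/qemu-server/<vmid>.conf:
#     net1: virtio=AA:BB:CC:DD:EE:02,bridge=vmbr1,tag=100
#   guest side (this is the part that should not be there):
#     ip link add link eth1 name eth1.100 type vlan id 100

# Print the NIC lines of a VM config file that already have a host-side VLAN tag.
# $1 = path to the VM config file
tagged_nics() {
    grep -E '^net[0-9]+:.*,tag=[0-9]+' "$1"
}
```

Inside the guest, `ip -d link show type vlan` lists VLAN sub-interfaces. Any of them that sits on a NIC reported by `tagged_nics` is a candidate for double tagging.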

We got kernel traces on the host every six seconds:
Code:
Aug  5 18:08:43 pve01 kernel: [ 4638.207914] bnx2 0000:01:00.1 eno2: <--- start FTQ dump --->
Aug  5 18:08:43 pve01 kernel: [ 4638.214148] bnx2 0000:01:00.1 eno2: RV2P_PFTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.220035] bnx2 0000:01:00.1 eno2: RV2P_TFTQ_CTL 00020000
Aug  5 18:08:43 pve01 kernel: [ 4638.225894] bnx2 0000:01:00.1 eno2: RV2P_MFTQ_CTL 00004000
Aug  5 18:08:43 pve01 kernel: [ 4638.231744] bnx2 0000:01:00.1 eno2: TBDR_FTQ_CTL 00004000
Aug  5 18:08:43 pve01 kernel: [ 4638.237503] bnx2 0000:01:00.1 eno2: TDMA_FTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.243268] bnx2 0000:01:00.1 eno2: TXP_FTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.248948] bnx2 0000:01:00.1 eno2: TXP_FTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.254620] bnx2 0000:01:00.1 eno2: TPAT_FTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.260378] bnx2 0000:01:00.1 eno2: RXP_CFTQ_CTL 00008000
Aug  5 18:08:43 pve01 kernel: [ 4638.266134] bnx2 0000:01:00.1 eno2: RXP_FTQ_CTL 00100000
Aug  5 18:08:43 pve01 kernel: [ 4638.271804] bnx2 0000:01:00.1 eno2: COM_COMXQ_FTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.277999] bnx2 0000:01:00.1 eno2: COM_COMTQ_FTQ_CTL 00020000
Aug  5 18:08:43 pve01 kernel: [ 4638.284202] bnx2 0000:01:00.1 eno2: COM_COMQ_FTQ_CTL 00010000
Aug  5 18:08:43 pve01 kernel: [ 4638.290309] bnx2 0000:01:00.1 eno2: CP_CPQ_FTQ_CTL 00004000
Aug  5 18:08:43 pve01 kernel: [ 4638.296250] bnx2 0000:01:00.1 eno2: CPU states:
Aug  5 18:08:43 pve01 kernel: [ 4638.301147] bnx2 0000:01:00.1 eno2: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001288 instr 8e030000
Aug  5 18:08:43 pve01 kernel: [ 4638.312160] bnx2 0000:01:00.1 eno2: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a5c pc 8000a50 instr 38420001
Aug  5 18:08:43 pve01 kernel: [ 4638.323160] bnx2 0000:01:00.1 eno2: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c20 pc 8004c14 instr 10e00088
Aug  5 18:08:43 pve01 kernel: [ 4638.334166] bnx2 0000:01:00.1 eno2: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000b28 pc 8000b28 instr 3c028000
Aug  5 18:08:43 pve01 kernel: [ 4638.345166] bnx2 0000:01:00.1 eno2: 145000 mode b880 state 80000000 evt_mask 500 pc 800b148 pc 800b020 instr afb30034
Aug  5 18:08:43 pve01 kernel: [ 4638.356178] bnx2 0000:01:00.1 eno2: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000c6c pc 8000c6c instr 3c058000
Aug  5 18:08:43 pve01 kernel: [ 4638.367181] bnx2 0000:01:00.1 eno2: <--- end FTQ dump --->
Aug  5 18:08:43 pve01 kernel: [ 4638.373070] bnx2 0000:01:00.1 eno2: <--- start TBDC dump --->
Aug  5 18:08:43 pve01 kernel: [ 4638.379224] bnx2 0000:01:00.1 eno2: TBDC free cnt: 32
Aug  5 18:08:43 pve01 kernel: [ 4638.384680] bnx2 0000:01:00.1 eno2: LINE     CID  BIDX   CMD  VALIDS
Aug  5 18:08:43 pve01 kernel: [ 4638.391449] bnx2 0000:01:00.1 eno2: 00    001300  00a0   00    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.398040] bnx2 0000:01:00.1 eno2: 01    000800  0018   00    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.404631] bnx2 0000:01:00.1 eno2: 02    1b7f80  ffe8   ff    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.411214] bnx2 0000:01:00.1 eno2: 03    1f9780  fbf8   3f    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.417791] bnx2 0000:01:00.1 eno2: 04    173f80  dee0   bd    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.424363] bnx2 0000:01:00.1 eno2: 05    1ddf80  fff8   be    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.430923] bnx2 0000:01:00.1 eno2: 06    196580  f9f8   7f    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.437478] bnx2 0000:01:00.1 eno2: 07    1e5e80  fbd0   7f    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.444040] bnx2 0000:01:00.1 eno2: 08    0fbb80  3fb8   fd    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.450587] bnx2 0000:01:00.1 eno2: 09    1fff00  eff8   dd    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.457129] bnx2 0000:01:00.1 eno2: 0a    0ffe80  f378   9f    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.463660] bnx2 0000:01:00.1 eno2: 0b    0fff80  ff70   ff    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.470182] bnx2 0000:01:00.1 eno2: 0c    07f800  72d8   f5    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.476697] bnx2 0000:01:00.1 eno2: 0d    17be00  73f8   57    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.483203] bnx2 0000:01:00.1 eno2: 0e    0fff00  fbd8   a6    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.489698] bnx2 0000:01:00.1 eno2: 0f    0dff80  fff8   9f    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.496196] bnx2 0000:01:00.1 eno2: 10    1ff580  faf8   fb    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.502686] bnx2 0000:01:00.1 eno2: 11    1df680  fdf8   dd    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.509177] bnx2 0000:01:00.1 eno2: 12    1faf00  fd78   eb    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.515669] bnx2 0000:01:00.1 eno2: 13    1ffe80  fff8   77    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.522155] bnx2 0000:01:00.1 eno2: 14    1fff80  f9d8   c9    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.528636] bnx2 0000:01:00.1 eno2: 15    1fff80  fff8   f4    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.535113] bnx2 0000:01:00.1 eno2: 16    1b3f80  77f0   b7    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.541584] bnx2 0000:01:00.1 eno2: 17    13ef00  fff0   79    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.548070] bnx2 0000:01:00.1 eno2: 18    1b9d00  f798   ce    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.554546] bnx2 0000:01:00.1 eno2: 19    1b3b80  bfb0   ff    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.561026] bnx2 0000:01:00.1 eno2: 1a    1fbd80  f8d8   3b    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.567503] bnx2 0000:01:00.1 eno2: 1b    1fcf80  f838   f9    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.573977] bnx2 0000:01:00.1 eno2: 1c    1fdd80  dbf8   37    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.580455] bnx2 0000:01:00.1 eno2: 1d    0eff80  fd18   fc    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.586939] bnx2 0000:01:00.1 eno2: 1e    17ff80  9ff0   f7    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.593416] bnx2 0000:01:00.1 eno2: 1f    1ff780  bf68   bd    [0]
Aug  5 18:08:43 pve01 kernel: [ 4638.599896] bnx2 0000:01:00.1 eno2: <--- end TBDC dump --->
Aug  5 18:08:43 pve01 kernel: [ 4638.605771] bnx2 0000:01:00.1 eno2: DEBUG: intr_sem[0] PCI_CMD[00100406]
Aug  5 18:08:43 pve01 kernel: [ 4638.612777] bnx2 0000:01:00.1 eno2: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Aug  5 18:08:43 pve01 kernel: [ 4638.620654] bnx2 0000:01:00.1 eno2: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Aug  5 18:08:43 pve01 kernel: [ 4638.629406] bnx2 0000:01:00.1 eno2: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Aug  5 18:08:43 pve01 kernel: [ 4638.636256] bnx2 0000:01:00.1 eno2: DEBUG: HC_STATS_INTERRUPT_STATUS[01fc0001]
Aug  5 18:08:43 pve01 kernel: [ 4638.643786] bnx2 0000:01:00.1 eno2: DEBUG: PBA[00000000]
Aug  5 18:08:43 pve01 kernel: [ 4638.649407] bnx2 0000:01:00.1 eno2: <--- start MCP states dump --->
Aug  5 18:08:43 pve01 kernel: [ 4638.656001] bnx2 0000:01:00.1 eno2: DEBUG: MCP_STATE_P0[0003e10e] MCP_STATE_P1[0003e10e]
Aug  5 18:08:43 pve01 kernel: [ 4638.664417] bnx2 0000:01:00.1 eno2: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500]
Aug  5 18:08:43 pve01 kernel: [ 4638.673537] bnx2 0000:01:00.1 eno2: DEBUG: pc[0800ae38] pc[0800ae38] instr[8c630004]
Aug  5 18:08:44 pve01 kernel: [ 4638.681618] bnx2 0000:01:00.1 eno2: DEBUG: shmem states:
Aug  5 18:08:44 pve01 kernel: [ 4638.687261] bnx2 0000:01:00.1 eno2: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f]
Aug  5 18:08:44 pve01 kernel: [ 4638.696467]  drv_pulse_mb[00001186]
Aug  5 18:08:44 pve01 kernel: [ 4638.696472] bnx2 0000:01:00.1 eno2: DEBUG: dev_info_signature[44564907] reset_type[01005254]
Aug  5 18:08:44 pve01 kernel: [ 4638.705253]  condition[0003e10e]
Aug  5 18:08:44 pve01 kernel: [ 4638.705260] bnx2 0000:01:00.1 eno2: DEBUG: 000001c0: 01005254 42530000 0003e10e 00000000
Aug  5 18:08:44 pve01 kernel: [ 4638.713699] bnx2 0000:01:00.1 eno2: DEBUG: 000003cc: 00000000 00000000 00000000 00000000
Aug  5 18:08:44 pve01 kernel: [ 4638.722136] bnx2 0000:01:00.1 eno2: DEBUG: 000003dc: 00000000 00000000 00000000 00000000
Aug  5 18:08:44 pve01 kernel: [ 4638.730569] bnx2 0000:01:00.1 eno2: DEBUG: 000003ec: 00000000 00000000 00000000 00000000
Aug  5 18:08:44 pve01 kernel: [ 4638.738998] bnx2 0000:01:00.1 eno2: DEBUG: 0x3fc[00000000]
Aug  5 18:08:44 pve01 kernel: [ 4638.744824] bnx2 0000:01:00.1 eno2: <--- end MCP states dump --->
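
A rough way to check the interval between such dumps from a saved log is sketched below. It assumes the log lines carry the bracketed kernel timestamp as in the excerpt above (as in `dmesg` output or a syslog-style kernel log); the function name `ftq_dump_intervals` is made up for this example.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the seconds between
# consecutive "start FTQ dump" markers, based on the kernel
# timestamp in square brackets.
ftq_dump_intervals() {
    awk '/start FTQ dump/ {
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            # Cut out the timestamp without the brackets and padding.
            ts = substr($0, RSTART + 1, RLENGTH - 2)
            gsub(/ /, "", ts)
            if (seen) printf "%.3f\n", ts - prev
            prev = ts
            seen = 1
        }
    }'
}
```

Usage would be something like `dmesg | ftq_dump_intervals`. A steady stream of similar values suggests the driver keeps hitting the same condition at a fixed rhythm.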
The node is an old Dell R410 with the following driver:
Code:
01:00.1 Ethernet controller: Broadcom Limited NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
        Subsystem: Dell PowerEdge R410 BCM5716 Gigabit Ethernet
        Flags: bus master, fast devsel, latency 0, IRQ 29
        Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: [48] Power Management version 3
        Capabilities: [50] Vital Product Data
        Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
        Capabilities: [ac] Express Endpoint, MSI 00
        Capabilities: [100] Device Serial Number 78-2b-cb-ff-fe-5e-c7-9c
        Capabilities: [110] Advanced Error Reporting
        Capabilities: [150] Power Budgeting <?>
        Capabilities: [160] Virtual Channel
        Kernel driver in use: bnx2
        Kernel modules: bnx2
pveversion
Code:
pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-4.15: 5.4-7
pve-kernel-5.0.18-1-pve: 5.0.18-1
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.15.18-16-pve: 4.15.18-41
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-6
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-6
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
Udo
 
Thanks Udo.
So it was some kind of double VLAN tag? (On the host side, did you use a VLAN-aware Linux bridge, or OVS?)

I have seen another user reporting this bug as well (but I don't think it was related to VLANs; I don't have much info).

Adding "options bnx2 disable_msi=1" to /etc/modprobe.d/bnx2.conf

fixed the problem. (Looking at the code, this log is printed when a tx_timeout occurs, but I think it's a bnx2 driver bug or something in the latest kernel.)

This fix also worked some years ago (2.6.32), and that was also a bnx2 driver bug.
So it's more of a workaround.
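
As a minimal sketch, the workaround could be applied like this. The option line and file path are the ones named above; the helper name `write_bnx2_conf` is made up, and the initramfs and reboot steps in the comments are assumptions about a typical Debian-based host, not something confirmed in this thread.

```shell
#!/bin/sh
# Write the modprobe option that disables MSI for the bnx2 driver.
# $1 = target directory (defaults to /etc/modprobe.d)
write_bnx2_conf() {
    dir="${1:-/etc/modprobe.d}"
    printf 'options bnx2 disable_msi=1\n' > "$dir/bnx2.conf"
}

# On the host (as root), roughly:
#   write_bnx2_conf
#   update-initramfs -u -k all   # assumption: bnx2 may be loaded from the initramfs
#   reboot                       # or reload bnx2 during a maintenance window
# Afterwards, the value the driver was loaded with can be read from:
#   cat /sys/module/bnx2/parameters/disable_msi
```

Since the module option only takes effect when the driver is loaded, the change does nothing until bnx2 is reloaded or the host is rebooted.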
 
Hi Spirit,
I use OVS for networking, and it was a double VLAN tag.

To be safe, I will try the bnx2 option.

Udo
 
